lalrpop as a library in my project #924

Open
xuanbachle opened this issue Jul 19, 2024 · 7 comments

Comments

@xuanbachle

I want to use lalrpop as a library in my project. In particular, I want to use the grammar, parseTree, etc. directly from lalrpop. How can I do that? So far, I have added lalrpop under both build-dependencies and dependencies, but neither allows me to use the grammar module etc. from the lalrpop code.

If I take lrgrammar.lalrpop and put it in my project, the parser generated for lrgrammar.lalrpop has to use the grammar, parseTree, etc. from lalrpop, and thus does not compile.

ANTLR makes it very easy to access its parseTree etc., whereas LALRPOP makes it awfully difficult to access the parseTree.

@dburgener
Contributor

Short answer: we do not support this today, sorry. That is by design, so that these internal components can change without impact to users.

Slightly longer answer: I am not wild about exposing all those internal details as public dependencies. I'm (very slowly) working on a PR review of #658, which adds a crate in the lalrpop workspace needing similar functionality. So far, the need for that functionality is the biggest concern I have with that PR. If we did want to support this use case there are a few different ways we could handle it:

  1. Just mark the parse tree, grammar and whatever else as pub at the crate level
  2. Split the relevant bits into a lalrpop_internal library, which the main lalrpop crate depends on, so that that library can expose these bits while lalrpop does not. If we did this, we would likely document a recommendation that users not link lalrpop_internal directly, but doing so anyways could support this use case.
  3. Bring the functionality that needs access to lalrpop's internal bits directly into lalrpop, and expose a more limited public API to access it.

In my opinion, number 1, which is the easiest thing to implement, is essentially a non-starter. I don't think we want to expose all of these bits as part of our public API. Number 2 is the most work, and adds to the long-term maintenance burden, but it is cleaner in terms of the API. Number 3 is nice, but only makes sense for certain use cases. And then of course the hidden number 4 option is the default, which is to simply not support this use case.
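To make option 2 concrete, here is a minimal sketch in which modules stand in for crates. Every name in it (`internals_like`, `frontend_like`, `Rule`, `parse_rules`, `rule_count`) and the toy rule format are assumptions made for illustration; none of them is LALRPOP's actual API or layout.

```rust
// Modules stand in for crates; every name below is hypothetical, not LALRPOP's API.

/// Stand-in for an internals crate: everything is `pub`, with no stability promise.
pub mod internals_like {
    #[derive(Debug, PartialEq)]
    pub struct Rule {
        pub name: String,
        pub symbols: Vec<String>,
    }

    /// Toy format: one rule per line, written as `Name = sym sym ...`.
    /// Lines without an `=` are skipped.
    pub fn parse_rules(text: &str) -> Vec<Rule> {
        text.lines()
            .filter_map(|line| {
                let (name, rhs) = line.split_once('=')?;
                Some(Rule {
                    name: name.trim().to_string(),
                    symbols: rhs.split_whitespace().map(str::to_string).collect(),
                })
            })
            .collect()
    }
}

/// Stand-in for the stable front-end crate: it uses the internals but
/// exposes only plain data in its own signatures.
pub mod frontend_like {
    use super::internals_like;

    pub fn rule_count(text: &str) -> usize {
        internals_like::parse_rules(text).len()
    }
}
```

A tool that needs the rule structure would depend on the internals stand-in and accept breakage on any release, while ordinary users would only ever see `frontend_like::rule_count`, whose signature mentions no internal type.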

Can I ask why you need this functionality? I'm entertaining the idea primarily because the use case in #658 is very appealing and seems to have a very similar need, but in general I wouldn't expect this to be something end users need very much.

@xuanbachle
Author

Thank you for the very detailed response.

The use cases for this can be:

  • Synthesis of inputs belonging to an arbitrary grammar
  • Editor for LALRPOP grammars

and possibly many more.

My use case is automated synthesis of inputs belonging to an arbitrary grammar that can be used for: (1) program synthesis, and (2) fuzzing, in which inputs belonging to an arbitrary grammar can be generated to test a particular system, e.g., testing a parser generator like LALRPOP itself.

There could be many advantages if LALRPOP can make the parse tree and so on public to achieve the above tasks. This is very cleanly supported in ANTLR.
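As a rough illustration of what "synthesis of inputs belonging to an arbitrary grammar" involves, the sketch below walks a grammar and emits a token sequence derived from it. The `Symbol` and `Grammar` types, the `Lcg` generator, and the convention that the first alternative of every rule is non-recursive are all assumptions of this sketch; they are not LALRPOP's (or ANTLR's) data structures.

```rust
use std::collections::HashMap;

/// A grammar symbol: a literal token or a reference to a nonterminal.
#[derive(Clone, Debug)]
pub enum Symbol {
    Terminal(&'static str),
    NonTerminal(&'static str),
}

/// Hypothetical grammar representation: nonterminal name -> list of alternatives.
pub type Grammar = HashMap<&'static str, Vec<Vec<Symbol>>>;

/// Small deterministic generator (an LCG) so the sketch needs no external crates.
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next_index(&mut self, len: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 33) as usize) % len
    }
}

/// Expand `symbol` into tokens appended to `out`. When `depth` reaches zero the
/// first alternative is always taken; this sketch assumes that alternative is
/// non-recursive, which is what guarantees termination.
pub fn generate(
    grammar: &Grammar,
    symbol: &Symbol,
    rng: &mut Lcg,
    depth: usize,
    out: &mut Vec<&'static str>,
) {
    match symbol {
        Symbol::Terminal(tok) => out.push(*tok),
        Symbol::NonTerminal(name) => {
            let alternatives = &grammar[name];
            let choice = if depth == 0 { 0 } else { rng.next_index(alternatives.len()) };
            for sym in &alternatives[choice] {
                generate(grammar, sym, rng, depth.saturating_sub(1), out);
            }
        }
    }
}

fn t(s: &'static str) -> Symbol { Symbol::Terminal(s) }
fn n(s: &'static str) -> Symbol { Symbol::NonTerminal(s) }

/// Example grammar: Expr = Term | Term "+" Expr;  Term = "num" | "(" Expr ")".
pub fn expr_grammar() -> Grammar {
    let mut g = Grammar::new();
    g.insert("Expr", vec![vec![n("Term")], vec![n("Term"), t("+"), n("Expr")]]);
    g.insert("Term", vec![vec![t("num")], vec![t("("), n("Expr"), t(")")]]);
    g
}
```

Calling `generate(&expr_grammar(), &Symbol::NonTerminal("Expr"), &mut Lcg(7), 6, &mut out)` fills `out` with a token sequence in the language of the example grammar. The point for this thread is that the generator needs the grammar as data, which is exactly the part that is not publicly reachable today.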

@Pat-Lafon
Contributor

Hmm, fuzzing of lalrpop itself is an interesting use case (but probably falls under the pub(crate) resolution).

I'm surprised about using the parse tree for more general program synthesis. As a novice in the field, I've usually understood these tools to work either at the token level, if taking a machine learning/LLM approach, or at the AST level, if using more traditional methods (since you are usually more concerned with the program semantics and only derive the syntax when successful). I also don't think some fuzzing techniques like AFL need grammar access, though AFL might not be appropriate here.

@Pat-Lafon
Contributor

> Slightly longer answer: I am not wild about exposing all those internal details as public dependencies. I'm (very slowly) working on a PR review of #658, which adds a crate in the lalrpop workspace needing similar functionality. So far, the need for that functionality is the biggest concern I have with that PR. If we did want to support this use case there are a few different ways we could handle it:
>
> 1. Just mark the parse tree, grammar and whatever else as pub at the crate level
>
> In my opinion, number 1, which is the easiest thing to implement, is essentially a non-starter. I don't think we want to expose all of these bits as part of our public API.

I've looked a little at this PR and it seems both very interesting and a large increase in surface for more features/work.

Just to clarify my confusion: marking these parts of LALRPOP as pub(crate) is a non-starter? I thought this would avoid them being part of the public API, keeping them for internal use only?

@dburgener
Contributor

> Hmm, fuzzing of lalrpop itself is an interesting use case (but probably falls under the pub(crate) resolution).

By coincidence, I actually started running some local fuzzing on lalrpop yesterday. IMO having access to the internals would definitely help the fuzzing be more efficient, but fuzzing from the outside is less of an issue than you might have thought. I'm using cargo fuzz, which does coverage detection to find unfuzzed areas, and it's getting decent coverage in normalization. To actually fuzz the lr1 table generation step with any sort of efficiency would probably require access to internals though, so there's definitely some benefit there.

> I've looked a little at this PR and it seems both very interesting and a large increase in surface for more features/work.

Yeah, I agree. I've slowly been working through it in bits of free time here and there trying to understand what all is going on. I figure I'll at least generate a list of issues to get cleaned up if we want to move forward and document the status of it. I agree with your take that I'm really not sure yet if it's something we want to sign up on the maintenance for.

> Just to clarify my confusion: marking these parts of LALRPOP as pub(crate) is a non-starter? I thought this would avoid them being part of the public API, keeping them for internal use only?

Yeah, "pub at the crate level" is probably not the right phrase for what I mean. By "at the crate level" I was trying to say "in lib.rs so it gets exported as part of the public API", not pub(crate). I think (maybe I'm missing something) that pub(crate) would be insufficient for any of the use cases discussed here. I assume @xuanbachle is developing a separate crate outside our workspace, so pub(crate) wouldn't actually give the access needed there. And #658 adds a new crate in the lalrpop workspace, which needs to access private members of the lalrpop crate. Again, pub(crate) there would only expose them within the lalrpop crate itself.

Unless I'm missing something, I don't think there's a way to support either use case with some sort of pub marking of any sort without adding those things to our public API. And I don't think we want to sign up for treating modifications to the parse tree as a breaking change.
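The visibility point can be shown in a single file if modules are allowed to stand in for crates. In the sketch below, `pub(super)` plays the role that `pub(crate)` would play inside a real crate: it widens access within the stand-in for lalrpop but gives nothing to code outside it. All names (`lalrpop_like`, `external_tool`, `Grammar`, `load`, `count_rules`, `report`) are hypothetical.

```rust
// Modules stand in for crates here; all names are hypothetical.
mod lalrpop_like {
    pub mod grammar {
        // Visible to the rest of `lalrpop_like` only, much as `pub(crate)`
        // items are visible only inside their own crate.
        pub(super) struct Grammar {
            pub(super) rule_count: usize,
        }

        pub(super) fn load(text: &str) -> Grammar {
            Grammar {
                rule_count: text.lines().filter(|l| !l.trim().is_empty()).count(),
            }
        }
    }

    // The public surface: callers receive a count and never see `Grammar`.
    pub fn count_rules(text: &str) -> usize {
        grammar::load(text).rule_count
    }
}

// Stand-in for a separate crate, such as a fuzzer or a documentation generator.
mod external_tool {
    pub fn report(text: &str) -> String {
        // Uncommenting the next line fails to compile, because `load` is
        // not visible outside `lalrpop_like`:
        // let g = super::lalrpop_like::grammar::load(text);
        format!("{} rules", super::lalrpop_like::count_rules(text))
    }
}
```

The only way for `external_tool` to reach `Grammar` is to make it fully `pub`, at which point it is part of the API that others can depend on.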

> My use case is automated synthesis of inputs belonging to an arbitrary grammar that can be used for: (1) program synthesis, and (2) fuzzing, in which inputs belonging to an arbitrary grammar can be generated to test a particular system, e.g., testing a parser generator like LALRPOP itself.

@xuanbachle Do you have any public code you can point me at so I can dig in a little more to what you're doing? I assume it doesn't exist for lalrpop, because of the issues raised here, but maybe you have something for antlr for comparison? I'm still wanting to think through the details of our options for possibly supporting this sort of thing more.

@xuanbachle
Author

Hi guys, thanks very much for the detailed discussions, very interesting to see that LALRPOP is being actively worked on for many useful areas.

My particular use cases are in both program synthesis and fuzzing. For fuzzing, I have developed a grammar-based fuzzer using ANTLR: https://bitbucket.org/xbach/grammarfuzz/src/master/src/main/scala/ (the repo may not contain the complete code, but it does contain code that uses ANTLR to fuzz any grammar written in ANTLR). I can traverse an arbitrary grammar written in ANTLR easily using the parse tree facility provided by ANTLR. I wrote a research paper about this with my colleagues here: https://par.nsf.gov/servlets/purl/10195536.

Regarding program synthesis, I want to synthesize programs by syntax (my grammar is quite simple, and programs that follow the syntax are sufficient to follow the defined semantics). I previously wrote a syntax-guided synthesis parser (in Scala), but now want to move to Rust. Scala is nice, but it has that sort of Java thingy that I want to pass on ...

If the parse tree in LALRPOP is made public, I can imagine that one can develop editors for LALRPOP grammars too using LALRPOP itself.

@darach
darach commented Aug 30, 2024

@xuanbachle When I wrote #658 I didn't have synthesis or fuzzing in mind, as the project I needed the reference documentation generated for uses property-based testing for fuzzing (with quickcheck; we check the Rust/LALRPOP implementation against an Erlang reference implementation). Generating an ANTLR grammar from LALRPOP grammars would be super useful. The tooling and IDE integration with ANTLR is great for language design. Looking forward to reading your paper. Apologies for the delay, just spotted this now 2 years later!

dburgener added a commit to dburgener/lalrpop that referenced this issue Feb 19, 2025
This is so that we can support requests to access lalrpop internal
functions and data structures without committing to bump semver on
internal changes.

Example use cases include:
* IDE integration (lalrpop#1013)
* benchmarking
* fuzzing (lalrpop#924)
* program synthesis (lalrpop#934)
* Documentation generation (lalrpop#658)

As noted in the linked issues and PRs above, there has been interest in
this from multiple users with multiple use cases.

The scheme here is that lalrpop-internals would expose whatever is
needed for these use cases publicly, and would bump semver liberally.
The main lalrpop crate only holds the library and binary front-ends, and
depends on lalrpop-internals for nearly all the work.

The public API of lalrpop-internals right now includes what is needed
for the following use cases:

* lalrpop itself
    * build, log, session
* The documentation generator in lalrpop#658
    * log, grammar, parser, tok
* The LSP demo at https://github.com/LighghtEeloo/lalrpop/tree/lsp
    * build, file_text, grammar, normalize, parser, session

I haven't gone through the public members of these modules.  It's likely
that many of them should be marked pub(crate) rather than pub.  That is
something that should be done before this is merged.
dburgener added a commit to dburgener/lalrpop that referenced this issue Feb 28, 2025