lalrpop as a library in my project #924
Short answer: we do not support this today, sorry. That is by design, so that these internal components can change without impact to users. Slightly longer answer: I am not wild about exposing all those internal details as public dependencies. I'm (very slowly) working on a PR review of #658, which adds a crate in the lalrpop workspace needing similar functionality. So far, the need for that functionality is the biggest concern I have with that PR. If we did want to support this use case there are a few different ways we could handle it:
In my opinion, number 1, which is the easiest thing to implement, is essentially a non-starter. I don't think we want to expose all of these bits as part of our public API. Number 2 is the most work and adds to the long-term maintenance burden, but it is cleaner in terms of the API. Number 3 is nice, but only makes sense for certain use cases. And then of course the hidden number 4 option is the default, which is to simply not support this use case. Can I ask why you need this functionality? I'm entertaining the idea primarily because the use case in #658 is very appealing and seems to have a very similar need, but in general I wouldn't expect this to be something end users need very much.
Thank you for the very detailed response. The use cases for this can be:
My use case is automated synthesis of inputs belonging to an arbitrary grammar, which can be used for (1) program synthesis and (2) fuzzing, where inputs belonging to an arbitrary grammar are generated to test a particular system, e.g., a parser generator like LALRPOP itself. There could be many advantages if LALRPOP made the parse tree and related structures public to support these tasks. This is very cleanly supported in ANTLR.
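To make the grammar-based input generation described above concrete, here is a minimal sketch in Rust. It does not use any LALRPOP API (none is public for this purpose, which is the point of this issue). The `Symbol` and `Grammar` types, the `arithmetic_grammar` and `sample` helpers, and the small xorshift generator are all hypothetical and written only for illustration. The sketch also assumes that the first alternative of every nonterminal is non-recursive, so that expansion can always terminate once the depth budget is spent.

```rust
use std::collections::HashMap;

/// A grammar symbol: a literal token or a reference to a nonterminal.
/// (Hypothetical type for this sketch, not a LALRPOP type.)
#[derive(Clone, Debug)]
enum Symbol {
    Terminal(&'static str),
    NonTerminal(&'static str),
}

/// Maps each nonterminal to its list of alternative productions.
type Grammar = HashMap<&'static str, Vec<Vec<Symbol>>>;

/// Tiny xorshift generator so the sketch needs no external crates.
/// The seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }

    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Expands `symbol` into `out`, picking alternatives at random.
/// When `depth` reaches zero, the first alternative is always taken;
/// this sketch assumes that alternative is non-recursive.
fn generate(g: &Grammar, symbol: &str, depth: usize, rng: &mut XorShift, out: &mut String) {
    let alternatives = &g[symbol];
    let choice = if depth == 0 { 0 } else { rng.below(alternatives.len()) };
    for sym in &alternatives[choice] {
        match sym {
            Symbol::Terminal(t) => out.push_str(t),
            Symbol::NonTerminal(n) => generate(g, n, depth.saturating_sub(1), rng, out),
        }
    }
}

/// A toy arithmetic grammar:
///   Expr -> Term | Term "+" Expr
///   Term -> Num  | "(" Expr ")"
///   Num  -> "1"  | "2"
fn arithmetic_grammar() -> Grammar {
    use Symbol::{NonTerminal as N, Terminal as T};
    let mut g = Grammar::new();
    g.insert("Expr", vec![vec![N("Term")], vec![N("Term"), T("+"), N("Expr")]]);
    g.insert("Term", vec![vec![N("Num")], vec![T("("), N("Expr"), T(")")]]);
    g.insert("Num", vec![vec![T("1")], vec![T("2")]]);
    g
}

/// Generates one sentence of the toy grammar from a nonzero seed.
fn sample(seed: u64, depth: usize) -> String {
    let g = arithmetic_grammar();
    let mut rng = XorShift(seed);
    let mut out = String::new();
    generate(&g, "Expr", depth, &mut rng, &mut out);
    out
}

fn main() {
    for seed in 1..=5 {
        println!("{}", sample(seed, 6));
    }
}
```

A real fuzzer for LALRPOP grammars would need to obtain the productions from the parsed grammar instead of building them by hand, which is exactly the access being requested here.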
Hmm, fuzzing of lalrpop itself is an interesting use case (but probably falls under the I'm surprised about using the parse tree for more general program synthesis. As a novice in the field, I've usually understood these tools to work either at the token level, if taking a machine learning/LLM approach, or at the AST level, if using more traditional methods (since you are usually more concerned with the program semantics and only derive the syntax when successful). I also don't think some fuzzing techniques, like AFL, need grammar access, though AFL might not be appropriate here.
I've looked a little at this PR, and it seems both very interesting and a large increase in surface area for more features/work. Just to clarify my confusion. Marking these parts of LALRPOP as
By coincidence, I actually started running some local fuzzing on lalrpop yesterday. IMO having access to the internals would definitely help the fuzzing be more efficient, but fuzzing from the outside is less of an issue than you might have thought. I'm using
Yeah, I agree. I've slowly been working through it in bits of free time here and there trying to understand what all is going on. I figure I'll at least generate a list of issues to get cleaned up if we want to move forward and document the status of it. I agree with your take that I'm really not sure yet if it's something we want to sign up on the maintenance for.
Yeah, "pub at the crate level" is probably not the right phrase for what I mean. By "at the crate level" I was trying to say "in lib.rs so it gets exported as part of the public API", not Unless I'm missing something, I don't think there's a way to support either use case with some sort of
@xuanbachle Do you have any public code you can point me at so I can dig in a little more to what you're doing? I assume it doesn't exist for lalrpop, because of the issues raised here, but maybe you have something for antlr for comparison? I'm still wanting to think through the details of our options for possibly supporting this sort of thing more.
Hi guys, thanks very much for the detailed discussion. It is very interesting to see that LALRPOP is being actively worked on for many useful areas. My particular use cases are both program synthesis and fuzzing. For fuzzing, I have developed a grammar-based fuzzer using ANTLR: https://bitbucket.org/xbach/grammarfuzz/src/master/src/main/scala/ (the repo may not contain the complete code, but there is code using ANTLR for fuzzing an 8000 y grammars written in ANTLR in the repo). I can traverse an arbitrary grammar written in ANTLR easily using the parse tree facility provided by ANTLR. I wrote a research paper about this with my colleagues: https://par.nsf.gov/servlets/purl/10195536. Regarding program synthesis, I want to synthesize programs by syntax (my grammar is quite simple, and programs following the syntax are sufficient to follow the defined semantics). I previously wrote a syntax-guided synthesis parser in Scala, but now want to move to Rust. Scala is nice, but it has that sort of Java thingy that I want to pass on ... If the parse tree in LALRPOP is made public, I can imagine that one could also develop editors for LALRPOP grammars using LALRPOP itself.
@xuanbachle When I wrote #658 I didn't have synthesis or fuzzing in mind, as the project I needed the reference documentation for uses property-based testing for fuzzing (with quickcheck; we check the Rust/lalrpop implementation against an Erlang reference implementation). Generating an ANTLR
This is so that we can support requests to access lalrpop internal functions and data structures without committing to bump semver on internal changes. Example use cases include:
* IDE integration (lalrpop#1013)
* benchmarking
* fuzzing (lalrpop#924)
* program synthesis (lalrpop#934)
* Documentation generation (lalrpop#658)

As noted in the linked issues and PRs above, there has been interest in this from multiple users with multiple use cases. The scheme here is that lalrpop-internals would expose whatever is needed for these use cases publicly, and would bump semver liberally. The main lalrpop crate only holds the library and binary front-ends, and depends on lalrpop-internals for nearly all the work.

The public API of lalrpop-internals right now includes what is needed for the following use cases:
* lalrpop itself
  * build, log, session
* The documentation generator in lalrpop#658
  * log, grammar, parser, tok
* The LSP demo at https://github.com/LighghtEeloo/lalrpop/tree/lsp
  * build, file_text, grammar, normalize, parser, session

I haven't gone through the public members of these modules. It's likely that many of them should be marked pub(crate) rather than pub. That is something that should be done before this is merged.
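For readers less familiar with the `pub` versus `pub(crate)` distinction mentioned above, here is a small self-contained sketch. The module name `internals` and the functions `normalize` and `token_count` are hypothetical and unrelated to LALRPOP's real modules; the example only illustrates how Rust visibility decides what ends up in a crate's public API.

```rust
mod internals {
    /// Visible only inside this crate. It can change freely without
    /// affecting downstream users, so no semver bump is implied.
    pub(crate) fn normalize(src: &str) -> String {
        src.split_whitespace().collect::<Vec<_>>().join(" ")
    }

    /// Marked `pub`, but because `internals` itself is a private module,
    /// outside crates can reach this only through a public re-export.
    pub fn token_count(src: &str) -> usize {
        normalize(src).split(' ').filter(|t| !t.is_empty()).count()
    }
}

// Re-exporting from the crate root (lib.rs in a library crate) is what
// makes an item part of the public, semver-relevant API.
pub use internals::token_count;

fn main() {
    // Inside the crate, both items are usable.
    println!("{:?}", internals::normalize("  a   b  "));
    println!("{}", token_count("  a   b  "));
}
```

Under this reading, auditing the listed modules would mean deciding, item by item, which functions are re-exported like `token_count` and which stay crate-private like `normalize`.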
I want to use lalrpop as a library in my project. In particular, I want to use the grammar, parseTree, etc. directly from lalrpop. How can I do that? So far, I have added lalrpop under both build-dependencies and dependencies, but neither allows me to use the grammar module etc. from the lalrpop code.
If I take lrgrammar.lalrpop and put it in my project, the parser generated for lrgrammar.lalrpop has to use the grammar, parseTree, etc. from lalrpop, and thus does not compile.
ANTLR makes it very easy to access its parseTree etc., whereas LALRPOP makes it awfully difficult to access the parseTree.