lalrpop as a library in my project #924

xuanbachle · 2024-07-19T07:10:31Z

I want to use lalrpop as a library in my project. In particular, I want to use the grammar, parseTree, etc. directly from lalrpop. How can I do that? So far, I have added lalrpop as both a build dependency and a regular dependency, but neither lets me use the grammar module etc. from the lalrpop code.

If I copy lrgrammar.lalrpop into my project, the parser generated from it still has to use the grammar, parseTree, etc. from lalrpop, and thus does not compile.

ANTLR makes it very easy to access its parseTree and related structures, whereas LALRPOP makes it awfully difficult to access the parseTree.

dburgener · 2024-07-19T12:23:19Z

Short answer: we do not support this today, sorry. That is by design, so that these internal components can change without impact to users.

Slightly longer answer: I am not wild about exposing all those internal details as public dependencies. I'm (very slowly) working on a PR review of #658, which adds a crate in the lalrpop workspace needing similar functionality. So far, the need for that functionality is the biggest concern I have with that PR. If we did want to support this use case there are a few different ways we could handle it:

1. Just mark the parse tree, grammar and whatever else as pub at the crate level
2. Split the relevant bits into a lalrpop_internal library, which the main lalrpop crate depends on, so that that library can expose these bits while lalrpop does not. If we did this, we would likely document a recommendation that users not link lalrpop_internal directly, but doing so anyways could support this use case.
3. Bring the functionality that needs access to lalrpop's internal bits directly into lalrpop, and expose a more limited public API to access it.

In my opinion, number 1, which is the easiest thing to implement, is essentially a non-starter. I don't think we want to expose all of these bits as part of our public API. Number 2 is the most work, and adds to the long-term maintenance burden, but it is cleaner in terms of the API. Number 3 is nice, but only makes sense for certain use cases. And then of course the hidden number 4 option is the default, which is to simply not support this use case.
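As a rough illustration of what number 3 could look like, here is a minimal sketch in which a module stands in for the crate. Everything in it is hypothetical: `toy_lalrpop`, `parse_tree`, `RuleSummary`, and `summarize` are made-up names, and the one-rule-per-line grammar format is a toy, not LALRPOP syntax. The point is only that the internal tree stays private while plain data crosses the API boundary.

```rust
// Hypothetical sketch: `toy_lalrpop` stands in for the lalrpop crate.
// None of these names are real lalrpop APIs.
mod toy_lalrpop {
    // Private module standing in for the internal parse tree.
    // Nothing outside `toy_lalrpop` can name these types.
    mod parse_tree {
        pub struct Rule {
            pub nonterminal: String,
            pub alternatives: Vec<Vec<String>>,
        }

        pub struct Grammar {
            pub rules: Vec<Rule>,
        }

        // Toy parser for lines of the form `Name = a b | c`.
        // Lines without `=` are ignored.
        pub fn parse(text: &str) -> Grammar {
            let rules = text
                .lines()
                .filter_map(|line| line.split_once('='))
                .map(|(lhs, rhs)| Rule {
                    nonterminal: lhs.trim().to_string(),
                    alternatives: rhs
                        .split('|')
                        .map(|alt| alt.split_whitespace().map(String::from).collect())
                        .collect(),
                })
                .collect();
            Grammar { rules }
        }
    }

    // The limited public API: plain data out, no internal types leaked.
    pub struct RuleSummary {
        pub nonterminal: String,
        pub alternative_count: usize,
    }

    pub fn summarize(text: &str) -> Vec<RuleSummary> {
        parse_tree::parse(text)
            .rules
            .iter()
            .map(|rule| RuleSummary {
                nonterminal: rule.nonterminal.clone(),
                alternative_count: rule.alternatives.len(),
            })
            .collect()
    }
}
```

A caller would write something like `toy_lalrpop::summarize("Expr = Term | Expr + Term")` and only ever see `RuleSummary` values, so the shape of `parse_tree::Grammar` could change without being a breaking change. The cost is the one noted above: each use case needs its own purpose-built entry point.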

Can I ask why you need this functionality? I'm entertaining the idea primarily because the use case in #658 is very appealing and seems to have a very similar need, but in general I wouldn't expect this to be something end users need very much.

xuanbachle · 2024-07-20T11:32:55Z

Thank you for the very detailed response.

The use cases for this can be:

Synthesis of inputs belonging to an arbitrary grammar
Editor for LALRPOP grammars
and possibly many more.

My use case is automated synthesis of inputs belonging to an arbitrary grammar that can be used for: (1) program synthesis, and (2) fuzzing, in which inputs belonging to an arbitrary grammar can be generated to test a particular system, e.g., testing a parser generator like LALRPOP itself.

There could be many advantages if LALRPOP can make the parse tree and so on public to achieve the above tasks. This is very cleanly supported in ANTLR.
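To make the synthesis use case concrete, here is a minimal sketch of grammar-based input generation. It assumes the grammar is already available as an in-memory data structure, which is exactly the part that would have to come from LALRPOP's parse tree. The `Symbol` and `Grammar` types, the `XorShift` generator, and the example grammar are all hypothetical and are not LALRPOP types.

```rust
use std::collections::HashMap;

// A symbol in a production: literal text or a reference to another rule.
enum Symbol {
    Terminal(&'static str),
    NonTerminal(&'static str),
}

// Hypothetical in-memory grammar: rule name -> list of alternatives.
type Grammar = HashMap<&'static str, Vec<Vec<Symbol>>>;

// Tiny xorshift PRNG so the sketch needs no external crates.
// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

// Expand rule `name` into `out`. Once `depth` reaches zero, always take the
// first alternative, which this sketch assumes leads to termination.
fn generate(g: &Grammar, name: &str, rng: &mut XorShift, depth: u32, out: &mut String) {
    let alts = &g[name];
    let pick = if depth == 0 {
        0
    } else {
        (rng.next_u64() % alts.len() as u64) as usize
    };
    for sym in &alts[pick] {
        match sym {
            Symbol::Terminal(t) => out.push_str(t),
            Symbol::NonTerminal(n) => generate(g, n, rng, depth.saturating_sub(1), out),
        }
    }
}

// Example grammar:
//   Expr = Num | Expr "+" Expr | "(" Expr ")"
//   Num  = "0" | "1"
fn arithmetic_grammar() -> Grammar {
    use Symbol::{NonTerminal as N, Terminal as T};
    let mut g = Grammar::new();
    g.insert(
        "Expr",
        vec![
            vec![N("Num")],
            vec![N("Expr"), T("+"), N("Expr")],
            vec![T("("), N("Expr"), T(")")],
        ],
    );
    g.insert("Num", vec![vec![T("0")], vec![T("1")]]);
    g
}
```

Calling `generate(&arithmetic_grammar(), "Expr", &mut XorShift(42), 5, &mut s)` fills `s` with a random sentence of the grammar. The depth cutoff is a deliberately crude termination strategy; a real generator would more likely weight alternatives or track coverage.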

Pat-Lafon · 2024-07-20T20:39:06Z

Hmm, fuzzing of lalrpop itself is an interesting use case (but probably falls under the pub(crate) resolution).

I'm surprised about using the parse tree for more general program synthesis. As a novice in the field, I've usually understood these tools to work either at the token level, if taking a machine learning/LLM approach, or at the AST level, if using more traditional methods (since you are usually more concerned with the program semantics and only derive the syntax when successful). I also don't think some fuzzing techniques, like AFL, need grammar access, though AFL might not be appropriate here.

Pat-Lafon · 2024-07-20T20:43:52Z

> Slightly longer answer: I am not wild about exposing all those internal details as public dependencies. I'm (very slowly) working on a PR review of #658, which adds a crate in the lalrpop workspace needing similar functionality. So far, the need for that functionality is the biggest concern I have with that PR. If we did want to support this use case there are a few different ways we could handle it:
>
> 1. Just mark the parse tree, grammar and whatever else as pub at the crate level
>
> In my opinion, number 1, which is the easiest thing to implement, is essentially a non-starter. I don't think we want to expose all of these bits as part of our public API.

I've looked a little at this PR and it seems both very interesting and a large increase in surface area for more features/work.

Just to clarify my confusion: marking these parts of LALRPOP as pub(crate) is a non-starter? I thought this would avoid it being part of the public API except for internal use?

dburgener · 2024-07-20T21:06:33Z

> Hmm, fuzzing of lalrpop itself is an interesting use case (but probably falls under the pub(crate) resolution).

By coincidence, I actually started running some local fuzzing on lalrpop yesterday. IMO having access to the internals would definitely help the fuzzing be more efficient, but fuzzing from the outside is less of an issue than you might have thought. I'm using cargo fuzz, which does coverage detection to find unfuzzed areas, and it's getting decent coverage in normalization. To actually fuzz the lr1 table generation step with any sort of efficiency would probably require access to internals though, so there's definitely some benefit there.

> I've looked a little at this PR and it seems both very interesting and a large increase in surface area for more features/work.

Yeah, I agree. I've slowly been working through it in bits of free time here and there, trying to understand what all is going on. I figure I'll at least generate a list of issues to get cleaned up if we want to move forward, and document the status of it. I agree with your take: I'm really not sure yet if it's something we want to sign up to maintain.

> Just to clarify my confusion: marking these parts of LALRPOP as pub(crate) is a non-starter? I thought this would avoid it being part of the public API except for internal use?

Yeah, "pub at the crate level" is probably not the right phrase for what I mean. By "at the crate level" I was trying to say "in lib.rs so it gets exported as part of the public API", not pub(crate). I think (maybe I'm missing something) that pub(crate) would be insufficient for any of the use cases discussed here. I assume @xuanbachle is developing a separate crate outside our workspace, so pub(crate) wouldn't actually give the access needed there. And #658 adds a new crate in the lalrpop workspace, which needs to access private members of the lalrpop crate. Again, pub(crate) there would only expose them within the lalrpop crate itself.

Unless I'm missing something, I don't think there's a way to support either use case with some sort of pub marking of any sort without adding those things to our public API. And I don't think we want to sign up for treating modifications to the parse tree as a breaking change.
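For anyone following along, the distinction can be sketched in a single file. This is a hypothetical example: imagine the file is the lib.rs of a crate called `toy_parser_gen`, and treat `grammar`, `report`, `ParseTree`, and `rule_count` as made-up names rather than real lalrpop items.

```rust
// Hypothetical sketch: imagine this file is the root (lib.rs) of a crate
// called `toy_parser_gen`. None of these names are real lalrpop items.

// Private module, playing the role of an internal component.
mod grammar {
    // `pub(crate)`: any module in THIS crate may use it. No other crate can,
    // not even a sibling crate in the same Cargo workspace.
    pub(crate) struct ParseTree {
        pub(crate) rule_names: Vec<String>,
    }

    // Toy parser: every line containing `=` declares one rule.
    pub(crate) fn parse(text: &str) -> ParseTree {
        let rule_names = text
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(name, _)| name.trim().to_string())
            .collect();
        ParseTree { rule_names }
    }
}

// A different module of the same crate: `pub(crate)` items are reachable.
mod report {
    pub(crate) fn count_rules(text: &str) -> usize {
        super::grammar::parse(text).rule_names.len()
    }
}

// `pub` at the crate root: the only item here that another crate could
// call, and therefore the only one that semver would have to cover.
pub fn rule_count(text: &str) -> usize {
    report::count_rules(text)
}
```

A downstream crate could call `toy_parser_gen::rule_count(...)`, but any attempt to name `ParseTree` or `parse` from outside would be rejected by the compiler as private. Making them nameable from outside means marking them `pub` along a public path, which is exactly what puts them in the public API.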

> My use case is automated synthesis of inputs belonging to an arbitrary grammar that can be used for: (1) program synthesis, and (2) fuzzing, in which inputs belonging to an arbitrary grammar can be generated to test a particular system, e.g., testing a parser generator like LALRPOP itself.

@xuanbachle Do you have any public code you can point me at so I can dig in a little more to what you're doing? I assume it doesn't exist for lalrpop, because of the issues raised here, but maybe you have something for antlr for comparison? I'm still wanting to think through the details of our options for possibly supporting this sort of thing more.

xuanbachle · 2024-07-22T01:30:29Z

Hi guys, thanks very much for the detailed discussions, very interesting to see that LALRPOP is being actively worked on for many useful areas.

My particular use cases are both in program synthesis and fuzzing. For fuzzing, I have developed a grammar-based fuzzer using ANTLR: https://bitbucket.org/xbach/grammarfuzz/src/master/src/main/scala/ (the repo may not contain the complete code, but it does contain code using ANTLR for fuzzing an 8000 y grammars written in ANTLR). I can traverse an arbitrary grammar written in ANTLR easily using the parse tree facility provided by ANTLR. I wrote a research paper about this with my colleagues here: https://par.nsf.gov/servlets/purl/10195536.

Regarding program synthesis, I want to synthesize programs by syntax (my grammar is quite simple, and programs following the syntax are sufficient to follow the defined semantics). I previously wrote a syntax-guided synthesis parser (in Scala), but now want to move to Rust. Scala is nice, but it has that sort of Java thingy that I want to pass on ...

If the parse tree in LALRPOP is made public, I can imagine that one can develop editors for LALRPOP grammars too using LALRPOP itself.

darach · 2024-08-30T13:14:56Z

@xuanbachle When I wrote #658 I didn't have synthesis or fuzzing in mind, as the project I needed the reference documentation generated for uses property-based testing for fuzzing (with quickcheck; we check the Rust/LALRPOP implementation against an Erlang reference implementation). Generating an ANTLR grammar from LALRPOP grammars would be super useful. The tooling and IDE integration with ANTLR is great for language design. Looking forward to reading your paper. Apologies for the delay, just spotted this now 2 years later!

This is so that we can support requests to access lalrpop internal functions and data structures without committing to bump semver on internal changes. Example use cases include:

* IDE integration (lalrpop#1013)
* benchmarking
* fuzzing (lalrpop#924)
* program synthesis (lalrpop#934)
* Documentation generation (lalrpop#658)

As noted in the linked issues and PRs above, there has been interest in this from multiple users with multiple use cases. The scheme here is that lalrpop-internals would expose whatever is needed for these use cases publicly, and would bump semver liberally. The main lalrpop crate only holds the library and binary front-ends, and depends on lalrpop-internals for nearly all the work.

The public API of lalrpop-internals right now includes what is needed for the following use cases:

* lalrpop itself
  * build, log, session
* The documentation generator in lalrpop#658
  * log, grammar, parser, tok
* The LSP demo at https://github.com/LighghtEeloo/lalrpop/tree/lsp
  * build, file_text, grammar, normalize, parser, session

I haven't gone through the public members of these modules. It's likely that many of them should be marked pub(crate) rather than pub. That is something that should be done before this is merged.

dburgener mentioned this issue Aug 30, 2024

Documentation generator for LALRPOP ( markdown, railroad diagram svg, ebnf ) #658

Open

dburgener mentioned this issue Dec 17, 2024

LSP Integration Demo #1013

Open

dburgener mentioned this issue Feb 19, 2025

RFC: Move the bulk of lalrpop into a new lalrpop-internals crate #1045

Draft
