8000 feat: Support generic input for Parse effect by Iltotore · Pull Request #1305 · getkyo/kyo · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: Support generic input for Parse effect #1305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

Iltotore
Copy link
@Iltotore Iltotore commented Jun 30, 2025

Problem

Unlike other parsing libraries such as ZIO Parser, Kyo's Parse effect was only made for Text/Char input and does not support generic inputs. A common use-case are tokens to separate parsing and lexing.

Solution

Changed ParseState to consume Chunk[In] where In is the type of the tokens the parser consumes. The Parse is now also parametized with the input type.

Here is a working example (Scala CLI, put the files in the same directory) with a JSON parser: https://gist.github.com/Iltotore/8d4315e1c668645907f6d678c7c202b3

Notes

  • This PR includes breaking changes since Parse and some combinators now require a type parameter.
  • Had to do a "dirty trick" with readAspect to make it work with generic input. See its definition and localReadAspect's.
  • Despite some methods such as char being superseded by their generic counterpart (e.g literal[In](expected: In)), I currently kept them for backward compatibility and in case we divide ParseState into specialized TextParseState and another one for generic chunks.
  • Should we make a specialized ParseState for Text or keep the current representation of this PR (chunk of Char)?
  • Should In be contravariant in Parse?

DRAFT: Some methods and comments might need to be changed before marking this PR as ready. I'd still like to receive feedback on its current state though

Comment on lines 141 to 149
def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) =
def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) =
Choice.evalWith(seq)(identity)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, clause interleaving?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add it in all the functions (with A, In, S) if we plan to use it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like

def anyOf[A](seq: (A < (Parse[In] & S))*)[In, S](using Frame): A < (Parse[In] & S)

?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def anyOf[A](using Frame)[In, S](seq: (A < (Parse[In] & S))*): A < (Parse[In] & S)

Copy link
Author
@Iltotore Iltotore Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I should do the same for every other combinators parametized with A, In, S such as firstOf, inOrder, skipUntil...?

If so, how should we handle implicit parameters depending on In like:

def firstOf[A, In, S](
        parser1: => A < (Parse[In] & S),
        parser2: => A < (Parse[In] & S)
    )(using Frame, StateTag[In]): A < (Parse[In] & S)

Copy link
Collaborator
@ahoy-jon ahoy-jon Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that?

def firstOf[A](using Frame)[In, S](
        parser1: => A < (Parse[In] & S),
        parser2: => A < (Parse[In] & S)
    )(using StateTag[In]): A < (Parse[In] & S)

Copy link
Author
@Iltotore Iltotore Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I interleaved the type parameters for other methods but firstOf and inOrder seem to have problem with clause interleaving:

def inOrder[A, B, C](using
        Frame
)[In, S](
    parser1: => A < (Parse[In] & S),
    parser2: => B < (Parse[In] & S),
    parser3: => C < (Parse[In] & S)
)(using
    StateTag[In]
): (A, B, C) < (Parse[In] & S)

//Same for other variants...

From ParseTest.scala line 655:

"consumes input sequentially" in run {
    val parser = Parse.inOrder(
        Parse.literal("a").andThen(1),
        Parse.literal("b").andThen(2),
        Parse.literal("c").andThen(3)
    )
    Parse.run("abc")(parser).map { result =>
        assert(result == (1, 2, 3))
    }
}
Error message
[error] -- [E134] Type Error: /home/fromentin/IdeaProjects/kyo/kyo-prelude/shared/src/test/scala/kyo/ParseTest.scala:655:35 
[error] 655 |                val parser = Parse.inOrder(
[error]     |                             ^^^^^^^^^^^^^
[error]     |None of the overloaded alternatives of method inOrder in object Parse with types
[error]     | [A, B, C, D, E, F, G, H]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S),
[error]     |        parser3: => C < (kyo.Parse[In] & S),
[error]     |        parser4: => D < (kyo.Parse[In] & S),
[error]     |        parser5: => E < (kyo.Parse[In] & S),
[error]     |        parser6: => F < (kyo.Parse[In] & S),
[error]     |        parser7: => G < (kyo.Parse[In] & S), parser8: => H < (kyo.Parse[In] & S)
[error]     |        )
[error]     |        (using x$10: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G, H) < (
[error]     |          kyo.Parse[In] & S)
[error]     | [A, B, C, D, E, F, G]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S),
[error]     |        parser3: => C < (kyo.Parse[In] & S),
[error]     |        parser4: => D < (kyo.Parse[In] & S),
[error]     |        parser5: => E < (kyo.Parse[In] & S),
[error]     |        parser6: => F < (kyo.Parse[In] & S), parser7: => G < (kyo.Parse[In] & S)
[error]     |        )
[error]     |        (using x$9: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G) < (
[error]     |          kyo.Parse[In] & S)
[error]     | [A, B, C, D, E, F]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S),
[error]     |        parser3: => C < (kyo.Parse[In] & S),
[error]     |        parser4: => D < (kyo.Parse[In] & S),
[error]     |        parser5: => E < (kyo.Parse[In] & S), parser6: => F < (kyo.Parse[In] & S)
[error]     |        )
[error]     |        (using x$8: kyo.Parse.StateTag[In]): (A, B, C, D, E, F) < (kyo.Parse[In]
[error]     |           & S)
[error]     | [A, B, C, D, E]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S),
[error]     |        parser3: => C < (kyo.Parse[In] & S),
[error]     |        parser4: => D < (kyo.Parse[In] & S), parser5: => E < (kyo.Parse[In] & S)
[error]     |        )
[error]     |        (using x$7: kyo.Parse.StateTag[In]): (A, B, C, D, E) < (kyo.Parse[In] &
[error]     |          S)
[error]     | [A, B, C, D]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S),
[error]     |        parser3: => C < (kyo.Parse[In] & S), parser4: => D < (kyo.Parse[In] & S)
[error]     |        )(using x$6: kyo.Parse.StateTag[In]): (A, B, C, D) < (kyo.Parse[In] & S)
[error]     | [A, B, C]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S),
[error]     |        parser2: => B < (kyo.Parse[In] & S), parser3: => C < (kyo.Parse[In] & S)
[error]     |        )(using x$5: kyo.Parse.StateTag[In]): (A, B, C) < (kyo.Parse[In] & S)
[error]     | [A, B]
[error]     |  (using x$1: kyo.Frame)
[error]     |    [In, S]
[error]     |      (parser1: => A < (kyo.Parse[In] & S), parser2: => B < (kyo.Parse[In] & S))
[error]     |        (using x$4: kyo.Parse.StateTag[In]): (A, B) < (kyo.Parse[In] & S)
[error]     |match arguments ((Int < kyo.Parse[Char], Int < kyo.Parse[Char], Int < kyo.Parse[Char]))
[error]     |
[error]     |where:    A is a type variable
[error]     |          B is a type variable
[error]     |          C is a type variable
[error]     |          D is a type variable
[error]     |          E is a type variable
[error]     |          F is a type variable
[error]     |          G is a type variable
[error]     |          H is a type variable

Passing the output types (A, B, C...) explicitly seems to solve the problem but still a worse type inferrence 🤷

For instance this compiles:

val parser = Parse.inOrder[Int, Int, Int](
    Parse.literal("a").andThen(1),
    Parse.literal("b").andThen(2),
    Parse.literal("c").andThen(3)
)

Note: I did not push the change for firstOf and inOrder due to this issue. Should we still get along with that and just pass explicitly the type parameters in the test cases?

Copy link
Collaborator
@ahoy-jon ahoy-jon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about Chunk.show, the rest is great!

Copy link
Collaborator
@hearnadam hearnadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! Almost there.

Comment on lines 141 to 149
def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) =
def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) =
Choice.evalWith(seq)(identity)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add it in all the functions (with A, In, S) if we plan to use it.

@Iltotore
Copy link
Author
Iltotore commented Jul 1, 2025

For some reason I cannot directly reply to the following comment so I'll do it here:

I think it's fine if the state is parameterized for now.

Do you mean that I can remove the comment or also remove the char function since it's kind of superseded by literal?

@ahoy-jon
Copy link
Collaborator
ahoy-jon commented Jul 1, 2025

@Iltotore

For some reason I cannot directly reply

me too, weird.

Do you mean that I can remove the comment or also remove the char function since it's kind of superseded by literal?

both? just keep literal?

@ahoy-jon
Copy link
Collaborator
ahoy-jon commented Jul 1, 2025

a couple of changes, and it's good to go! thanks a lot @Iltotore!

@Iltotore
Copy link
Author
Iltotore commented Jul 1, 2025
< A3DB tbody class="d-block">

both? just keep literal?

That's the question actually 😅. Should we keep both or just literal?

@ahoy-jon
Copy link
Collaborator
ahoy-jon commented Jul 2, 2025

both? just keep literal?

That's the question actually 😅. Should we keep both or just literal?

just literal

@Iltotore Iltotore requested a review from ahoy-jon July 2, 2025 12:22
parser6: => F < (Parse & S),
parser7: => G < (Parse & S)
def inOrder[A, B, C, D, E, F, G, In, S](
parser1: => A < (Parse[In] & S),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwbrasil do you know why parser1: => A < (Parse[In] & S) is a by-name? (it's probably just me not getting something about laziness in the last mile)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's to defer computation in the case of where the user passes an unsuspended effect

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of what happens if we remove the by-name parameters:

import kyo.*

enum Expr:
  case Lit(value: Int)
  case Add(left: Expr, right: Expr)

val parseLit: Expr < Parse[Char] = Parse.int.map(Expr.Lit.apply)

def parenthesized: Expr < Parse[Char] = Parse.between(Parse.char('('), parseExpr, Parse.char(')'))

def parseInvokable: Expr < Parse[Char] = Parse.firstOf(parseLit, parenthesized)

def parseAdd: Expr < Parse[Char] = Parse
  .separatedBy(parseInvokable, Parse.char('+'))
  .map(_.reduceLeft(Expr.Add.apply))

def parseExpr: Expr < Parse[Char] = parseAdd
Exception in thread "main" java.lang.StackOverflowError
	at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:520)
	at kyo.Parse$package$Parse$.literal(Parse.scala:878)
	at kyo.Parse$package$Parse$.char(Parse.scala:58)
	at ExprTest$package$.parenthesized(ExprTest.scala:28)
	at ExprTest$package$.parseInvokable(ExprTest.scala:11)
	at ExprTest$package$.parseAdd(ExprTest.scala:14)
	at ExprTest$package$.parseExpr(ExprTest.scala:17)
	at ExprTest$package$.parenthesized(ExprTest.scala:9)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I added the by-name params to avoid stack overflows in recursive parsers

Copy link
Collaborator
@ahoy-jon ahoy-jon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ahoy-jon ahoy-jon marked this pull request as ready for review July 2, 2025 17:53
val readAspect: Aspect[Const[Text], [C] =>> Maybe[(Text, C)], Parse] =
Aspect.init(using Frame.internal)
def readAspect[A]: Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]] =
localReadAspect.asInstanceOf[Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't safe. If there are multiple parsers in a computation handling different input types, aspects will be applied to all regardless of their type, which can produce a class cast exception. We could make Aspect be based on tags to distinguish aspects for different input types. A workaround might be putting the aspect in the parse State itself but then we need to handle their removal at the end of the scope and composition of nested aspects. Can you open a separate to explore a way to encode this in Aspect? I'm afraid we'd need to fix this to merge.

@fwbrasil
Copy link
Collaborator
fwbrasil commented Jul 2, 2025

congrats on your first contribution @Iltotore! this is a good improvement and the json parser is a nice example 🎉

I think the only blocker is the Aspect issue. The source file is getting quite large. What do you think about extracting a new generic Read[A] effect and then keep Parse specific for Text? I wonder if we should move them to a separate module as well since we'll likely need to add more features and different kinds of parsers.

@Iltotore
Copy link
Author
Iltotore commented Jul 2, 2025

Yes the file is getting quite large. To be honest I was surprised when I saw that Parse is in prelude. I was thinking it was in its own module like kyo-parse.

However I'm not sure what benefit we get by keeping the text-only Parse.

Having discussed with @ahoy-jon , at the end of the day it would be good to have a better implementation of Parse. The current Var+Abort combination is similar to other parser combinators libraries but it is not suited to real use cases where you need to correctly handle error cases (emit all parsing errors and not just one, still recover a partial AST...) rather than just the happy path.

A good example is Chumsky in Rust.

@fwbrasil
Copy link
Collaborator
fwbrasil commented Jul 3, 2025

Yeah, I think we should move it to a new kyo-parse module. I suggested separating the generic version as a way to separate the "default" parsing methods specific to different input types so we don't have to deal with a single large file. We could have methods for binary inputs as well for example. Another alternative might be input-type-specific files with extension methods of the Parse companion but there's the downside of code completion showing methods for input types that aren't being used.

Having discussed with @ahoy-jon , at the end of the day it would be good to have a better implementation of Parse. The current Var+Abort combination is similar to other parser combinators libraries but it is not suited to real use cases where you need to correctly handle error cases (emit all parsing errors and not just one, still recover a partial AST...) rather than just the happy path.

Agreed! The way I structured the effect was also meant to validate the pattern using an opaque type that composes other effects. If I remember correctly, it was the first effect using this pattern. Ideally, we should have a proper ArrowEffect with richer operations like error accumulation and reporting. It should also have better performance.

@ahoy-jon
Copy link
Collaborator
ahoy-jon commented Jul 3, 2025

Should we rename already the current Parse?

@fwbrasil
Copy link
Collaborator
fwbrasil commented Jul 4, 2025

I've just posted #1327 with a solution for parametrized generic aspects.

Should we rename already the current Parse?

Do you mean move to a new module? That'd be good but it isn't a requirement to merge.

ahoy-jon pushed a commit that referenced this pull request Jul 4, 2025
### Problem

`Aspect`s can't be used in generic contexts when their inputs or outputs
depend on a parametrized type. This issue was raised as part of the work
to generalize `Parse` in #1305.

### Solution

Instead of relying on the object identity of the `Aspect` instance,
manage `Cut` activations based on a pair of the type signature of the
`Aspect` via `Tag` and its allocation site via `Frame`.

### Notes

- This approach doesn't keep a strict identity as before since the
`Frame` can be shared in case it's available a scope with multiple
`Aspect` instantiations but that's something that should happen only in
Kyo's codebase since users shouldn't manage `Frame`s.
- I've removed the `default` cut to simplify the implementation since
it's unused.
@ahoy-jon
Copy link
Collaborator
ahoy-jon commented Jul 4, 2025

@Iltotore just merged #1327 if you can update to main!

@Iltotore Iltotore force-pushed the feat/parser-generic-input branch from f4ec4b2 to bb38fcd Compare July 4, 2025 10:44
Copy link
Collaborator
@fwbrasil fwbrasil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the Aspect fix. A nice to have would be testing multiple parsers for different input types in the same computation

Co-authored-by: Flavio Brasil <fwbrasil@users.noreply.github.com>
@Iltotore
Copy link
Author
Iltotore commented Jul 4, 2025

LGTM after the Aspect fix. A nice to have would be testing multiple parsers for different input types in the same computation

Do you have any specific example in mind?


object Parse:

type StateTag[A] = Tag[Var[State[A]]]

private val localReadAspect: Aspect[Const[Chunk[Any]], [C] =>> Maybe[(Chunk[?], C)], Parse[Any]] =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

@fwbrasil
Copy link
Collaborator
fwbrasil commented Jul 4, 2025

The goal would be checking the failure scenario we had in the implementation without the new tag-based Aspect. Maybe mixing Parse[Text] with Parse[Token] and using a method that relies on readAspect?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0