feat: Support generic input for Parse effect #1305

Iltotore · 2025-06-30T12:14:23Z

Problem

Unlike other parsing libraries such as ZIO Parser, Kyo's Parse effect was only made for Text/Char input and does not support generic inputs. A common use-case are tokens to separate parsing and lexing.

Solution

Changed ParseState to consume Chunk[In] where In is the type of the tokens the parser consumes. The Parse is now also parametized with the input type.

Here is a working example (Scala CLI, put the files in the same directory) with a JSON parser: https://gist.github.com/Iltotore/8d4315e1c668645907f6d678c7c202b3

Notes

This PR includes breaking changes since Parse and some combinators now require a type parameter.
Had to do a "dirty trick" with readAspect to make it work with generic input. See its definition and localReadAspect's.
Despite some methods such as char being superseded by their generic counterpart (e.g literal[In](expected: In)), I currently kept them for backward compatibility and in case we divide ParseState into specialized TextParseState and another one for generic chunks.
Should we make a specialized ParseState for Text or keep the current representation of this PR (chunk of Char)?
Should In be contravariant in Parse?

DRAFT: Some methods and comments might need to be changed before marking this PR as ready. I'd still like to receive feedback on its current state though

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

kyo-data/shared/src/main/scala/kyo/Chunk.scala

ahoy-jon · 2025-07-01T07:10:01Z

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

-    def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) =
+    def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) =
        Choice.evalWith(seq)(identity)


same here, clause interleaving?

I think we should add it in all the functions (with A, In, S) if we plan to use it.

So something like

def anyOf[A](seq: (A < (Parse[In] & S))*)[In, S](using Frame): A < (Parse[In] & S)

?

def anyOf[A](using Frame)[In, S](seq: (A < (Parse[In] & S))*): A < (Parse[In] & S)

I guess I should do the same for every other combinators parametized with A, In, S such as firstOf, inOrder, skipUntil...?

If so, how should we handle implicit parameters depending on In like:

def firstOf[A, In, S]( parser1: => A < (Parse[In] & S), parser2: => A < (Parse[In] & S) )(using Frame, StateTag[In]): A < (Parse[In] & S)

like that?

def firstOf[A](using Frame)[In, S]( parser1: => A < (Parse[In] & S), parser2: => A < (Parse[In] & S) )(using StateTag[In]): A < (Parse[In] & S)

I interleaved the type parameters for other methods but firstOf and inOrder seem to have problem with clause interleaving:

def inOrder[A, B, C](using Frame )[In, S]( parser1: => A < (Parse[In] & S), parser2: => B < (Parse[In] & S), parser3: => C < (Parse[In] & S) )(using StateTag[In] ): (A, B, C) < (Parse[In] & S) //Same for other variants...

From ParseTest.scala line 655:

"consumes input sequentially" in run { val parser = Parse.inOrder( Parse.literal("a").andThen(1), Parse.literal("b").andThen(2), Parse.literal("c").andThen(3) ) Parse.run("abc")(parser).map { result => assert(result == (1, 2, 3)) } }

Error message

[error] -- [E134] Type Error: /home/fromentin/IdeaProjects/kyo/kyo-prelude/shared/src/test/scala/kyo/ParseTest.scala:655:35 [error] 655 | val parser = Parse.inOrder( [error] | ^^^^^^^^^^^^^ [error] |None of the overloaded alternatives of method inOrder in object Parse with types [error] | [A, B, C, D, E, F, G, H] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), [error] | parser3: => C < (kyo.Parse[In] & S), [error] | parser4: => D < (kyo.Parse[In] & S), [error] | parser5: => E < (kyo.Parse[In] & S), [error] | parser6: => F < (kyo.Parse[In] & S), [error] | parser7: => G < (kyo.Parse[In] & S), parser8: => H < (kyo.Parse[In] & S) [error] | ) [error] | (using x$10: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G, H) < ( [error] | kyo.Parse[In] & S) [error] | [A, B, C, D, E, F, G] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), [error] | parser3: => C < (kyo.Parse[In] & S), [error] | parser4: => D < (kyo.Parse[In] & S), [error] | parser5: => E < (kyo.Parse[In] & S), [error] | parser6: => F < (kyo.Parse[In] & S), parser7: => G < (kyo.Parse[In] & S) [error] | ) [error] | (using x$9: kyo.Parse.StateTag[In]): (A, B, C, D, E, F, G) < ( [error] | kyo.Parse[In] & S) [error] | [A, B, C, D, E, F] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), [error] | parser3: => C < (kyo.Parse[In] & S), [error] | parser4: => D < (kyo.Parse[In] & S), [error] | parser5: => E < (kyo.Parse[In] & S), parser6: => F < (kyo.Parse[In] & S) [error] | ) [error] | (using x$8: kyo.Parse.StateTag[In]): (A, B, C, D, E, F) < (kyo.Parse[In] [error] | & S) [error] | [A, B, C, D, E] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), [error] | parser3: => C < (kyo.Parse[In] & S), [error] | parser4: => D < (kyo.Parse[In] & S), parser5: => E < (kyo.Parse[In] & S) [error] | ) [error] | (using x$7: kyo.Parse.StateTag[In]): (A, B, C, D, E) < (kyo.Parse[In] & [error] | S) [error] | [A, B, C, D] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), [error] | parser3: => C < (kyo.Parse[In] & S), parser4: => D < (kyo.Parse[In] & S) [error] | )(using x$6: kyo.Parse.StateTag[In]): (A, B, C, D) < (kyo.Parse[In] & S) [error] | [A, B, C] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), [error] | parser2: => B < (kyo.Parse[In] & S), parser3: => C < (kyo.Parse[In] & S) [error] | )(using x$5: kyo.Parse.StateTag[In]): (A, B, C) < (kyo.Parse[In] & S) [error] | [A, B] [error] | (using x$1: kyo.Frame) [error] | [In, S] [error] | (parser1: => A < (kyo.Parse[In] & S), parser2: => B < (kyo.Parse[In] & S)) [error] | (using x$4: kyo.Parse.StateTag[In]): (A, B) < (kyo.Parse[In] & S) [error] |match arguments ((Int < kyo.Parse[Char], Int < kyo.Parse[Char], Int < kyo.Parse[Char])) [error] | [error] |where: A is a type variable [error] | B is a type variable [error] | C is a type variable [error] | D is a type variable [error] | E is a type variable [error] | F is a type variable [error] | G is a type variable [error] | H is a type variable

Passing the output types (A, B, C...) explicitly seems to solve the problem but still a worse type inferrence 🤷

For instance this compiles:

val parser = Parse.inOrder[Int, Int, Int]( Parse.literal("a").andThen(1), Parse.literal("b").andThen(2), Parse.literal("c").andThen(3) )

Note: I did not push the change for firstOf and inOrder due to this issue. Should we still get along with that and just pass explicitly the type parameters in the test cases?

ahoy-jon

not sure about Chunk.show, the rest is great!

hearnadam

Thank you for the contribution! Almost there.

kyo-data/shared/src/main/scala/kyo/Chunk.scala

kyo-data/shared/src/test/scala/kyo/TextTest.scala

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

hearnadam · 2025-07-01T07:23:27Z

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

-    def anyOf[A, S](seq: (A < (Parse & S))*)(using Frame): A < (Parse & S) =
+    def anyOf[A, In, S](seq: (A < (Parse[In] & S))*)(using Frame): A < (Parse[In] & S) =
        Choice.evalWith(seq)(identity)


I think we should add it in all the functions (with A, In, S) if we plan to use it.

Iltotore · 2025-07-01T07:51:10Z

For some reason I cannot directly reply to the following comment so I'll do it here:

I think it's fine if the state is parameterized for now.

Do you mean that I can remove the comment or also remove the char function since it's kind of superseded by literal?

ahoy-jon · 2025-07-01T09:12:21Z

@Iltotore

For some reason I cannot directly reply

me too, weird.

Do you mean that I can remove the comment or also remove the char function since it's kind of superseded by literal?

both? just keep literal?

ahoy-jon · 2025-07-01T10:21:04Z

a couple of changes, and it's good to go! thanks a lot @Iltotore!

Iltotore · 2025-07-01T12:31:30Z

< A3DB tbody class="d-block">

both? just keep literal?

That's the question actually 😅. Should we keep both or just literal?

ahoy-jon · 2025-07-02T11:18:53Z

both? just keep literal?

That's the question actually 😅. Should we keep both or just literal?

just literal

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

ahoy-jon · 2025-07-02T17:51:03Z

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

-        parser6: => F < (Parse & S),
-        parser7: => G < (Parse & S)
+    def inOrder[A, B, C, D, E, F, G, In, S](
+        parser1: => A < (Parse[In] & S),


@fwbrasil do you know why parser1: => A < (Parse[In] & S) is a by-name? (it's probably just me not getting something about laziness in the last mile)

Yeah that's to defer computation in the case of where the user passes an unsuspended effect

An example of what happens if we remove the by-name parameters:

import kyo.* enum Expr: case Lit(value: Int) case Add(left: Expr, right: Expr) val parseLit: Expr < Parse[Char] = Parse.int.map(Expr.Lit.apply) def parenthesized: Expr < Parse[Char] = Parse.between(Parse.char('('), parseExpr, Parse.char(')')) def parseInvokable: Expr < Parse[Char] = Parse.firstOf(parseLit, parenthesized) def parseAdd: Expr < Parse[Char] = Parse .separatedBy(parseInvokable, Parse.char('+')) .map(_.reduceLeft(Expr.Add.apply)) def parseExpr: Expr < Parse[Char] = parseAdd

Exception in thread "main" java.lang.StackOverflowError at java.base/java.lang.invoke.DirectMethodHandle.allocateInstance(DirectMethodHandle.java:520) at kyo.Parse$package$Parse$.literal(Parse.scala:878) at kyo.Parse$package$Parse$.char(Parse.scala:58) at ExprTest$package$.parenthesized(ExprTest.scala:28) at ExprTest$package$.parseInvokable(ExprTest.scala:11) at ExprTest$package$.parseAdd(ExprTest.scala:14) at ExprTest$package$.parseExpr(ExprTest.scala:17) at ExprTest$package$.parenthesized(ExprTest.scala:9)

yes, I added the by-name params to avoid stack overflows in recursive parsers

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

ahoy-jon

LGTM

fwbrasil · 2025-07-02T21:10:47Z

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

-    val readAspect: Aspect[Const[Text], [C] =>> Maybe[(Text, C)], Parse] =
-        Aspect.init(using Frame.internal)
+    def readAspect[A]: Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]] =
+        localReadAspect.asInstanceOf[Aspect[Const[Chunk[A]], [C] =>> Maybe[(Chunk[A], C)], Parse[A]]]


I think this isn't safe. If there are multiple parsers in a computation handling different input types, aspects will be applied to all regardless of their type, which can produce a class cast exception. We could make Aspect be based on tags to distinguish aspects for different input types. A workaround might be putting the aspect in the parse State itself but then we need to handle their removal at the end of the scope and composition of nested aspects. Can you open a separate to explore a way to encode this in Aspect? I'm afraid we'd need to fix this to merge.

fwbrasil · 2025-07-02T21:15:45Z

congrats on your first contribution @Iltotore! this is a good improvement and the json parser is a nice example 🎉

I think the only blocker is the Aspect issue. The source file is getting quite large. What do you think about extracting a new generic Read[A] effect and then keep Parse specific for Text? I wonder if we should move them to a separate module as well since we'll likely need to add more features and different kinds of parsers.

Iltotore · 2025-07-02T21:31:25Z

Yes the file is getting quite large. To be honest I was surprised when I saw that Parse is in prelude. I was thinking it was in its own module like kyo-parse.

However I'm not sure what benefit we get by keeping the text-only Parse.

Having discussed with @ahoy-jon , at the end of the day it would be good to have a better implementation of Parse. The current Var+Abort combination is similar to other parser combinators libraries but it is not suited to real use cases where you need to correctly handle error cases (emit all parsing errors and not just one, still recover a partial AST...) rather than just the happy path.

A good example is Chumsky in Rust.

fwbrasil · 2025-07-03T00:19:06Z

Yeah, I think we should move it to a new kyo-parse module. I suggested separating the generic version as a way to separate the "default" parsing methods specific to different input types so we don't have to deal with a single large file. We could have methods for binary inputs as well for example. Another alternative might be input-type-specific files with extension methods of the Parse companion but there's the downside of code completion showing methods for input types that aren't being used.

Having discussed with @ahoy-jon , at the end of the day it would be good to have a better implementation of Parse. The current Var+Abort combination is similar to other parser combinators libraries but it is not suited to real use cases where you need to correctly handle error cases (emit all parsing errors and not just one, still recover a partial AST...) rather than just the happy path.

Agreed! The way I structured the effect was also meant to validate the pattern using an opaque type that composes other effects. If I remember correctly, it was the first effect using this pattern. Ideally, we should have a proper ArrowEffect with richer operations like error accumulation and reporting. It should also have better performance.

ahoy-jon · 2025-07-03T05:50:03Z

Should we rename already the current Parse?

fwbrasil · 2025-07-04T01:28:35Z

I've just posted #1327 with a solution for parametrized generic aspects.

Should we rename already the current Parse?

Do you mean move to a new module? That'd be good but it isn't a requirement to merge.

### Problem `Aspect`s can't be used in generic contexts when their inputs or outputs depend on a parametrized type. This issue was raised as part of the work to generalize `Parse` in #1305. ### Solution Instead of relying on the object identity of the `Aspect` instance, manage `Cut` activations based on a pair of the type signature of the `Aspect` via `Tag` and its allocation site via `Frame`. ### Notes - This approach doesn't keep a strict identity as before since the `Frame` can be shared in case it's available a scope with multiple `Aspect` instantiations but that's something that should happen only in Kyo's codebase since users shouldn't manage `Frame`s. - I've removed the `default` cut to simplify the implementation since it's unused.

ahoy-jon · 2025-07-04T09:42:17Z

@Iltotore just merged #1327 if you can update to main!

Co-authored-by: Jonathan Winandy <jonathan.winandy@gmail.com>

kyo-prelude/shared/src/main/scala/kyo/Parse.scala

fwbrasil

LGTM after the Aspect fix. A nice to have would be testing multiple parsers for different input types in the same computation

Co-authored-by: Flavio Brasil <fwbrasil@users.noreply.github.com>

Iltotore · 2025-07-04T12:21:49Z

LGTM after the Aspect fix. A nice to have would be testing multiple parsers for different input types in the same computation

Do you have any specific example in mind?

fwbrasil · 2025-07-04T12:50:03Z

kyo-prelude/shared/src/main/scala/kyo/Parse.scala


 object Parse:

+    type StateTag[A] = Tag[Var[State[A]]]
+
+    private val localReadAspect: Aspect[Const[Chunk[Any]], [C] =>> Maybe[(Chunk[?], C)], Parse[Any]] =


fwbrasil · 2025-07-04T12:52:32Z

The goal would be checking the failure scenario we had in the implementation without the new tag-based Aspect. Maybe mixing Parse[Text] with Parse[Token] and using a method that relies on readAspect?

hearnadam reviewed Jun 30, 2025

View reviewed changes

ahoy-jon reviewed Jul 1, 2025

View reviewed changes

kyo-data/shared/src/main/scala/kyo/Chunk.scala Outdated Show resolved Hide resolved

ahoy-jon reviewed Jul 1, 2025

View reviewed changes

hearnadam reviewed Jul 1, 2025

View reviewed changes

Iltotore requested a review from ahoy-jon July 2, 2025 12:22

ahoy-jon reviewed Jul 2, 2025

View reviewed changes

kyo-prelude/shared/src/main/scala/kyo/Parse.scala Show resolved Hide resolved

ahoy-jon reviewed Jul 2, 2025

View reviewed changes

kyo-prelude/shared/src/main/scala/kyo/Parse.scala Outdated Show resolved Hide resolved

ahoy-jon reviewed Jul 2, 2025

View reviewed changes

ahoy-jon marked this pull request as ready for review July 2, 2025 17:53

fwbrasil reviewed Jul 2, 2025

View reviewed changes

fwbrasil mentioned this pull request Jul 4, 2025

[prelude] parametrized generic aspects #1327

Merged

Iltotore added 5 commits July 4, 2025 12:44

feat: Support generic input for Parse effect

7352480

fix: Set the asInstanceOf type explicitly in readAspect

0a7e2fd

feat: Add efficient Text#toChunk

5ced1d5

feat: Add regex(Regex) parser combinator

24cd264

feat: Add show (mkString but faster)

2d832be

Iltotore and others added 5 commits July 4, 2025 12:44

test: Add tests for toChunk with each Text op

ce3cbae

fix: Add clause interleaving to anyOf

9c1f2f2

feat: Use clause interleaving for other functions

2c71587

fix: Switch back to Chunk#mkString

4a573ca

Update kyo-prelude/shared/src/main/scala/kyo/Parse.scala

bb38fcd

Co-authored-by: Jonathan Winandy <jonathan.winandy@gmail.com>

Iltotore force-pushed the feat/parser-generic-input branch from f4ec4b2 to bb38fcd Compare July 4, 2025 10:44

fwbrasil reviewed Jul 4, 2025

View reviewed changes

kyo-prelude/shared/src/main/scala/kyo/Parse.scala Outdated Show resolved Hide resolved

fwbrasil approved these changes Jul 4, 2025

View reviewed changes

feat: Use generic aspect

352d50f

Co-authored-by: Flavio Brasil <fwbrasil@users.noreply.github.com>

fwbrasil reviewed Jul 4, 2025

View reviewed changes

feat: Support generic input for Parse effect #1305

Are you sure you want to change the base?

feat: Support generic input for Parse effect #1305

Uh oh!

Conversation

Uh oh!

Problem

Solution

Notes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!