Yosys optimizer pass error #1712
This one fails too. This is taken from
The pass is meant to be run after a variety of pre-processing is applied first. However, when I try that on these IRs I get errors. It looks like the early one-shot-bufferize pass doesn't like them.
But ignoring that, I also see a familiar error. I think in the next week or two @asraa plans to add CGGI to the python frontend, so in that case Asra, could you make sure to kick the tires on these two IRs to iron out these kinks?
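A rough sketch of that ordering (the flags below are my guesses, pieced together from passes named in this thread - heir-opt, secretize, wrap-generic, one-shot-bufferize, yosys-optimizer - so treat it as illustrative rather than a verified pipeline):

# assumed flag names; run the pre-processing before the Yosys pass
heir-opt \
  --secretize --wrap-generic \
  --canonicalize --cse \
  --one-shot-bufferize \
  --yosys-optimizer=mode=Boolean \
  input.mlir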
If that helps, the way I solve
Yes, but
Hey! @ludns I'll be working on integrating the whole mlir-to-cggi pipeline into the frontend. The issue with the one that you posted is as Jeremy said: the transforms need to happen before wrap-generic is called.
Hi @asraa. Is there a way to break down the mlir-to-cggi pipeline into multiple passes? It would make it easier for me to get a sense of what's going on. As an example, here is how I sequence passes so far (for CGGI):
When you say the transforms need to happen before wrap-generic, which transform are you referring to?
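(Side note: assuming heir-opt exposes the standard mlir-opt driver options, which I have not verified, something like the following should print the textual pipeline that --mlir-to-cggi expands into, and dump the IR between passes:)

# print the expanded textual pass pipeline (assumes standard mlir-opt driver flags)
heir-opt --mlir-to-cggi --dump-pass-pipeline input.mlir
# dump the IR after each pass to see what each stage does
heir-opt --mlir-to-cggi --mlir-print-ir-after-all input.mlir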
I am having a really hard time getting pretty basic programs to compile:
Matrix multiplication
Variance calculation
(Keeping track of what I encounter in case it's useful.) I am also unable to get the Yosys pass working with multiple outputs:
Fails with
Oh! You may want to use the
OK for @loopa - you will need to start from the IR with just the secret.secret annotations.
For the larger matmul example you posted, the default compilation path will take quite a while, since it will (by default) unroll all the loops before it passes the IR to Yosys, and the bitwidths are fairly large. I have a few suggestions, but first, like before, use the secret-annotated version. If you start with one that already has the secret.generic, then the bufferization passes at the start of
So for a faster compilation, without loop unrolling (which is part of the mlir-to-cggi pre-passes for now), I would want to run something like
To be honest, this pipeline expects either smaller use cases or linalg matvecs for matrix-vector products. I'm happy to pick apart making this easier to compile.
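A very rough illustration of that shape - not the exact pipeline being referenced, and the flag names are assumptions based on passes mentioned elsewhere in this thread (wrap-generic, yosys-optimizer, secret-distribute-generic, secret-to-cggi):

# start from the secret.secret-annotated IR and skip the full loop unroll
heir-opt \
  --wrap-generic \
  --canonicalize --cse \
  --yosys-optimizer=mode=Boolean \
  --secret-distribute-generic \
  --secret-to-cggi \
  annotated_input.mlir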
That is intended, actually. We won't compile Yosys circuits with multiple outputs - maybe you can file a separate issue for that. There's also the
For your variance example, change the scf loop to an affine loop, and it should work with the arith::DivSIOp printer added (will tag you in the PR):
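As a toy illustration of that change (a made-up reduction loop, not the actual variance example), the affine form would use affine.for, affine.load, and affine.yield in place of scf.for, memref.load, and scf.yield:

func.func @sum(%buf: memref<10xi32>) -> i32 {
  %zero = arith.constant 0 : i32
  // affine loop with a loop-carried accumulator, instead of an scf.for
  %sum = affine.for %i = 0 to 10 iter_args(%acc = %zero) -> (i32) {
    %v = affine.load %buf[%i] : memref<10xi32>
    %next = arith.addi %acc, %v : i32
    affine.yield %next : i32
  }
  return %sum : i32
}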
I ran
Part of #1712 PiperOrigin-RevId: 748290487
Part of #1712 PiperOrigin-RevId: 748395020
Thank you for all this information @asraa, I really appreciate it.
Compiling large programs
What is the recommended way to compile large programs to CGGI then?
Variance program
On the smaller variance example linked in this repo:
Matmul program
Compiling a much smaller matmul program (attached below) fails.
module {
func.func @matmul(%arg0: memref<10x10xi4>, %arg1: memref<10x10xi4>, %arg2: memref<10x10xi4>) attributes {llvm.linkage = #llvm.linkage<external>} {
affine.for %arg3 = 0 to 10 {
affine.for %arg4 = 0 to 10 {
affine.for %arg5 = 0 to 10 {
%0 = affine.load %arg0[%arg3, %arg5] : memref<10x10xi4>
%1 = arith.extsi %0 : i4 to i8
%2 = affine.load %arg1[%arg5, %arg4] : memref<10x10xi4>
%3 = arith.extsi %2 : i4 to i8
%4 = arith.muli %1, %3 : i8
%5 = affine.load %arg2[%arg3, %arg4] : memref<10x10xi4>
%6 = arith.trunci %4 : i8 to i4
%7 = arith.addi %5, %6 : i4
affine.store %7, %arg2[%arg3, %arg4] : memref<10x10xi4>
}
}
}
return
}
}
Things that work now!
The loop example compiles now! Thank you.
The main problem is that Yosys is slow to optimize large programs. We have a separate pipeline planned that will compile faster, though the resulting program will be less performant. I'd hope in the long term we can make this tradeoff more finely controllable. The secondary problem is that CGGI is just poorly suited to large-bitwidth arithmetic. I don't think anyone knows of a good way around this problem, though there is likely more we could do in the compiler if we wanted to port more of the smarts in Zama's tfhe-rs upstream.
Is there a way to compose MLIR
Right now we already support something similar: running Yosys on the body of each secret.generic separately. IMO the reason we haven't thought through a more comprehensive and clean design, say one based on splitting a computation into separate functions (and letting the user annotate per function which method to use?), is that we haven't had demand to scale to larger programs yet.
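To make "separately" concrete, a rough sketch (secret dialect syntax approximated from the docs; it may differ across HEIR versions): each of the two generic bodies below would be handed to Yosys as its own circuit, rather than the whole function being optimized at once.

func.func @two_regions(%arg0: !secret.secret<i8>) -> !secret.secret<i8> {
  // first region: booleanized by Yosys on its own
  %0 = secret.generic ins(%arg0 : !secret.secret<i8>) {
  ^bb0(%x: i8):
    %y = arith.muli %x, %x : i8
    secret.yield %y : i8
  } -> !secret.secret<i8>
  // second region: booleanized independently of the first
  %1 = secret.generic ins(%0 : !secret.secret<i8>) {
  ^bb0(%z: i8):
    %w = arith.addi %z, %z : i8
    secret.yield %w : i8
  } -> !secret.secret<i8>
  return %1 : !secret.secret<i8>
}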
Yeah, I can fix that.
I ran into that as well, and the issue, it seemed, is a problem in the pipeline with how secret-distribute-generic ends up forming the body to booleanize. It believes there is no return value, since we don't return a new array; we just update the arg in place. I have it on my plate to fix that issue as I integrate CGGI into the frontend.
In addition to the non-Yosys pipeline that is currently in development (by @WoutLegiest), we also do have a secret.separator method (search the tests for it) that can force boundaries for Yosys compilation. But we haven't invested too much time in testing it, because of how slow and unscalable Yosys is regardless.
Running the Yosys optimizer pass (--yosys-optimizer=mode=Boolean) on this IR fails.
Error: