Loop support: prefer `linalg` over generic `affine.for` #1569
Comments
+1 on this too. I noticed in the MLP example that when I had the for loops in, the noise would grow more because of the iterative rotation by 1. It would be nicer to be able to lower and implement the rotations in a custom way (e.g., by initial + 1, initial + 2, initial + 3, etc., for minimal noise growth).
After discussing this in office hours, I think we came to the following agreement:
For (2) we talked about the idea of making a custom pass that handles this. The simplest version of this pass would require the ciphertext dimension to be the same at the end of each loop iteration, and would require a bootstrap so that each iteration starts with freshly reset noise. A more sophisticated version could do two additional things: (a) determine how many initial loop iterations need to be peeled from the beginning or end of the loop when the invariant doesn't need to be maintained (e.g., this would solve the case of initialization of a loop-carried dependency by a plaintext) and (b) determine how much to unroll a loop so that the invariants can be spread out more; e.g., you might only need to bootstrap after every three loop iterations.
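A back-of-the-envelope sketch of the unrolling idea in (b), assuming a purely hypothetical additive noise-growth model (real noise analyses are more involved; all names and numbers here are illustrative):

```python
def bootstrap_interval(noise_budget, fresh_noise, per_iter_growth):
    """Return how many loop iterations fit between bootstraps, under a
    hypothetical additive model: each iteration adds per_iter_growth noise
    on top of the fresh_noise level that a bootstrap resets to."""
    iters, noise = 0, fresh_noise
    while noise + per_iter_growth <= noise_budget:
        noise += per_iter_growth
        iters += 1
    # Even if a single iteration already exhausts the budget, we must run
    # at least one iteration (and bootstrap after every one).
    return max(iters, 1)

# With a budget of 100, fresh noise 10, and growth 30 per iteration,
# three iterations fit between bootstraps.
print(bootstrap_interval(100, 10, 30))  # -> 3
```

The same computation would also tell the pass where loop peeling pays off: iterations whose inputs are plaintext start from lower noise and can run longer before the first bootstrap.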
This issue has 2 outstanding TODOs:
This comment was autogenerated by todo-backlinks
HEIR has supported `affine.for`, but in the eyes of dataflow analysis, a generic `affine.for` with an arbitrary body is hard to understand (understanding loops has always been one of the hard parts of a compiler). I propose using `linalg` to describe what we have now.

I further argue (not 100% sure) that any `affine.for` that could be understood/accepted by the FHE-specific analyses can always be rewritten in `linalg` with primitives like `elementwise`/`reduce`, as the task of the analysis is to understand which part of the `affine.for` body is elementwise and how the loop-carried variable forms a reduction. This is somewhat similar to the map-reduce programming model.

`affine.for` usage

Semantically there are two places we use `affine.for`: the data semantic and the ciphertext semantic.

Data semantic

For the data semantic, we already need to mitigate it in some way, e.g. by rewriting to `linalg`. So the problem reduces to loop support in the ciphertext semantic.
Common ciphertext-semantic patterns in current HEIR codebase

There are quite a lot of patterns that are actually an elementwise op on the input plus a reduction on the loop-carried variable. Instead of letting the lowering emit such a fused loop and making the analysis take it apart, we can divide the two parts early.
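As an illustration (the ops, types, and SSA names below are hypothetical stand-ins, not copied from the HEIR codebase), the fused form and the divided form might look like:

```mlir
// Fused: an elementwise mulf and the loop-carried addf reduction are
// interleaved in a single affine.for body, with a progressively rotated
// ciphertext as a second loop-carried value.
%res:2 = affine.for %i = 0 to 4
    iter_args(%c = %ct, %acc = %init) -> (tensor<16xf32>, tensor<16xf32>) {
  %rot  = tensor_ext.rotate %c, %c1 : tensor<16xf32>, index
  %mul  = arith.mulf %rot, %pt : tensor<16xf32>
  %next = arith.addf %acc, %mul : tensor<16xf32>
  affine.yield %rot, %next : tensor<16xf32>, tensor<16xf32>
}

// Divided early: the elementwise part becomes one batched op over all
// rotated copies of the input, and the reduction becomes an explicit
// linalg reduction that analyses can recognize directly.
%muls = linalg.mul ins(%rotated, %pts : tensor<4x16xf32>, tensor<4x16xf32>)
                   outs(%scratch : tensor<4x16xf32>) -> tensor<4x16xf32>
%red  = linalg.reduce { arith.addf }
          ins(%muls : tensor<4x16xf32>)
          outs(%zeros : tensor<16xf32>)
          dimensions = [0]
```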
Case study for the matmul kernel

The Halevi-Shoup lowering should lower from a data-semantic `linalg.matmul` to ciphertext-semantic combinations of `linalg.some_op`.

In the current lowering in main, the elementwise part is clearly `mulf` (with tensor splatting) and the reduction is `addf`, but the `rotate` part is quite hard to understand. In analysis we have to write an individual rule for it to know that it is a progressive rotation, and NoiseAnalysis is quite unhappy with it; see #1517 (comment).

Instead, we could have a form where all the needed rotations are materialized up front as a `%rotated_tensor_of_input`.
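The difference can be sketched in plain Python (lists stand in for ciphertext slot vectors; all names are illustrative). Both schedules compute the same rotations, but the progressive one chains each rotation on the previous result, so rotation noise compounds, while the direct one touches the fresh input once per rotation:

```python
def rotate(v, k):
    """Cyclic left rotation of a list; stands in for a ciphertext rotation."""
    k %= len(v)
    return v[k:] + v[:k]

def progressive_rotations(v, n):
    """Rotate by 1 repeatedly: result i depends on result i-1."""
    out = [v]
    for _ in range(n - 1):
        out.append(rotate(out[-1], 1))
    return out

def direct_rotations(v, n):
    """Rotate the fresh input by each index directly."""
    return [rotate(v, k) for k in range(n)]

v = [0, 1, 2, 3]
assert progressive_rotations(v, 4) == direct_rotations(v, 4)
print(direct_rotations(v, 4)[2])  # -> [2, 3, 0, 1]
```

An 8-4-2-1 or BSGS schedule would instead compose rotations by powers of two; the resulting values are identical either way, only the cost and noise profile differ.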
And the `%rotated_tensor_of_input` can be produced in many ways, e.g. the 8-4-2-1 way or the BSGS (32-16-8-4 / 3,2,1) way, instead of rotating by each index / progressively rotating by 1.

Analysis requirement
There are the following types of analyses in the HEIR FHE world that must properly handle loops to produce a correct lowering.

With `linalg` clearly stating the batch behavior instead of a generic `affine.for`, the lives of these analyses could be much easier. Otherwise each of them needs to separate the loop into its different semantic parts and understand them (quick reflection: how could an analysis understand the `rotate %c1` in the Halevi-Shoup kernel above?).

Another point that supports `linalg`: as `if` is hard to analyse, we already have convert-if-to-select, which turns `if` into something like `.filter` in map-reduce language, and that can be expressed in `linalg.elementwise`.
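The `.filter` analogy can be sketched in plain Python (illustrative names; the 0/1 mask multiply stands in for the select that convert-if-to-select emits):

```python
def filter_sum_branchy(xs, pred):
    """Data-dependent branch: not expressible when xs are ciphertexts."""
    total = 0
    for x in xs:
        if pred(x):
            total += x
    return total

def filter_sum_select(xs, pred):
    """Branch-free form: the condition becomes a 0/1 mask, and the `if`
    becomes a multiply-by-mask (a select), which is purely elementwise."""
    total = 0
    for x in xs:
        m = 1 if pred(x) else 0  # in FHE this would be an encrypted mask
        total += m * x
    return total

xs = [1, 2, 3, 4, 5]
is_even = lambda x: x % 2 == 0
assert filter_sum_branchy(xs, is_even) == filter_sum_select(xs, is_even) == 6
```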
Discussion on additional cost
Separating them incurs a much bigger memory cost, as map-reduce would: for now, all the partial results are stored instead of being reduced immediately.

We should seek optimizations from the loop-transformation world, like fusion/tiling, after the FHE lowering. I think this is the standard way: semantics first and optimization later, instead of writing an optimized `affine.for` early.

Discussion on lowering
We eventually need to lower it. I think that happens at the backend boundary, where `linalg` is expressed as C++ `for` loops.
Discussion on frontend
If the reasoning above is acceptable, we might encourage frontend users / kernel writers to use a map-reduce-ish language / functional-programming style to express things instead of for loops, as the former is much clearer semantically.