8000 PIFO Trees: Telemetry by anshumanmohan · Pull Request #1736 · calyxir/calyx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

PIFO Trees: Telemetry #1736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Nov 16, 2023
Merged

PIFO Trees: Telemetry #1736

merged 45 commits into from
Nov 16, 2023

Conversation

anshumanmohan
Copy link
Contributor
@anshumanmohan anshumanmohan commented Oct 11, 2023

This is WIP and is not ready for serious review. I am just opening this PR so I can seek help getting it all the way there. This PR will eventually close #1697: the overall plan is exactly as summarized at the bottom of that thread.

Important background:

  • Recall that what we call "PIFO" in pifo.py in Calyx is actually a more general thing: it is simply a queue that orchestrates two sub-queues.
  • As part of this orchestration, it has the ability to divine by itself which flow a given input belongs to: 0 or 1. Based on the flow, it then pushes the input into one of its sub-queues. We'll call this the "top-level" flow inference to distinguish it from whatever further flow inference the sub-queues may go on to perform.
  • When a PIFO's sub-queues are simply FIFOs, we get simple PIFO behavior. When the sub-queues are themselves PIFOs, we get more expressive PIFO tree behavior. This can be extended forever, gaining height and therefore expressivity.

A summary of what I have done in this PR so far:

  1. The general PIFO-adding Python script now optionally takes as argument stats, which is a handle to a stats component. If passed a stats component, the PIFO component will, after each successful push, report the top-level flow of the recently pushed item to the stats component.
  2. The stats component is a little funny, and I am seeking your feedback re: its design. I currently have it like so:
    1. It takes two 1-bit inputs, report and flow.
    2. It maintains two registers, count_0 and count_1.
    3. If report is high, it also expects two further ref-registers. It drives the values of count_0 and count_1 to those ref-registers and is done. In this case the value of flow is ignored.
    4. If report is low, it actually computes statistics us 8000 ing flow: it increments the appropriate counter and is then done.
  3. The controller component (previously known as mother) invokes the stats component, passing a bogus value for flow, a raised flag for report, and cells into which it wants the stats to be written by reference.
  4. The overall design intent is this: when designing my PIFO tree, I will pass the top-level PIFO queue a stats component, but not pass any of its sub-queues stats components. In this way, our stats component will accumulate a coarse overall view of the top-level statistics of the tree: the very first branching, no more.

Further work is required all over:

  1. I'd like overall stylistic feedback on my Calyx.
  2. I'm looking for feedback re: the design of the stats component in particular. Is my use of inputs and refs reasonable? Should some things just be continuous or non-continuous outputs?
  3. I'm wondering how to make the controller component repeatedly poll the stats component instead of doing it just once.
  4. It would be nice (and not too hard) to give the controller the ability to zero-out the stats component. Maybe that can just be the default behavior upon report.
  5. It would also be nice (and maybe a bit tricky) to eke out the finest possible flow inference instead of the coarsest possible as I have it now. To do this, I would need each sub-queue to report to its caller what its flow inference was. The top-level PIFO, which has at hand a stats component, would then need to somehow digest this fine-grained info and report that to the stats component, not the coarse info that is has to hand.

@rachitnigam
Copy link
Contributor

Thanks for getting us started on this @anshumanmohan! What do you think about formalizing some of the details of the architecture in a system diagram? This would also give a head start for the figure needed in the paper and act as a nice at-a-glance reference for what the architecture is doing

@anshumanmohan
Copy link
Contributor Author
anshumanmohan commented Oct 11, 2023

Sure thing! Here's a quick sketch. It's actually an animation that I'll present when we meet, and hopefully it'll make more sense then.

Also, I spoke to @sampsyo and he suggested I can probably do some cleanup in the stats component: instead of an internal count_0 (resp. count_1) and a separate ref_count_0 (resp. ref_count_1) that I occasionally drive values onto, I can just maintain count_0 (resp. count_1) as an out port all along.

All solid rectangles are components. The triangle reflects the logical PIFO tree, but of course there isn't actually a PIFO tree component; you just talk to the PIFO tree by talking to root PIFO component. I've used dotted lines to represent the two .data arrays.
Screenshot 2023-10-11 at 10 32 26 AM

FIFOs are just boxes, and PIFOs are boxes with a few annotations. Here's what those mean:
Screenshot 2023-10-11 at 10 34 59 AM

@rachitnigam
Copy link
Contributor

Awesome, the visual is helpful but also leads me to a question: does each "managing queue" have access to its own stats component? If that is the case, is the controller accumulating this information and generating some form of cumulative statistics to report to the outside world?

@anshumanmohan
Copy link
Contributor Author

Right now only the root "managing queue" has a stats component, and it is not learning anything from its sub-queues about their flow-splitting decisions. This means that the (sole) stats component is only ever getting the most coarse-grained info, from the root.

This is an interesting space to explore, though. What I have above is a little unsatisfying. Two further options that will get us finer-grained info:

  1. We keep the setup where only the root talks to the stats component, but we make it so that the root queries its children about their flow-splitting decisions. Perhaps is not a query per se, but just an automatic report from any queue to its caller upon push. Then the root converts this info into some sort of compact scalar and reports that to the stats component.
  2. We change the setup, hooking up all the leaves of this tree to a stats component. These leaves can "just know" their place in the tree. So when a leaf tells the stats component, "Hey, I am leaf 001 (read, 'left, left, right') and I just accepted a packet", the stats component increments some special counter that is reserved for that leaf. Over time, this stats component accretes fine-grained information.

@anshumanmohan
Copy link
Contributor Author
anshumanmohan commented Nov 7, 2023

I think this will now pass the compiler tests, which is a step in the right direction. We're not done yet though.

@calebmkim I am wondering if I can get your eyes on something. Let's look at the Calyx file generated for now, which I've pushed temporarily.

My question is this: are you thinking that we do currently have the issue where two different copies of the static component get stamped out by controller and pifo_root?

  • It looks to me like the two uses are getting at the same stats component, but I could be wrong. To see what I'm saying, search for invoke stats in the file.
  • Although maybe I'm wrong and your warning is right: on searching for stats = stats(), one can believe that this is exactly where two discrete copies gets stamped out.

In case you are right, would you remind me what the fix was? I seem to recall you saying something about having one stats component defined at the main component, and then passing the component by reference? I have less experience with passing components by reference.

@calebmkim
Copy link
Contributor
calebmkim commented Nov 7, 2023

Hmmm, I still think there's the problem. If I'm understanding correctly, here's basically what's going on.

component stats (...) -> (...) {
  // define stats component 
} 
component pifo_root(...) -> (...) {
  cells {
    stats = stats() // 1st instantiation of stats 
    ... 
  } 
  // invoke stats() 
} 
component controller(...) -> (...) {
  cells {
    stats = stats() // 2nd instantiation of stats 
    ... 
  } 
  // invoke stats() 
} 

I think you only want one instantiation of stats() throughout the entire design; otherwise you're creating duplicates. Think of the cells section as instantiations of hardware: we could replace stats = stats() in the controller with stats_diff_name = stats(), resulting in the following controller

component controller(...) -> (...) {
  cells {
    stats_instance_diff_name = stats() // 2nd instantiation of stats 
    ... 
  } 
  // invoke stats_instance_diff_name() 
} 

This is the same program as before, but very clearly, there are two different(ly named) stats components.

Solution

Sadly the solution is probably going to be very ugly, since we have given up on passing components by reference (bc we can't invoke things because we need both to run for the exact same time). It seems like dataplane will only ever need to write to the stats component (through the pifo tree), while controller will only ever need to read from stats.

I think it would look something like this:

component main(...) -> (...) {
  cells {
    stats_instance = stats();  
    dataplane_instance = dataplane();
    controller_instance = controller(). 
  } 
  wires { 
    groups rtl_group {
       stats_instance.go = dataplane_instance.stats_instance_go;  
       stats_instance.flow = dataplane_instance.stats_instance_flow;  
       stats_instance.flow = dataplane_instance.stats_instance_report;  
       controller_instance.stats_instance_count0 = stats_instance.count0; //
       controller_instance.stats_instance_count1  = stats_instance.count1;  // 
       ...
    } 
  } 
}

component dataplane() -> (stats_instance_go: 1, stats_instance_flow: 1, stats_instance_report: 1) {
    cells {
        my_queue = pifo_root(); 
    } 
    control {
        // named ports differently for clarity.  
        invoke my_queue(...)(stats_instance_go = pifo_param_stats_instance_go, stats_instance_flow = pifo_param.....) 
    } 
} 

component pifo_root() -> (pifo_param_stats_instance_go: 1, pifo_param_stats_instance_flow: 1, pifo_param_stats_instance_report: 1) {
    ...
    group can_write_to_stats_now {
        pifo_param_stats_instance_go = xxx
        pifo_param_stats_instance_flow = xxx 
        // etc. 
    } 
} 

@anshumanmohan
Copy link
Contributor Author
anshumanmohan commented Nov 7, 2023

Thanks Caleb!

Re: why it's not working, I think I get it 100%. Those two instances of stats() are totally different, just like two different std_adds would have been different.

Re: the solution, let me try to say it at a high level first. Please nitpick me ruthlessly.

  1. In component main, you have defined the three players as cells. I'll add the three external memories here too, no problem.

  2. Also in main, you have this group where you've written a faux-RTL group. This is hooking up the stats component to the controller_instance and the dataplane_instance like a puppet: all of its in- and out-ports are now managed by one of those two other components, via some port of their own. So playing with stats_instance is just a matter of playing with the appropriate puppet-master ports in controller_instance or dataplane_instance.
    Question: I'm guessing we want this group to be constantly enabled?
    So the rest of the rtl_group (after your five lines) would have:

    • dataplane_instance.go = HI.
    • ... a whole buncha desugared stuff to pass memories by reference to dataplane, but without using invokes.
    • controller_instance.go = HI.
    • rtl_group.done = dataplane_instance.done.

    And main's control program would just be one thing: rtl_group.

  3. I'm a little confused by invoke my_queue(...)(stats_instance_go = pifo_param_stats_instance_go, stats_instance_flow = pifo_param.....). Is the point that:

    • Invoke the queue normally, passing inputs as usual (...).
    • But then when the invoke returns, capture its outputs in local variables. (stats_instance_go = pifo_param_stats_instance_go, stats_instance_flow = pifo_param.....). This precisely captures info about how the recently-returned queue would like to interact with the stats component.
    • And because the faux-RTL group in main is always enabled, capturing the queue's wishes in this way automatically communicates its wish to the stats component as well.
  4. And then in group can_write_to_stats_now you are showing me how to drive values to the queue's ports.

@calebmkim
Copy link
Contributor

1

Re: why it's not working, I think I get it 100%. Those two instances of stats() are totally different, just like two different std_adds would have been different.

Yep

2

Question: I'm guessing we want this group to be constantly enabled?

Yeah, and I think this kind of goes hand in hand with your other comment:

And main's control program would just be one thing: rtl_group.

Since main's control program is just one group ("faux-rtl"), that group will always be enabled.

3

But then when the invoke returns, capture its outputs in local variables. (stats_instance_go = pifo_param_stats_instance_go, stats_instance_flow = pifo_param.....). This precisely captures info about how the recently-returned queue would like to interact with the stats component.

To nit-pick, this isn't necessarily "when the invoke returns".
Idk if you know this already, but invokes are just compiled into groups: see this file and its corresponding .expect file.

So what we're really saying is that, for the entire time the invoke is running (i.e., when my_queue's control is executing), anytime my_queue writes to its output parameter pifo_param_stats_instance_go, that value will be written to dataplane's output parameter stats_instance_go, which (in main) is hooked up to stats_instance.go, hence triggering an execution of stats_instance. Does that make sense?

4

And then in group can_write_to_stats_now you are showing me how to drive values to the queue's ports.

Yeah basically.

5

One last thing: it seems like controller doesn't even have to invoke the stats component. If all it has to do is read from stats.count0 and stats.count1 (which are assigned to in continuous assignments and therefore can be read from anytime, even when stats is not active)., the controller can just periodically poll by reading from stats.count0 and stats.count1. Does that make sense (I could be missing something tho)?

@anshumanmohan
Copy link
Contributor Author

This all sounds great, Caleb! Thanks!! I'll get to it, and we'll see what comes up.

anshumanmohan and others added 2 commits November 8, 2023 08:40
This works for now because the controller is not looping forever.
It soon WILL, and then we'll need to do some RTL-like coding to make sure that the par block is guarded by the dataplane and not the controller
@anshumanmohan
Copy link
Contributor Author

This is now working! I have not implemented the approach that Caleb and I shook on just above, opting instead for a lighter-weight version that @sampsyo and I discussed.

The high-level intent earlier was for main to call controller and dataplane in parallel and watch for dataplane to be done. This is tricky and is being discussed in its own issue. Trying to do it here would have required us to go to a low-level RTL style and there is much pain there. See above.

We have instead opted for a style where main just runs a while loop, conditioned on a kill signal from dataplane, and inside the whole loop it allows dataplane and controller to do quick pieces of work.

To make this work, I had to adapt dataplane to perform its work in this piecemeal way. That component is used to just having a list of commands and running free with it. Now it still has the list of commands, but on every invocation is processes one more command. When it is done with all the commands, or when running a command returns an error, the dataplane raises its kill signal and the main loop respects it by ending the whole program.

The eDSL code is ugly as sin, in large part because it still does lots of ugly things that were (I think) necessary in the "batch mode" dataplane. I will clean that up shortly. For now I just wanted to mark this PR as ready for review.

@anshumanmohan anshumanmohan marked this pull request as ready for review November 9, 2023 20:11
@anshumanmohan
Copy link
Contributor Author

I have pushed the .yx file generated by the eDSL just for ease of review. I will remove it afterwards, and before merging. The point of pushing it is to demonstrate that we are no longer instantiating multiple copies of stats; we are instantiating one and passing it around by refernece.

@anshumanmohan
Copy link
Contributor Author

I have tidied up the eDSL code some, which is nice at the Python level and has led to a leaner hardware design. Kinda fun to see the changes here. This is now ready for review.

I would like to catch the other queues up to this nicer style, but that's going to be a separate issue.

@anshumanmohan anshumanmohan changed the title [PIFO Trees] Telemetry PIFO Trees: Telemetry Nov 16, 2023
@anshumanmohan anshumanmohan enabled auto-merge (squash) November 16, 2023 23:08
@anshumanmohan anshumanmohan merged commit 4fe0a9f into master Nov 16, 2023
@anshumanmohan anshumanmohan deleted the pifo-tree-telemetry branch November 16, 2023 23:23
rachitnigam pushed a commit that referenced this pull request Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Next Steps for PIFO Trees
3 participants
0