8000 [Profiler] Refactoring of VCD processing script(s) by ayakayorihiro · Pull Request #2461 · calyxir/calyx · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Profiler] Refactoring of VCD processing script(s) #2461

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 47 commits into from
May 28, 2025

Conversation

ayakayorihiro
Copy link
Contributor
@ayakayorihiro ayakayorihiro commented May 20, 2025

This PR contains my first big refactoring attempt to streamline the scripts that live in tools/profiler/profiler-process. My previous code was mostly a collection of ad-hoc dictionaries without any classes or typing that I had to do mental bookkeeping via comments to maintain. This PR adds:

  • classes.py: A series of classes to manage data bookkeeping. CellMetadata contains mappings from components to cells and vice versa, ControlMetadata contains data about TDCC-created FSMs and control groups (pars), TraceData contains different versions of the produced trace, etc.
  • errors.py: A small file containing a class for Profiler-specific errors.
  • Use of argparse instead of directly parsing command line arguments.
  • Adjustments to the fud2 profiler script to use new argument options.

The script also stopped producing the tree visualizations we had a while ago since we largely stopped using them, but I'm happy to bring them back if needed.

I'd really appreciate any feedback! It honestly might be better to see the current view of each script directly instead of the diff because I made too many changes though...

@ayakayorihiro ayakayorihiro self-assigned this May 20, 2025
@ayakayorihiro ayakayorihiro added the C: calyx-profiler Profiling Calyx programs label May 20, 2025
Copy link
Collaborator
@EclecticGriffin EclecticGriffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woop woop! Left some random notes at things that jumped out to me

Comment on lines 130 to 131
def __hash__(self):
return hash((self.child_name, self.parents, self.child_type))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hash isn't safe to implement here because the class is mutable. I would double
check if this is actually required

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not, if the class can be made immutable. Instead, use @dataclass(frozen=True) in the decorator. It will (a) make this safe to put into dicts, and (b) automatically generate __hash__ for you so you don't need to write it yourself. To make this safe for this particular class, you will need to use a frozenset instead of an ordinary set… can you get away with never needing register_new_parent instead determining the entire parent set up front?

In general, I'd recommend going through and using frozen=True on all your dataclasses, except where you think mutability is actually necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up not needing the __hash__ so I deleted it (I think previously I needed to check the equality of two ParChildTypes, or insert it as a dict key, but I'm no longer doing either of those things anymore). But this is a great thing to know and I updated the comment for register_new_parent() to say that I should try getting frozen=True. :)

In order to determine the entire parent set up front I think I'd need to maintain a dictionary from children to parents (because the original JSON file lists parent --> children) and construct ParChildType at the very end using all that information. Would that be ok?

Copy link
Contributor
@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! I have left a range of low-level coding suggestions throughout, but this is obviously a big improvement and you should go ahead and merge once you've addressed a few of those suggestions from @EclecticGriffin and me.

Definitely for subsequent PRs and not for this one, I have a few higher-order suggestions for the next steps in refactoring:

  • Consider trying to make more of your classes immutable. There is a common pattern here where a data-oriented class (e.g., the container for some trace data) needs a lot of mutation to get an instance fully "set up." Roughly speaking, the API looks like foo = Foo() ; foo.add_bar(stuff) ; foo.add_baz(other_stuff) ; foo.now_look_things_up(). One way to make these APIs a bit more manageable is to try to minimize—or fully eliminate—the surface area of the API that involves mutation, and instead construct stuff in a single shot. Again using my rough example above, the ideal situation would be that you just do foo = Foo(stuff, other_stuff) ; foo.now_look_things_up(). This is usually not easy to do, but if you can manage it nonetheless, it can be a great way to force yourself to simplify things and reduce the number of invariants you have to carefully maintain.
  • It seems like much of the cycle-level trace data is being stored in dictionaries, with the type dict[int, Something] where the key is a cycle number. It would be worth considering whether these could instead be replaced with dense lists. Even if not (for example, if the space of cycle keys has gaps in it), it could help a lot to invent yourself a Trace class that wraps either a list or a dict and helps maintain the invariants that are common to any cycle-level trace data structure.
  • Throughout the code, there are a lot of small instances of logic that parses the strings for signal names. For example, it's common to see stuff like signal.endswith(".go") to check whether we have a "go" signal. I think we should invent a Signal class to centralize this logic and endeavor to expunge all other special-purpose signal-name-string-handling code. For example, the Signal class could have cell and name properties, so the above check could be simplified to sig.name == "go". Or, for special cases like this that come up repeatedly, maybe there should be a special property like sig.is_go to make it even clearer.
  • Let's turn on Python type checking in CI. (The new hotness is Ty, so we could try that and fall back to plain ol' Mypy if there are too many problems.) This would be tempting to do now, in this PR, but TBQH I think it should be its own PR because it will probably turn up a lot of little things to fix.

Anyway, all four of those ideas are nontrivial amounts of work on their own (and not listed in any particular order), so please do defer them! Definitely no need to tackle them all at once.

Comment on lines 130 to 131
def __hash__(self):
return hash((self.child_name, self.parents, self.child_type))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not, if the class can be made immutable. Instead, use @dataclass(frozen=True) in the decorator. It will (a) make this safe to put into dicts, and (b) automatically generate __hash__ for you so you don't need to write it yourself. To make this safe for this particular class, you will need to use a frozenset instead of an ordinary set… can you get away with never needing register_new_parent instead determining the entire parent set up front?

In general, I'd recommend going through and using frozen=True on all your dataclasses, except where you think mutability is actually necessary.

@EclecticGriffin
Copy link
Collaborator

@sampsyo , I didn't realize that Astral was working on a type checker, that's awesome!

@EclecticGriffin
Copy link
Collaborator

But yes, making the types be happy should definitely be its own PR because the calyx-py stuff is riddled with type violations and I suspect the same is true of most of the other stuff.

@ayakayorihiro
Copy link
Contributor Author

Thanks a lot @sampsyo !! Sounds good on the suggestions for further refactoring PRs, I'll get to them after I get more features/visualization issues that I wanted to fix down after this first PR. :)

Will merge this PR for now since I think it's in a state where I can work on top of it again, and learned a lot about good code writing practices!!

@ayakayorihiro ayakayorihiro merged commit cb7c0c8 into main May 28, 2025
18 checks passed
@ayakayorihiro ayakayorihiro deleted the profiler/script-refactor branch May 28, 2025 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: calyx-profiler Profiling Calyx programs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0