[Profiler] Refactoring of VCD processing script(s) #2461

ayakayorihiro · 2025-05-20T21:49:29Z

This PR contains my first big refactoring attempt to streamline the scripts that live in tools/profiler/profiler-process. My previous code was mostly a collection of ad-hoc dictionaries without any classes or typing, so I had to maintain it by doing mental bookkeeping in comments. This PR adds:

classes.py: A series of classes to manage data bookkeeping. CellMetadata contains mappings from components to cells and vice versa, ControlMetadata contains data about TDCC-created FSMs and control groups (pars), TraceData contains different versions of the produced trace, etc.
errors.py: A small file containing a class for Profiler-specific errors.
Use of argparse instead of directly parsing command line arguments.
Adjustments to the fud2 profiler script to use new argument options.
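
As a rough illustration of the errors.py and argparse items above, the sketch below pairs a Profiler-specific exception with argparse-based argument handling. The class name ProfilerException, the argument names, and the helper functions are all hypothetical and are not taken from the PR itself.

```python
import argparse
import sys


class ProfilerException(Exception):
    """Hypothetical name for a Profiler-specific error type."""


def parse_args(argv: list[str]) -> argparse.Namespace:
    # Every argument name below is an illustrative assumption.
    parser = argparse.ArgumentParser(description="Process a VCD file for profiling")
    parser.add_argument("vcd_file", help="path to the VCD file to process")
    parser.add_argument("out_dir", help="directory for generated outputs")
    parser.add_argument(
        "--print-threshold",
        type=int,
        default=100,
        help="maximum number of cycles to print (assumed meaning)",
    )
    return parser.parse_args(argv)


def check_args(args: argparse.Namespace) -> None:
    # Raise a domain-specific error instead of calling sys.exit(1) deep in the code.
    if args.print_threshold < 0:
        raise ProfilerException("--print-threshold must be non-negative")


def main(argv: list[str]) -> int:
    try:
        args = parse_args(argv)
        check_args(args)
    except ProfilerException as err:
        print(f"Profiler error: {err}", file=sys.stderr)
        return 1
    return 0
```

A script entry point would then call sys.exit(main(sys.argv[1:])), so the exit code is decided in exactly one place.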

The script also no longer produces the tree visualizations we had a while ago, since we largely stopped using them, but I'm happy to bring them back if needed.

I'd really appreciate any feedback! Honestly, it might be better to look at the current version of each script directly instead of the diff, because I made so many changes...

EclecticGriffin

Woop woop! Left some random notes at things that jumped out to me

tools/profiler/profiler-process/classes.py

EclecticGriffin · 2025-05-21T18:58:41Z

tools/profiler/profiler-process/classes.py

+    def __hash__(self):
+        return hash((self.child_name, self.parents, self.child_type))


__hash__ isn't safe to implement here because the class is mutable. I would double-check whether this is actually required.

sampsyo

It is not, if the class can be made immutable. Instead, use @dataclass(frozen=True) as the decorator. It will (a) make this safe to put into dicts, and (b) automatically generate __hash__ for you so you don't need to write it yourself. To make this safe for this particular class, you will need to use a frozenset instead of an ordinary set… can you get away with never needing register_new_parent, instead determining the entire parent set up front?

In general, I'd recommend going through and using frozen=True on all your dataclasses, except where you think mutability is actually necessary.
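
A minimal sketch of that suggestion, using a simplified stand-in for the class under discussion. The field names come from the __hash__ snippet above; the class name ParChildInfo and the field types are assumptions for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class ParChildInfo:  # hypothetical stand-in for the real class
    child_name: str
    child_type: str  # assumed to be a plain string here
    parents: frozenset[str]  # frozenset, because an ordinary set is unhashable


a = ParChildInfo("group_a", "group", frozenset({"par0", "par1"}))
b = ParChildInfo("group_a", "group", frozenset({"par1", "par0"}))

# frozen=True (with the default eq=True) generates __hash__ from the fields,
# so equal instances hash equally and are safe to use as dict keys.
lookup = {a: "first"}
assert lookup[b] == "first"

# Attribute assignment is rejected, so the hash can never go stale.
try:
    a.child_name = "other"  # type: ignore[misc]
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```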

ayakayorihiro

I ended up not needing the __hash__, so I deleted it (I think I previously needed to check the equality of two ParChildTypes, or to insert one as a dict key, but I'm no longer doing either of those things). But this is a great thing to know, and I updated the comment for register_new_parent() to say that I should try to get frozen=True working. :)

In order to determine the entire parent set up front I think I'd need to maintain a dictionary from children to parents (because the original JSON file lists parent --> children) and construct ParChildType at the very end using all that information. Would that be ok?
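
One way that approach could look, as a sketch only: invert the parent --> children mapping into child --> parents first, then build each immutable record exactly once. The shape of the input, the child_types lookup, and all names below are assumptions for illustration, not the profiler's actual data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ParChildInfo:  # hypothetical stand-in for the real class
    child_name: str
    child_type: str
    parents: frozenset[str]


def build_par_children(
    parent_to_children: dict[str, list[str]],
    child_types: dict[str, str],
) -> dict[str, ParChildInfo]:
    # Step 1: collect every parent of each child in a temporary, mutable dict.
    child_to_parents: defaultdict[str, set[str]] = defaultdict(set)
    for parent, children in parent_to_children.items():
        for child in children:
            child_to_parents[child].add(parent)

    # Step 2: construct each frozen record in a single shot, so no
    # register_new_parent-style mutation is needed afterwards.
    return {
        child: ParChildInfo(child, child_types[child], frozenset(parents))
        for child, parents in child_to_parents.items()
    }


infos = build_par_children(
    {"par0": ["group_a", "group_b"], "par1": ["group_a"]},
    {"group_a": "group", "group_b": "group"},
)
```

The mutable dictionary lives only inside the builder function, which keeps the mutation out of the class's public API.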

tools/profiler/profiler-process/construct_trace.py

tools/profiler/profiler-process/visuals/timeline.py

tools/profiler/profiler-process/classes.py

sampsyo

Looking great! I have left a range of low-level coding suggestions throughout, but this is obviously a big improvement and you should go ahead and merge once you've addressed a few of those suggestions from @EclecticGriffin and me.

Definitely for subsequent PRs and not for this one, I have a few higher-order suggestions for the next steps in refactoring:

Consider trying to make more of your classes immutable. There is a common pattern here where a data-oriented class (e.g., the container for some trace data) needs a lot of mutation to get an instance fully "set up." Roughly speaking, the API looks like foo = Foo() ; foo.add_bar(stuff) ; foo.add_baz(other_stuff) ; foo.now_look_things_up(). One way to make these APIs a bit more manageable is to try to minimize—or fully eliminate—the surface area of the API that involves mutation, and instead construct stuff in a single shot. Again using my rough example above, the ideal situation would be that you just do foo = Foo(stuff, other_stuff) ; foo.now_look_things_up(). This is usually not easy to do, but if you can manage it nonetheless, it can be a great way to force yourself to simplify things and reduce the number of invariants you have to carefully maintain.
It seems like much of the cycle-level trace data is being stored in dictionaries, with the type dict[int, Something] where the key is a cycle number. It would be worth considering whether these could instead be replaced with dense lists. Even if not (for example, if the space of cycle keys has gaps in it), it could help a lot to invent yourself a Trace class that wraps either a list or a dict and helps maintain the invariants that are common to any cycle-level trace data structure.
Throughout the code, there are a lot of small instances of logic that parses the strings for signal names. For example, it's common to see stuff like signal.endswith(".go") to check whether we have a "go" signal. I think we should invent a Signal class to centralize this logic and endeavor to expunge all other special-purpose signal-name-string-handling code. For example, the Signal class could have cell and name properties, so the above check could be simplified to sig.name == "go". Or, for special cases like this that come up repeatedly, maybe there should be a special property like sig.is_go to make it even clearer.
Let's turn on Python type checking in CI. (The new hotness is Ty, so we could try that and fall back to plain ol' Mypy if there are too many problems.) This would be tempting to do now, in this PR, but TBQH I think it should be its own PR because it will probably turn up a lot of little things to fix.
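
To make the Signal idea above concrete, here is a minimal sketch. It assumes a signal's fully qualified name is a dot-separated path whose last segment is the port name; that format, the parse helper, and the example signal names are assumptions rather than the profiler's actual design. Only cell, name, and is_go come from the suggestion itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    cell: str  # everything before the final dot
    name: str  # the port name after the final dot, e.g. "go"

    @classmethod
    def parse(cls, full_name: str) -> "Signal":
        # Split on the last dot only, so nested cell paths stay intact.
        cell, sep, name = full_name.rpartition(".")
        if not sep:
            raise ValueError(f"not a qualified signal name: {full_name!r}")
        return cls(cell=cell, name=name)

    @property
    def is_go(self) -> bool:
        # Replaces scattered checks like signal.endswith(".go").
        return self.name == "go"


sig = Signal.parse("TOP.main.adder.go")
```

Note that comparing the parsed name is slightly stricter than endswith(".go"): the string must contain a dot, and the segment after the last dot must be exactly "go".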

Anyway, all four of those ideas are nontrivial amounts of work on their own (and not listed in any particular order), so please do defer them! Definitely no need to tackle them all at once.
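
For the cycle-level trace suggestion among the ideas above, a wrapper could look roughly like this. It assumes cycles are numbered densely from 0 and recorded in order; the class name CycleTrace and its methods are hypothetical.

```python
from typing import Generic, Iterator, TypeVar

T = TypeVar("T")


class CycleTrace(Generic[T]):
    """Dense per-cycle storage: index i holds the entry for cycle i."""

    def __init__(self) -> None:
        self._entries: list[T] = []

    def append(self, cycle: int, entry: T) -> None:
        # Invariant: cycles arrive in order with no gaps (an assumption).
        if cycle != len(self._entries):
            raise ValueError(f"expected cycle {len(self._entries)}, got {cycle}")
        self._entries.append(entry)

    def __getitem__(self, cycle: int) -> T:
        if not 0 <= cycle < len(self._entries):
            raise KeyError(cycle)
        return self._entries[cycle]

    def __len__(self) -> int:
        return len(self._entries)

    def __iter__(self) -> Iterator[tuple[int, T]]:
        # Yield (cycle, entry) pairs, mirroring dict.items() on the old dicts.
        return iter(enumerate(self._entries))


trace: CycleTrace[set[str]] = CycleTrace()
trace.append(0, {"main"})
trace.append(1, {"main", "adder"})
```

If the cycle keys turn out to have gaps, the same interface could wrap a dict instead, and callers would not need to change.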

tools/profiler/profiler-process/classes.py

sampsyo · 2025-05-23T16:48:30Z

tools/profiler/profiler-process/classes.py

tools/profiler/profiler-process/classes.py

tools/profiler/profiler-process/construct_trace.py

EclecticGriffin · 2025-05-23T18:03:52Z

@sampsyo, I didn't realize that Astral was working on a type checker. That's awesome!

EclecticGriffin · 2025-05-23T18:05:26Z

But yes, making the types be happy should definitely be its own PR because the calyx-py stuff is riddled with type violations and I suspect the same is true of most of the other stuff.

ayakayorihiro · 2025-05-28T15:40:56Z

Thanks a lot @sampsyo!! Sounds good on the suggestions for further refactoring PRs. I'll get to them once I've dealt with some more features/visualization issues that I wanted to fix after this first PR. :)

I'll merge this PR for now since I think it's in a state where I can work on top of it again. I learned a lot about good code-writing practices!!

ayakayorihiro added 30 commits April 21, 2025 18:51

Moving doc comments to the right place

70c9821

using argparse to get arguments

676445e

Creating new exception class and using it instead of sys.exit(1)

e2abad3

WIP; so much data

f5d5b45

more WIP, starting to get out of hand

7d818a3

more WIP

9b0cd55

clean some stack stuff

b3a604c

WIP but I am starting to see more of the light

81bfc78

wip

44f8969

commit unsaved changes

222a85a

wip

4338d5c

first pass on up to adding pars before I fix things

e9302aa

fixed some errors

35f60e1

progress

5d52377

fixed some errors to produce trace info; need to verify

90fec0d

print main cell differently

119a3b5

monday progress. cell-stats csv is still broken but end is near

d1cf002

Fixed cell stats csv

6864d05

Got timeline view

8f0e36b

WIP. Things are still broken...

d86f163

Fixed timeline bugs

7189114

format

42e12fd

I thought I got ADL maps but now everything is broken

77347ee

fixed bug

1bff8ba

Fix bug wrt adding control groups to trace

fc060d9

print threshold option

8b9a793

Fixed some cell stats bugs

752fdff

reorganize main

17937e4

Fix formatting and add some documentation

c2222ff

Update fud2 tests

9061b39

ayakayorihiro added 4 commits May 20, 2025 17:10

Merge branch 'main' into profiler/script-refactor

b832962

try making ruff happy

faff4a4

fix fud2 test

e30ac16

remove diff snapshot text

3cf1ca1

ayakayorihiro requested a review from EclecticGriffin May 20, 2025 21:49

ayakayorihiro self-assigned this May 20, 2025

ayakayorihiro added the C: calyx-profiler Profiling Calyx programs label May 20, 2025

EclecticGriffin reviewed May 21, 2025

sampsyo reviewed May 23, 2025

ayakayorihiro added 13 commits May 25, 2025 21:31

some small updates from PR comments

34eedcd

some more PR comments

f1ddb4f

remove old script that was readded

ce0125f

ParChildInfo bug fixed

ac8083a

Fixed things based on most of Griffin's comments. Fixed bugs

9518a52

some more PR comments

26258b0

Removing reverse mapping of CellMetadata.component_to_cells

cdddb90

Some more changes

4461ba5

progress

3c0473f

one round of refactoring create_cycle_trace()

2f778f5

Add docs and reorder function calls

2e641de

Refactoring create_trace_with_control_groups

970c3b6

More small fixes and formatting

ec798ea

ayakayorihiro merged commit cb7c0c8 into main May 28, 2025
18 checks passed

ayakayorihiro deleted the profiler/script-refactor branch May 28, 2025 15:41
