-
Notifications
You must be signed in to change notification settings - Fork 59
[Profiler] Refactoring of VCD processing script(s) #2461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woop woop! Left some random notes at things that jumped out to me
tools/profiler/profiler-process/classes.py
Outdated
Show resolved
def __hash__(self): | ||
return hash((self.child_name, self.parents, self.child_type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash
isn't safe to implement here because the class is mutable. I would double
check if this is actually required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, if the class can be made immutable. Instead, use @dataclass(frozen=True)
in the decorator. It will (a) make this safe to put into dicts, and (b) automatically generate __hash__
for you so you don't need to write it yourself. To make this safe for this particular class, you will need to use a frozenset
instead of an ordinary set
… can you get away with never needing register_new_parent
instead determining the entire parent set up front?
In general, I'd recommend going through and using frozen=True
on all your dataclasses, except where you think mutability is actually necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up not needing the __hash__
so I deleted it (I think previously I needed to check the equality of two ParChildType
s, or insert it as a dict key, but I'm no longer doing either of those things anymore). But this is a great thing to know and I updated the comment for register_new_parent()
to say that I should try getting frozen=True
. :)
In order to determine the entire parent set up front I think I'd need to maintain a dictionary from children to parents (because the original JSON file lists parent --> children) and construct ParChildType
at the very end using all that information. Would that be ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! I have left a range of low-level coding suggestions throughout, but this is obviously a big improvement and you should go ahead and merge once you've addressed a few of those suggestions from @EclecticGriffin and me.
Definitely for subsequent PRs and not for this one, I have a few higher-order suggestions for the next steps in refactoring:
- Consider trying to make more of your classes immutable. There is a common pattern here where a data-oriented class (e.g., the container for some trace data) needs a lot of mutation to get an instance fully "set up." Roughly speaking, the API looks like
foo = Foo() ; foo.add_bar(stuff) ; foo.add_baz(other_stuff) ; foo.now_look_things_up()
. One way to make these APIs a bit more manageable is to try to minimize—or fully eliminate—the surface area of the API that involves mutation, and instead construct stuff in a single shot. Again using my rough example above, the ideal situation would be that you just dofoo = Foo(stuff, other_stuff) ; foo.now_look_things_up()
. This is usually not easy to do, but if you can manage it nonetheless, it can be a great way to force yourself to simplify things and reduce the number of invariants you have to carefully maintain. - It seems like much of the cycle-level trace data is being stored in dictionaries, with the type
dict[int, Something]
where the key is a cycle number. It would be worth considering whether these could instead be replaced with dense lists. Even if not (for example, if the space of cycle keys has gaps in it), it could help a lot to invent yourself aTrace
class that wraps either a list or a dict and helps maintain the invariants that are common to any cycle-level trace data structure. - Throughout the code, there are a lot of small instances of logic that parses the strings for signal names. For example, it's common to see stuff like
signal.endswith(".go")
to check whether we have a "go" signal. I think we should invent aSignal
class to centralize this logic and endeavor to expunge all other special-purpose signal-name-string-handling code. For example, theSignal
class could havecell
andname
properties, so the above check could be simplified tosig.name == "go"
. Or, for special cases like this that come up repeatedly, maybe there should be a special property likesig.is_go
to make it even clearer. - Let's turn on Python type checking in CI. (The new hotness is Ty, so we could try that and fall back to plain ol' Mypy if there are too many problems.) This would be tempting to do now, in this PR, but TBQH I think it should be its own PR because it will probably turn up a lot of little things to fix.
Anyway, all four of those ideas are nontrivial amounts of work on their own (and not listed in any particular order), so please do defer them! Definitely no need to tackle them all at once.
def __hash__(self): | ||
return hash((self.child_name, self.parents, self.child_type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not, if the class can be made immutable. Instead, use @dataclass(frozen=True)
in the decorator. It will (a) make this safe to put into dicts, and (b) automatically generate __hash__
for you so you don't need to write it yourself. To make this safe for this particular class, you will need to use a frozenset
instead of an ordinary set
… can you get away with never needing register_new_parent
instead determining the entire parent set up front?
In general, I'd recommend going through and using frozen=True
on all your dataclasses, except where you think mutability is actually necessary.
@sampsyo , I didn't realize that Astral was working on a type checker, that's awesome! |
But yes, making the types be happy should definitely be its own PR because the calyx-py stuff is riddled with type violations and I suspect the same is true of most of the other stuff. |
Thanks a lot @sampsyo !! Sounds good on the suggestions for further refactoring PRs, I'll get to them after I get more features/visualization issues that I wanted to fix down after this first PR. :) Will merge this PR for now since I think it's in a state where I can work on top of it again, and learned a lot about good code writing practices!! |
This PR contains my first big refactoring attempt to streamline the scripts that live in
tools/profiler/profiler-process
. My previous code was mostly a collection of ad-hoc dictionaries without any classes or typing that I had to do mental bookkeeping via comments to maintain. This PR adds:classes.py
: A series of classes to manage data bookkeeping.CellMetadata
contains mappings from components to cells and vice versa,ControlMetadata
contains data about TDCC-created FSMs and control groups (pars),TraceData
contains different versions of the produced trace, etc.errors.py
: A small file containing a class for Profiler-specific errors.The script also stopped producing the tree visualizations we had a while ago since we largely stopped using them, but I'm happy to bring them back if needed.
I'd really appreciate any feedback! It honestly might be better to see the current view of each script directly instead of the diff because I made too many changes though...