-
Notifications
You must be signed in to change notification settings - Fork 766
WAW hazards elimination #2881
New is 8000 sue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WAW hazards elimination #2881
Conversation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
❌ failed run, report available here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. This is a nice contribution.
I was expecting the number of gates to increase because you are adding one priority encoder and some extra comparators to determine which entries are before/after the issue pointer. However, I was looking the Thales-CIs, and the number of gates actually decreased...
Thanks César, This new mechanism replaces the previous one and is more compact, hence the area reduction. Cc @ricted98 |
Ok, but you agree with me that is kind of surprising... it is probably because there was a problem with the previous implementation. Otherwise, blocking the instruction issue during a WAW shall simplify the RAW logic.... You are sure that the newest value can come from either the RF or only one of the entries of the SB. No need for priority encoders or less or equal comparators for the issue pointer |
Especially, the previous implementation was processing dependence for each register from 0 to 31 then choosing one. The reason why WAW dependence checking was needed was that it was not possible to disambiguate which scoreboard entry is the most recent one. #2805 adds a rotation around the commit pointer to perform this disambiguation. The remainder of the logic is kept. Here I decided to redesign the whole RAW dependence checking to be WAW-insensitive and reduce area for the reasons given in the first paragraph of this message. So to more specifically answer your comment, the comparators and the priority encoders are needed here to perform disambiguation. |
Yes, I understand why you need the priority encoders and the comparators for disambiguation for this new implementation. However, I do not understand why the previous implementation needed to be that complex... Normally, as you can only have one possible candidate to bypass the data in case of RAW, then the logic shall be simpler. But it looks like it was not the case. That is why I was saying that, in my opinion, there was a problem with the previous implementation. Anyway, the only thing we need to be careful with this new implementation is the timing. But I have not recently synthesized the CVA6 to know where is the critical path. |
The critical path did not involve this part of the pipeline in my synthesis. (It seemed to be related to the multiplier, in the execute stage) |
This new version makes renaming mandatory, thus removing the option from the CVA6 configuration. It also fixes Linux boot (there was an issue with Vivado). GitHub CI crashed but it seems to be unrelated:
|
❌ failed run, report available here. |
1 similar comment
❌ failed run, report available here. |
2.6 kGates saved |
✔️ successful run, report available here. |
❌ failed run, report available here. |
❌ failed run, report available here. |
Out of curiosity, I concur that this solution has a better area and good timing for a reduced number of entries in the scoreboard. But does it scale well when the number of entries increases ? I do not know if you made the test with for example 16 or 32 entries in the scoreboard. The timing of the priority encoder in this solution would degrade linearly with the number of entries. In the other solution, I would expect a log2 increase in timing (but still worse area). |
Hello, We have not measured this. I guess that even with the naive solution, the synthesis tool could probably optimize things to switch to the O(log n) solution if needed? This could be "only" balancing a binary tree. |
❌ failed run, report available here. |
Ok, you are right 👍 It would be anyway interesting, to do different synthesis while increasing the number of entries in the SB to see when those paths become the critical path (or if it actually does). But not now anyway It seems that since a few days ago the CI is having issues to access the opensuse server to get a Docker image ? |
This PR introduces a new RAW hazard detection mechanism to eliminate WAW hazards in CVA6 issue stage.
It first checks for hazards in all scoreboard entries in parallel.
Then it filters found hazards before vs after the current issue pointer.
It then finds the index of the last hazard before (resp. after) the issue pointer.
Finally, it gives precedence to a hazard before the issue pointer over the one after the issue pointer.