8000 WAW hazards elimination by cathales · Pull Request #2881 · openhwgroup/cva6 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

WAW hazards elimination #2881

New is 8000 sue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cathales
Copy link
Contributor
@cathales cathales commented Mar 28, 2025

This PR introduces a new RAW hazard detection mechanism to eliminate WAW hazards in CVA6 issue stage.

It first checks for hazards in all scoreboard entries in parallel.
Then it filters found hazards before vs after the current issue pointer.
It then finds the index of the last hazard before (resp. after) the issue pointer.
Finally, it gives precedence to a hazard before the issue pointer over the one after the issue pointer.

image

@cathales cathales marked this pull request as draft March 28, 2025 08:27
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor
@cfuguet cfuguet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. This is a nice contribution.

I was expecting the number of gates to increase because you are adding one priority encoder and some extra comparators to determine which entries are before/after the issue pointer. However, I was looking the Thales-CIs, and the number of gates actually decreased...

@cathales
Copy link
Contributor Author

Thanks César,

This new mechanism replaces the previous one and is more compact, hence the area reduction.

Cc @ricted98

@cfuguet
Copy link
Contributor
cfuguet commented Mar 28, 2025

Ok, but you agree with me that is kind of surprising... it is probably because there was a problem with the previous implementation. Otherwise, blocking the instruction issue during a WAW shall simplify the RAW logic.... You are sure that the newest value can come from either the RF or only one of the entries of the SB. No need for priority encoders or less or equal comparators for the issue pointer

@cathales
Copy link
Contributor Author

Especially, the previous implementation was processing dependence for each register from 0 to 31 then choosing one.
This implementation only processes dependence for required registers.

The reason why WAW dependence checking was needed was that it was not possible to disambiguate which scoreboard entry is the most recent one.
So to remove WAW dependence checking, we have to perform this disambiguation.
It is not enough to take scoreboard index because it is a circular buffer (it is what was performed, and because of the lack of disambiguation there was a requirement that no two scoreboard entries had the same rd, hence the WAW check required).

#2805 adds a rotation around the commit pointer to perform this disambiguation. The remainder of the logic is kept.

Here I decided to redesign the whole RAW dependence checking to be WAW-insensitive and reduce area for the reasons given in the first paragraph of this message.

So to more specifically answer your comment, the comparators and the priority encoders are needed here to perform disambiguation.

@cfuguet
Copy link
Contributor
cfuguet commented Mar 28, 2025

Yes, I understand why you need the priority encoders and the comparators for disambiguation for this new implementation.

However, I do not understand why the previous implementation needed to be that complex... Normally, as you can only have one possible candidate to bypass the data in case of RAW, then the logic shall be simpler. But it looks like it was not the case. That is why I was saying that, in my opinion, there was a problem with the previous implementation.

Anyway, the only thing we need to be careful with this new implementation is the timing. But I have not recently synthesized the CVA6 to know where is the critical path.

@cathales
Copy link
Contributor Author

The critical path did not involve this part of the pipeline in my synthesis. (It seemed to be related to the multiplier, in the execute stage)

@cathales cathales marked this pull request as ready for review April 18, 2025 09:42
@cathales
Copy link
Contributor Author

This new version makes renaming mandatory, thus removing the option from the CVA6 configuration.

It also fixes Linux boot (there was an issue with Vivado).
Thanks a lot to @ricted98 for #2805 that we used as a reference to fix Linux boot.


GitHub CI crashed but it seems to be unrelated:

Err:9 http://download.opensuse.org/repositories/home:/phiwag:/edatools/xUbuntu_20.04  InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BB95FA80F959D90B
Reading package lists...
W: GPG error: http://download.opensuse.org/repositories/home:/phiwag:/edatools/xUbuntu_20.04  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BB95FA80F959D90B
E: The repository 'http://download.opensuse.org/repositories/home:/phiwag:/edatools/xUbuntu_20.04  InRelease' is not signed.

Copy link
Contributor

❌ failed run, report available here.

1 similar comment
Copy link
Contributor

❌ failed run, report available here.

@cathales
Copy link
Contributor Author

2.6 kGates saved

Copy link
Contributor

✔️ successful run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

@JeanRochCoulon
Copy link
Contributor
JeanRochCoulon commented Apr 23, 2025

Many thanks to @ricted98 and @cathales for having taken time to compare the two Renaming solutions. After PPA compare, this solution has been selected to be merged into cva6.
Waiting for CI green light to merge...

@cfuguet
Copy link
Contributor
cfuguet commented Apr 23, 2025

Out of curiosity, I concur that this solution has a better area and good timing for a reduced number of entries in the scoreboard. But does it scale well when the number of entries increases ? I do not know if you made the test with for example 16 or 32 entries in the scoreboard. The timing of the priority encoder in this solution would degrade linearly with the number of entries. In the other solution, I would expect a log2 increase in timing (but still worse area).

@cathales
Copy link
Contributor Author

Hello,

We have not measured this.
With a naive implementation (idx = 0; for i in 0..n { if valid[i] { idx = i } }, timing would indeed be O(n), but if I understand rr_arb_tree correctly, it seems to be optimized to be O(log n).

I guess that even with the naive solution, the synthesis tool could probably optimize things to switch to the O(log n) solution if needed? This could be "only" balancing a binary tree.

Copy link
Contributor

❌ failed run, report available here.

2 similar comments
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

@cfuguet
Copy link
Contributor
cfuguet commented Apr 23, 2025

Ok, you are right 👍 It would be anyway interesting, to do different synthesis while increasing the number of entries in the SB to see when those paths become the critical path (or if it actually does). But not now anyway

It seems that since a few days ago the CI is having issues to access the opensuse server to get a Docker image ?

@JeanRochCoulon JeanRochCoulon merged commit c784fd9 into openhwgroup:master Apr 23, 2025
4 of 12 checks passed
@JeanRochCoulon JeanRochCoulon deleted the waw-elimination branch April 23, 2025 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0