8000 perf: Make mempool update async from block.Commit by ValarDragon · Pull Request #3008 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

perf: Make mempool update async from block.Commit #3008

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 22, 2024

Conversation

ValarDragon
Copy link
Contributor
@ValarDragon ValarDragon commented May 6, 2024

First step to fixing #2925

PR'ing this to see if we have any test failures. Note that this is safe in the happy path, as Reap and CheckTx both share this same lock. The functionality behavior is that:

  • Full nodes and non-proposers timeout_prevote beginning should not block on updating the mempool
  • Block proposers get very slight increased concurrency before reaping their next block. (Should be significantly fixed in subsequent PR's in Mempool Rechecking all txs blocks consensus #2925)
    • Reap takes a lock on the mempool mutex, so there is no concurrency safety issues right now.
  • Mempool errors will not halt consensus, instead they just log an error and call mempool flush. I actually think this may be better behavior? If we want to preserve the old behavior, we can thread a generic "consensus halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if tests need creating. Seems like the create empty block tests sometimes hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to help us with performance improvements. Happy to get this into an experimental release to test on mainnets.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@ValarDragon ValarDragon requested review from a team as code owners May 6, 2024 00:34
@ValarDragon ValarDragon marked this pull request as draft May 6, 2024 00:37
@melekes melekes requested a review from hvanz May 6, 2024 07:08
@ValarDragon ValarDragon marked this pull request as ready for review May 6, 2024 07:56
return res.RetainHeight, err
if err != nil {
blockExec.logger.Error("client error during mempool.Update, flushing mempool", "err", err)
blockExec.mempool.Flush()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is new, right? Why it's important to flush the mempool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we would halt, sorry forgot to flag in a comment. I feel like we only have two options, dump the mempool or replicate existing functionality and make an error channel to halt consensus

Copy link
Contributor
@sergio-mena sergio-mena May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any caller of function Commit in line 395 above will panic if Commit doesn't return a nil error. This is for safety and has been like that for years. I'd feel more comfortable if we didn't modify that behavior here, so if err != nil in line 445, I'd panic here with the error text of line 446.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@melekes melekes added the backport-to-v1.x Tell Mergify to backport the PR to v1.x label May 7, 2024
@cason
Copy link
Contributor
cason commented May 10, 2024

Full nodes and non-proposers timeout_prevote beginning should not block on updating the mempool

What do you mean? In any case, it is probably timeout_commit.

@cason
Copy link
Contributor
cason commented May 10, 2024

One problem of this approach is that this breaks the contract that we have regarding the ABCI Commit method:

https://docs.cometbft.com/v0.38/spec/abci/abci++_app_requirements#commit

Before invoking Commit, CometBFT locks the mempool and flushes the mempool connection. This ensures that no new messages will be received on the mempool connection during this processing step, providing an opportunity to safely update all four connection states to the latest committed state at the same time.

When Commit returns, CometBFT unlocks the mempool.

By updating the mempool in parallel, this contract is clearly broken.

cason
cason previously requested changes May 10, 2024
Copy link
Contributor
@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to block this change while we address how we are handling the contract with the application as part of the ABCI Commit call.

if updateData.err != nil {
return
}

err = blockExec.mempool.Update(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part we can do in parallel.

Once the new state is committed, the application can start performing the re-check of all transactions remaining in the mempool, using the updated state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, it is not clear to me why we need to call Update with a locked mempool...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this is due to the current FIFO guarantee, before you add a new tx. (You want to know your done rechecking, before you add more txs to the end)

This really should be a lock owned within the mempool though, rather than here.

@ValarDragon
Copy link
Contributor Author

One problem of this approach is that this breaks the contract that we have regarding the ABCI Commit method:

https://docs.cometbft.com/v0.38/spec/abci/abci++_app_requirements#commit

Before invoking Commit, CometBFT locks the mempool and flushes the mempool connection. This ensures that no new messages will be received on the mempool connection during this processing step, providing an opportunity to safely update all four connection states to the latest committed state at the same time.
When Commit returns, CometBFT unlocks the mempool.

By updating the mempool in parallel, this contract is clearly broken.

After your fix of flushing the mempool before Commit, we are only breaking the contract in "When Commit returns, CometBFT unlocks the mempool.". We do flush and take a lock during Commit. We just now hold the lock after Commit terminates. I'm not sure that application behavior would depend on this.

@ValarDragon
Copy link
Contributor Author

I'm still stuck on the create empty blocks tests. Going to try more tomorrow. But somehow playing around with the timeouts / expanding leniency isn't fixing this. (Testing via running go test -timeout 30s -run ^TestMempoolProgressAfterCreateEmptyBlocksInterval$ github.com/cometbft/cometbft/internal/consensus -count=100 -p=1)

Apriori I'd just guess maybe we shaved 10ms or something in this configuration in edge cases. Going to try more with a debugger tmrw.

@melekes
Copy link
Contributor
melekes commented May 14, 2024

@hvanz could you please help @ValarDragon here?

Copy link
Member
@hvanz hvanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I've also run the tests and they all pass.

Regarding the ABCI spec on Commit, I don't think this change will pose a problem. The important part of that paragraph is "Before invoking Commit, CometBFT locks the mempool and flushes the mempool connection.". This is not broken.

But the line "When Commit returns, CometBFT unlocks the mempool." should be updated to reflect the new behaviour. I think it doesn't matter for the ABCI connections that the mempool is still locked after Commit returns.

@ValarDragon
Copy link
Contributor Author
ValarDragon commented May 17, 2024

I think the failing test cases is due to some race condition bug in the timeout ticker relating to ticks with negative durations.

I added some logs, and in the passing cases I see logs like:

Received tick old_ti {40ms 1 0 RoundStepPropose} new_ti {-142.018µs 2 0 RoundStepNewHeight}
1715938581269 Scheduled timeout dur -142.018µs height 2 round 0 step RoundStepNewHeight
Received tick old_ti {-142.018µs 2 0 RoundStepNewHeight} new_ti {200ms 2 0 RoundStepNewRound}
1715938581269 Scheduled timeout dur 200ms height 2 round 0 step RoundStepNewRound
1715938581469 Timed out dur 200ms height 2 round 0 step RoundStepNewRound
1715938581469 handle timeout, timeout type {200ms 2 0 RoundStepNewRound}

where the left number is time.Now().UnixMilli()

In the failing test cases I see:

Received tick old_ti {40ms 1 0 RoundStepPropose} new_ti {-161.505µs 2 0 RoundStepNewHeight}
1715938583735 Scheduled timeout dur -161.505µs height 2 round 0 step RoundStepNewHeight
Received tick old_ti {-161.505µs 2 0 RoundStepNewHeight} new_ti {200ms 2 0 RoundStepNewRound}
Timer already stopped
1715938583735 Scheduled timeout dur 200ms height 2 round 0 step RoundStepNewRound

(So were not getting the 200ms wait)

Trying to figure out whats the cause of this edge case. I think this has to be independent of my change though.

@ValarDragon
Copy link
Contributor Author
ValarDragon commented May 17, 2024

Ok just confirmed I get this same failure on main when I run the test in a loop, so its not new to this PR (and is an edge case failure in the timeout ticker)

Going to file this into an issue, and consider the test failure here in this PR (vs not main) as a fluke. Will try to fix that separately in another PR.

EDIT: Filed into issue / PR, #3091 / #3092 . I think that should not block this PR, as this behavior exists in the same manner. Or perhaps we merge that in first, then rebase into this PR?

@ValarDragon ValarDragon requested a review from a team as a code owner May 17, 2024 12:58
@sergio-mena
Copy link
Contributor

Or perhaps we merge that in first, then rebase into this PR?

I favour that approach

state State
abciResponse *abci.FinalizeBlockResponse
}

// Commit locks the mempool, runs the ABCI Commit message, and updates the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update this comment?

Co-authored-by: Sergio Mena <sergio@informal.systems>
@sergio-mena sergio-mena added this pull request to the merge queue May 22, 2024
Merged via the queue into cometbft:main with commit 1c277c0 May 22, 2024
36 checks passed
PaddyMc pushed a commit to osmosis-labs/cometbft that referenced this pull request May 23, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
PaddyMc added a commit to osmosis-labs/cometbft that referenced this pull request May 23, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
mergify bot pushed a commit to osmosis-labs/cometbft that referenced this pull request May 23, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
(cherry picked from commit 2cea495)
PaddyMc added a commit to osmosis-labs/cometbft that referenced this pull request May 23, 2024
… (#71)

First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
(cherry picked from commit 2cea495)

Co-authored-by: PaddyMc <paddymchale@hotmail.com>
10000
@faddat
Copy link
Contributor
faddat commented May 27, 2024

I just want to say that this actually, if it works how I think it will work, looks like it may be a very significant piece of solving the P2P storms puzzle.

beer-1 pushed a commit to initia-labs/cometbft that referenced this pull request May 29, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
albertchon added a commit to InjectiveLabs/cometbft that referenced this pull request May 30, 2024
@sergio-mena sergio-mena added mempool bug Something isn't working labels Jun 28, 2024
@sergio-mena
Copy link
Contributor

An update regarding this.

We have finally decided to backport this to v1.x because:

  • Our e2e nightlies are failing, and that is blocking our process for cutting v1.x RC. We are convinced this is part of the fix for those failing tests. So I added the "bug" label
  • @ValarDragon reported that this fix has been in production on Osmosis for a few weeks now without issue.

So, benefits clearly outweigh the risks at this point.

@sergio-mena sergio-mena added the backport-to-v1.x Tell Mergify to backport the PR to v1.x label Jun 28, 2024
mergify bot pushed a commit that referenced this pull request Jun 28, 2024
First step to fixing #2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
#2925)
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
(cherry picked from commit 1c277c0)

# Conflicts:
#	state/execution.go
sergio-mena added a commit that referenced this pull request Jun 28, 2024
sergio-mena added a commit that referenced this pull request Jun 28, 2024
First step to fixing #2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
sergio-mena added a commit that referenced this pull request Jun 28, 2024
…3362)

First step to fixing #2925 

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
#2925)
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

#### PR checklist

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #3008 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
PaddyMc pushed a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
ValarDragon added a commit to osmosis-labs/cometbft that referenced this pull request Aug 19, 2024
First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
yihuang pushed a commit to yihuang/cometbft that referenced this pull request Sep 13, 2024
…3008) (cometbft#3362)

First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Ha
103B5
ppy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3008 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
yihuang pushed a commit to yihuang/cometbft that referenced this pull request Sep 13, 2024
…3008) (cometbft#3362)

First step to fixing cometbft#2925

PR'ing this to see if we have any test failures. Note that this is safe
in the happy path, as Reap and CheckTx both share this same lock. The
functionality behavior is that:

- Full nodes and non-proposers `timeout_prevote` beginning should not
block on updating the mempool
- Block proposers get _very slight_ increased concurrency before reaping
their next block. (Should be significantly fixed in subsequent PR's in
- Reap takes a lock on the mempool mutex, so there is no concurrency
safety issues right now.
- Mempool errors will not halt consensus, instead they just log an error
and call mempool flush. I actually think this may be better behavior? If
we want to preserve the old behavior, we can thread a generic "consensus
halt error" channel perhaps?

I'm not sure how/where to best document this. Please also let me know if
tests need creating. Seems like the create empty block tests sometimes
hit failures, I'll investigate tmrw

Also please feel free to take over this PR, just thought I"d make it to
help us with performance improvements. Happy to get this into an
experimental release to test on mainnets.

---

- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request cometbft#3008 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Sergio Mena <sergio@informal.systems>
yihuang added a commit to crypto-org-chain/cometbft that referenced this pull request Oct 25, 2024
perf: Make mempool update async from block.Commit (backport cometbft#3008)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-v1.x Tell Mergify to backport the PR to v1.x bug Something isn't working mempool
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants
0