Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/29640.

Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts: reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental): possible typos and grammar issues: no other typos were found.
Force-pushed from 43873be to 18820ac
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation. Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed from a705e00 to 77e1c45
Force-pushed from 8225bd6 to f95e896
Force-pushed from f95e896 to e4cf8d0
Rebased to drop the custom log fix in favor of a more generic solution (#29640 (comment)).
Force-pushed from e4cf8d0 to 2cdc369
🚧 At least one of the CI tasks failed.
Hints: Make sure to run all tests locally, according to the documentation. The failure may happen due to a number of reasons, for example:
Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed from 2cdc369 to c6ca2a1
# Restart and check enough times for this to eventually fail if the logic is broken
for _ in range(10):
    self.restart_node(0)
    assert_equal(blocks[0].hash, node.getbestblockhash())
In c6ca2a1 "test: Adds block tiebreak over restarts tests"
This does not seem to fail on master, but it does on this branch with the fix commit reverted. Is it possible that this was fixed by a different PR?
For me, it also fails on master, but not always; only intermittently.
This is due to how this failure is supposed to happen, which is not ideal. The obtained hash depends on the memory address where the competing hashes are loaded, so "enough" here is hard to measure.
Happy to use an alternative approach if you can think of any :/
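To illustrate the nondeterminism, here is a minimal standalone sketch (not Core code): once work and nSequenceId tie, the comparator falls through to the CBlockIndex memory addresses, and those depend on allocation order at startup.

#include <functional>
#include <iostream>

// Stand-in for a block index entry (illustration only).
struct Candidate { int dummy{0}; };

int main()
{
    Candidate* a = new Candidate; // entry allocated first
    Candidate* b = new Candidate; // entry allocated second
    // std::less gives a total order even over unrelated pointers; which
    // candidate "wins" depends on where the allocator placed each object,
    // and that can change from run to run.
    std::cout << (std::less<Candidate*>{}(a, b) ? "a" : "b")
              << " wins the tie this run\n";
    delete a;
    delete b;
}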
Concept ACK.
Just started reviewing. It needs to be rebased on master to include CMake support.
Also, left a small nit along the way.
Force-pushed from c6ca2a1 to a9235a8
Thanks @furszy. I've added two constants for the values loaded from disk and rebased over master.
Forgot to push them?
Force-pushed from a9235a8 to c9645b4
Indeed. Should be fixed now.
Force-pushed from c9645b4 to 177d07f
The last commit had a small typo; force-pushing to fix it.
src/validation.cpp (outdated)
// Make sure our chain tip before shutting down scores better than any other candidate
// to maintain a consistent best tip over reboots
tiny nit:
- // Make sure our chain tip before shutting down scores better than any other candidate
- // to maintain a consistent best tip over reboots
+ // Make sure our chain tip before shutting down scores better than any other candidate
+ // to maintain a consistent best tip over reboots in case of a tie with another chain.
Covered in 3869469
src/validation.cpp (outdated)
auto target = tip;
while (target) {
    target->nSequenceId = 0;
    target = target->pprev;
}
q: can't we just set the tip's sequence id to 0? We only compare the tip inside `TryAddBlockIndexCandidate` and not the entire chain (if not, then a test might be missing, because setting only the tip to 0 still passes all tests).
Answered offline by mzumsande. It's for an edge case: the user invalidates the tip after startup and before receiving any blocks from the network. The node might then find that there are two competing blocks for the tip's ancestor. A test wouldn't hurt to document this behavior, but it is not strictly needed.
I don't think that this is an edge case we'd necessarily need to support though - it was just the only case I could come up with where it would matter.
Code review ACK 177d07f
Might be ready for merge if @mzumsande and @sipa can re-ack
node.invalidateblock(blocks[9].hash)
node.invalidateblock(blocks[10].hash)
# B7 is now active.
assert_equal(node.getbestblockhash(), blocks[7].hash)
Is this assertion correct - why can't the best blockhash be B8? Even though B7 is received first, both are put into `std::multimap<CBlockIndex*, CBlockIndex*> m_blocks_unlinked;` (with no comparator, so the pointer address is used, which is decided by the OS?!) and later, when their parent arrives, accessed via `equal_range` in `ReceivedBlockTransactions`, where `nSequenceId` is then set in that order. Couldn't this be done in the reverse order B8 -> B7 if the OS gives out the pointer addresses differently?
I'm not sure I am following here. How does `ReceivedBlockTransactions` come into play here? As far as I can tell this activates the best chain, which should then correctly pick B7 again through `FindMostWorkChain`.
`nSequenceId` is set here in `ReceivedBlockTransactions` once a block is connectable (itself and all predecessors have received transactions). In this case, `nSequenceId` is set for B7 and B8 when the full block of the parent B3 arrives. So when `ReceivedBlockTransactions` is called for B3, we iterate over `m_blocks_unlinked` here, add them to the queue, and then process B7 and B8 to assign them a `nSequenceId` - but I think it may not be deterministic whether B7 or B8 is processed first and gets the lower `nSequenceId` - so when we later reorg to these blocks, it would also not be deterministic what `FindMostWorkChain` returns.
I completely overlooked this, but it looks like the (unhinted) insertion order is preserved when retrieving via `equal_range`:

"Since emplace and unhinted insert always insert at the upper bound, the order of equivalent elements in the equal range is the order of insertion unless hinted insert or emplace_hint was used to insert an element at a different position."

Source: https://en.cppreference.com/w/cpp/container/multimap/equal_range
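A quick standalone check of that guarantee (illustrative only; an int key and string values stand in for the CBlockIndex pointers the real map stores):

#include <cassert>
#include <iterator>
#include <map>
#include <string>

int main()
{
    // Two unlinked blocks sharing the same parent key, inserted unhinted
    // in the order they were received from the network.
    std::multimap<int, std::string> blocks_unlinked;
    blocks_unlinked.emplace(3, "B7"); // received first
    blocks_unlinked.emplace(3, "B8"); // received second

    // equal_range yields the equivalent keys in insertion order, so B7
    // would be processed first and get the lower nSequenceId.
    auto [it, end] = blocks_unlinked.equal_range(3);
    assert(std::distance(it, end) == 2);
    assert(it->second == "B7");
    assert(std::next(it)->second == "B8");
}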
src/chain.h (outdated)
@@ -35,6 +35,9 @@ static constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60;
 * MAX_FUTURE_BLOCK_TIME.
 */
static constexpr int64_t TIMESTAMP_WINDOW = MAX_FUTURE_BLOCK_TIME;
//! Init values for CBlockIndex nSequenceId when loaded from disk
Nit: The extra whitespace at the beginning should be removed.
Covered in 3869469
assert_equal(node.getbestblockhash(), blocks[2].hash)

self.log.info('Send parents B3-B4 of B8-B10 in reverse order')
peer.send_blocks_and_test([blocks[4]], node, success=False, force_send=True)
It would be an easy doc fix, something like:
- - if success is True: assert that the node's tip advances to the most recent block
- - if success is False: assert that the node's tip doesn't advance
+ - if success is True: assert that the node's tip is the last block in blocks at the end of the operation.
+ - if success is False: assert that the node's tip isn't the last block in blocks at the end of the operation

Will submit separately if it does not get picked up here.
Before this, if we had two (or more) same-work tip candidates and restarted our node, it could be the case that the block set as tip after bootstrap didn't match the one before stopping. That's because the work and `nSequenceId` of both blocks would be the same (the latter is only kept in memory), so the active chain after restart would have depended on which tip candidate was loaded first. This makes sure that we are consistent over reboots.

Make it easier to follow where the values come from without having to go over the comments, plus easier to maintain.

Adds tests to make sure we are consistent in activating the same chain over a node restart if two or more candidates have the same work when the node is shut down.
It's not true that if success=False the tip doesn't advance. It doesn't advance to the provided tip, but it can advance to a competing one.
Force-pushed from 177d07f to 3cb689a
Thanks for reviewing @TheCharlatan and @furszy. I rebased the code and addressed the outstanding comments.
This PR grabs some interesting bits from #29284 and fixes some edge cases in how block tiebreaks are dealt with.

Regarding #29284

The main functionality from that PR was dropped given it was not an issue anymore. However, reviewers pointed out that some comments were outdated (#29284 (comment)), which to my understanding may have led to thinking that there was still an issue. It also added test coverage for the aforementioned case, which was already passing on master and is useful to keep.

New functionality

While reviewing the superseded PR, it was noticed that blocks that are loaded from disk may face a similar issue (check #29284 (comment) for more context).
The issue comes from how tiebreaks for equal-work blocks are handled: if two blocks have the same amount of work, the one that is activatable first wins, that is, the one for which we have all its data (and all of its ancestors'). The variable that keeps track of this within `CBlockIndex` is `nSequenceId`, which is not persisted over restarts. This means that when a node is restarted, all blocks loaded from disk are defaulted to the same `nSequenceId`: 0.

Now, when trying to decide which chain is best on loading blocks from disk, the previous tiebreaker rule is not decisive anymore, so the `CBlockIndexWorkComparator` has to default to its last rule: whichever block was loaded first (has a smaller memory address) wins.

This means that if multiple same-work tip candidates were available before restarting the node, it could be the case that the selected chain tip after restarting does not match the one before.
Therefore, the way `nSequenceId` is initialized is changed to: