8000 Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) by sr-gi · Pull Request #29640 · bitcoin/bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

sr-gi
Copy link
Member
@sr-gi sr-gi commented Mar 12, 2024

This PR grabs some interesting bits from #29284 and fixes some edge cases in how block tiebreaks are dealt with.

Regarding #29284

The main functionality from the PR was dropped given it was not an issue anymore, however, reviewers pointed out some comments were outdated #29284 (comment) (which to my understanding may have led to thinking that there was still an issue) it also added test coverage for the aforementioned case which was already passing on master and is useful to keep.

New functionality

While reviewing the superseded PR, it was noticed that blocks that are loaded from disk may face a similar issue (check #29284 (comment) for more context).

The issue comes from how tiebreaks for equal work blocks are handled: if two blocks have the same amount of work, the one that is activatable first wins, that is, the one for which we have all its data (and all of its ancestors'). The variable that keeps track of this, within CBlockIndex is nSequenceId, which is not persisted over restarts. This means that when a node is restarted, all blocks loaded from disk are defaulted the same nSequenceId: 0.
Now, when trying to decide what chain is best on loading blocks from disk, the previous tiebreaker rule is not decisive anymore, so the CBlockIndexWorkComparator has to default to its last rule: whatever block is loaded first (has a smaller memory address).

This means that if multiple same work tip candidates were available before restarting the node, it could be the case that the selected chain tip after restarting does not match the one before.

Therefore, the way nSequenceId is initialized is changed to:

  • 0 for blocks that belong to the previously known best chain
  • 1 to all other blocks loaded from disk

@DrahtBot
Copy link
Contributor
DrahtBot commented Mar 12, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29640.

Reviews

See the guideline for information on the review process.

Type Reviewers
Stale ACK sipa, mzumsande, furszy

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • In feature_chain_tiebreaks.py: “Make sure than only the former connects” -> “Make sure that only the former connects” [‘than’ should be ‘that’]
  • In feature_chain_tiebreaks.py: “# Restart and check enough times to this to eventually fail if the logic is broken” -> “# Restart and check this enough times to eventually fail if the logic is broken” [‘to this’ is misplaced and hinders comprehension]

No other typos were found.

drahtbot_id_4_m

@sr-gi sr-gi changed the title Adds missing test to chain ties (CBlockIndexWorkComparator) Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator) Mar 14, 2024
@sr-gi sr-gi changed the title Fixes tiebreak when loading blocks from disk and adds missing test to chain ties (CBlockIndexWorkComparator) Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) Mar 14, 2024
@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from 43873be to 18820ac Compare March 14, 2024 20:52
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22675498262

@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch 3 times, most recently from a705e00 to 77e1c45 Compare March 15, 2024 21:09
@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch 2 times, most recently from 8225bd6 to f95e896 Compare March 22, 2024 15:23
@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from f95e896 to e4cf8d0 Compare March 25, 2024 07:44
@sr-gi
Copy link
Member Author
sr-gi commented Mar 25, 2024

Rebased to drop the custom log fix in favor of a more generic solution (#29640 (comment))

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/26038919395

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from 2cdc369 to c6ca2a1 Compare July 24, 2024 15:08
@sr-gi
Copy link
Member Author
sr-gi commented Jul 24, 2024

Rebased to deal with CI failing.

c76f4c6 has been amended to include __file__ in the ChainTiebreaksTest constructor, as required by any subclass of BitcoinTestFramework since #30463

Comment on lines +140 to +143
# Restart and check enough times to this to eventually fail if the logic is broken
for _ in range(10):
self.restart_node(0)
assert_equal(blocks[0].hash, node.getbestblockhash())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In c6ca2a1 "test: Adds block tiebreak over restarts tests"

This does not seem to fail on master, but it does on this branch with the fix commit reverted. Is it possible that this was fixed by a different PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for me, it also fails on master - but not always, intermittently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to how this failure is supposed to happen, which is not ideal. The obtained hash depends on the memory address where the competing hashes are loaded, so "enough" here is hard to measure.

Happy to use an alternative approach is you can think of any :/

Copy link
Member
@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

Just started reviewing. It needs to be rebased on master to include CMake support.
Also, left a small nit along the way.

@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from c6ca2a1 to a9235a8 Compare January 28, 2025 16:47
@sr-gi
Copy link
Member Author
sr-gi commented Jan 28, 2025

Thanks @furszy. I've added two constants for the values loaded from disk and rebased over master.

@furszy
Copy link
Member
furszy commented Jan 28, 2025

I've added two constants for the values loaded from disk

Missed to push them?

@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from a9235a8 to c9645b4 Compare January 28, 2025 17:14
@sr-gi
Copy link
Member Author
sr-gi commented Jan 28, 2025

I've added two constants for the values loaded from disk

Missed to push them?

Indeed. Should be fixed now

@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from c9645b4 to 177d07f Compare February 11, 2025 16:19
@sr-gi
Copy link
Member Author
sr-gi commented Feb 11, 2025

The last commit had a small typo, force-pushing to fix it

Comment on lines 4723 to 4724
// Make sure our chain tip before shutting down scores better than any other candidate
// to maintain a consistent best tip over reboots
Copy link
Member
@furszy furszy Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit:

Suggested change
// Make sure our chain tip before shutting down scores better than any other candidate
// to maintain a consistent best tip over reboots
// Make sure our chain tip before shutting down scores better than any other candidate
// to maintain a consistent best tip over reboots in case of a tie with another chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in 3869469

Comment on lines 4725 to 4712
auto target = tip;
while (target) {
target->nSequenceId = 0;
target = target->pprev;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q: cannot just set the tip's sequence id to 0?

We only compare the tip inside TryAddBlockIndexCandidate and not the entire chain (if not, then a test might be missing because setting only the tip to 0 still passes all tests).

Copy link
Member
@furszy furszy Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered offline by mzumsande. It's for an edge case: the user invalidates the tip after startup and before receiving any blocks from the network. The node then might find out that there are two competing blocks for the tip's ancestor. A test wouldn't hurt to document this behavior but it is not entirely needed.

Copy link
Contributor
@mzumsande mzumsande Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is an edge case we'd necessarily need to support though - it was just the only case I could come up with where it would matter.

Copy link
Member
@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 177d07f

@DrahtBot DrahtBot requested review from mzumsande and sipa February 12, 2025 16:31
Copy link
Contributor
@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be ready for merge if @mzumsande and @sipa can re-ack

node.invalidateblock(blocks[9].hash)
node.invalidateblock(blocks[10].hash)
# B7 is now active.
assert_equal(node.getbestblockhash(), blocks[7].hash)
Copy link
Contributor
@mzumsande mzumsande Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this assertion correct - why can't the best blockhash be B8?
Even though B7 is received first, both are put into std::multimap<CBlockIndex*, CBlockIndex*> m_blocks_unlinked; (with no comparator, so pointer address is used which is decided by the OS?!). and later, when their parent arrives, accessed via equal_range in ReceivedBlockTransactions where nSequenceId is then set in that order. Couldn't this be done in the reverse order B8 -> B7 if the OS gives out the pointer addresses differently?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I am following here. How does ReceivedBlockTransactions come into play here? As far as I can tell this activates the best chain, which should then correctly pick B7 again through FindMostWorkChain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nSequenceId is set here in ReceivedBlockTransactions once a block is connectable (itself and all predecessors have received transactions). In this case, nSequenceId is set for B7 and B8 when the full block of the parent B3 arrives. So when ReceivedBlockTransactions is called for B3, we iterate over m_blocks_unlinked here, add them to the queue and then process B7 and B8 to assign them a nSequenceId - but I think it may not be deterministic whether B7 or B8 is processed first and gets the lower nSequenceId - so when we later reorg to these blocks, it would also not be determnistic what FindMostWorkChain returns.

Copy link
Member Author
@sr-gi sr-gi Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely overlooked this, but it looks like the (unhinted) insertion order is preserved when retrieving via equal_range:

Since emplace and unhinted insert always insert at the upper bound, the order of equivalent elements in the equal range is the order of insertion unless hinted insert or emplace_hint was used to insert an element at a different position.

source: https://en.cppreference.com/w/cpp/container/multimap/equal_range

@DrahtBot DrahtBot requested a review from mzumsande April 22, 2025 20:37
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
Before this, if we had two (or more) same work tip candidates and restarted our node,
it could be the case that the block set as tip after bootstrap didn't match the one
before stopping. That's because the work and `nSequenceId` of both block will be the same
(the latter is only kept in memory), so the active chain after restart would have depended
on what tip candidate was loaded first.

This makes sure that we are consistent over reboots.

Github-Pull: bitcoin#29640
Rebased-From: c8f5e62
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
Make it easier to follow what the values come without having to go
over the comments, plus easier to maintain

Github-Pull: bitcoin#29640
Rebased-From: c7f9061
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Jun 6, 2025
Adds tests to make sure we are consistent on activating the same chain over
a node restart if two or more candidates have the same work when the node is shutdown

Github-Pull: bitcoin#29640
Rebased-From: 177d07f
src/chain.h Outdated
@@ -35,6 +35,9 @@ static constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60;
* MAX_FUTURE_BLOCK_TIME.
*/
static constexpr int64_t TIMESTAMP_WINDOW = MAX_FUTURE_BLOCK_TIME;
//! Init values for CBlockIndex nSequenceId when loaded from disk
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The extra whitespace at the beginning should be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered in 3869469

assert_equal(node.getbestblockhash(), blocks[2].hash)

self.log.info('Send parents B3-B4 of B8-B10 in reverse order')
peer.send_blocks_and_test([blocks[4]], node, success=False, force_send=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be an easy doc fix, something like:

-         - if success is True: assert that the node's tip advances to the most recent block
-         - if success is False: assert that the node's tip doesn't advance
+         - if success is True: assert that the node's tip is the last block in blocks at the end of the operation.
+         - if success is False: assert that the node's tip isn't the last block in blocks at the end of the operation

Will submit separately if it does not get picked up here.

node.invalidateblock(blocks[9].hash)
node.invalidateblock(blocks[10].hash)
# B7 is now active.
assert_equal(node.getbestblockhash(), blocks[7].hash)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I am following here. How does ReceivedBlockTransactions come into play here? As far as I can tell this activates the best chain, which should then correctly pick B7 again through FindMostWorkChain.

sr-gi and others added 6 commits June 20, 2025 10:59
Before this, if we had two (or more) same work tip candidates and restarted our node,
it could be the case that the block set as tip after bootstrap didn't match the one
before stopping. That's because the work and `nSequenceId` of both block will be the same
(the latter is only kept in memory), so the active chain after restart would have depended
on what tip candidate was loaded first.

This makes sure that we are consistent over reboots.
Make it easier to follow what the values come without having to go
over the comments, plus easier to maintain
Adds tests to make sure we are consistent on activating the same chain over
a node restart if two or more candidates have the same work when the node is shutdown
It's not true that if success=False the tip doesn't advance. It doesn'test
advance to the provided tip, but it can advance to a competing one
@sr-gi sr-gi force-pushed the 202403-block-tiebreak branch from 177d07f to 3cb689a Compare June 20, 2025 14:59
@sr-gi
Copy link
Member Author
sr-gi commented Jun 20, 2025

Thanks for reviewing @TheCharlatan and @furszy. I rebased the code and addressed the outstanding comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0