Fix tiebreak when loading blocks from disk (and add tests for comparing chain ties) #29640
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: for details see https://corecheck.dev/bitcoin/bitcoin/pulls/29640.

Reviews: see the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts: reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental): possible typos and grammar issues: no other typos were found.
Force-pushed from 43873be to 18820ac
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the documentation. Possibly this is due to a silent merge conflict (the changes in this pull request being incompatible with the current code in the target branch). Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed from a705e00 to 77e1c45
Force-pushed from 8225bd6 to f95e896
Force-pushed from f95e896 to e4cf8d0
Rebased to drop the custom log fix in favor of a more generic solution (#29640 (comment)).
Force-pushed from e4cf8d0 to 2cdc369
🚧 At least one of the CI tasks failed.
Hints: Make sure to run all tests locally, according to the documentation. The failure may happen due to a number of reasons, for example:
Leave a comment here, if you need help tracking down a confusing failure.
Force-pushed from 2cdc369 to c6ca2a1
# Restart and check enough times for this to eventually fail if the logic is broken
for _ in range(10):
    self.restart_node(0)
    assert_equal(blocks[0].hash, node.getbestblockhash())
In c6ca2a1 "test: Adds block tiebreak over restarts tests"
This does not seem to fail on master, but it does on this branch with the fix commit reverted. Is it possible that this was fixed by a different PR?
For me, it also fails on master, but not always; only intermittently.
This is due to how this failure is supposed to happen, which is not ideal. The obtained hash depends on the memory address where the competing hashes are loaded, so "enough" here is hard to measure.
Happy to use an alternative approach if you can think of any :/
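To illustrate the nondeterminism, here is a minimal standalone sketch (not Core code): once work and nSequenceId tie, the comparator falls through to the CBlockIndex memory addresses, and those depend on allocation order at startup.

#include <functional>
#include <iostream>

// Stand-in for a block index entry (illustration only).
struct Candidate { int dummy{0}; };

int main()
{
    Candidate* a = new Candidate; // entry allocated first
    Candidate* b = new Candidate; // entry allocated second
    // std::less gives a total order even over unrelated pointers; which
    // candidate "wins" depends on where the allocator placed each object,
    // and that can change from run to run.
    std::cout << (std::less<Candidate*>{}(a, b) ? "a" : "b")
              << " wins the tie this run\n";
    delete a;
    delete b;
}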
Concept ACK.
Just started reviewing. It needs to be rebased on master to include CMake support.
Also, left a small nit along the way.
Force-pushed from c6ca2a1 to a9235a8
Thanks @furszy. I've added two constants for the values loaded from disk and rebased over master.
Forgot to push them?
Force-pushed from a9235a8 to c9645b4
Indeed. Should be fixed now.
Force-pushed from c9645b4 to 177d07f
The last commit had a small typo; force-pushing to fix it.
src/validation.cpp (outdated)
// Make sure our chain tip before shutting down scores better than any other candidate
// to maintain a consistent best tip over reboots
tiny nit:
- // Make sure our chain tip before shutting down scores better than any other candidate
- // to maintain a consistent best tip over reboots
+ // Make sure our chain tip before shutting down scores better than any other candidate
+ // to maintain a consistent best tip over reboots in case of a tie with another chain.
Covered in 3869469
src/validation.cpp (outdated)
auto target = tip;
while (target) {
    target->nSequenceId = 0;
    target = target->pprev;
}
q: can't we just set the tip's sequence id to 0? We only compare the tip inside `TryAddBlockIndexCandidate` and not the entire chain (if not, then a test might be missing, because setting only the tip to 0 still passes all tests).
Answered offline by mzumsande. It's for an edge case: the user invalidates the tip after startup and before receiving any blocks from the network. The node might then find that there are two competing blocks for the tip's ancestor. A test wouldn't hurt to document this behavior, but it is not strictly needed.
I don't think that this is an edge case we'd necessarily need to support though - it was just the only case I could come up with where it would matter.
Code review ACK 177d07f
Might be ready for merge if @mzumsande and @sipa can re-ack
node.invalidateblock(blocks[9].hash)
node.invalidateblock(blocks[10].hash)
# B7 is now active.
assert_equal(node.getbestblockhash(), blocks[7].hash)
Is this assertion correct - why can't the best blockhash be B8? Even though B7 is received first, both are put into `std::multimap<CBlockIndex*, CBlockIndex*> m_blocks_unlinked;` (with no comparator, so the pointer address is used, which is decided by the OS?!) and later, when their parent arrives, accessed via `equal_range` in `ReceivedBlockTransactions`, where `nSequenceId` is then set in that order. Couldn't this be done in the reverse order B8 -> B7 if the OS gives out the pointer addresses differently?
I'm not sure I am following here. How does `ReceivedBlockTransactions` come into play here? As far as I can tell this activates the best chain, which should then correctly pick B7 again through `FindMostWorkChain`.
`nSequenceId` is set here in `ReceivedBlockTransactions` once a block is connectable (itself and all predecessors have received transactions). In this case, `nSequenceId` is set for B7 and B8 when the full block of the parent B3 arrives. So when `ReceivedBlockTransactions` is called for B3, we iterate over `m_blocks_unlinked` here, add them to the queue, and then process B7 and B8 to assign them a `nSequenceId` - but I think it may not be deterministic whether B7 or B8 is processed first and gets the lower `nSequenceId` - so when we later reorg to these blocks, it would also not be deterministic what `FindMostWorkChain` returns.
I completely overlooked this, but it looks like the (unhinted) insertion order is preserved when retrieving via `equal_range`:

"Since emplace and unhinted insert always insert at the upper bound, the order of equivalent elements in the equal range is the order of insertion unless hinted insert or emplace_hint was used to insert an element at a different position."

Source: https://en.cppreference.com/w/cpp/container/multimap/equal_range
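A quick standalone check of that guarantee (illustrative only; an int key and string values stand in for the CBlockIndex pointers the real map stores):

#include <cassert>
#include <iterator>
#include <map>
#include <string>

int main()
{
    // Two unlinked blocks sharing the same parent key, inserted unhinted
    // in the order they were received from the network.
    std::multimap<int, std::string> blocks_unlinked;
    blocks_unlinked.emplace(3, "B7"); // received first
    blocks_unlinked.emplace(3, "B8"); // received second

    // equal_range yields the equivalent keys in insertion order, so B7
    // would be processed first and get the lower nSequenceId.
    auto [it, end] = blocks_unlinked.equal_range(3);
    assert(std::distance(it, end) == 2);
    assert(it->second == "B7");
    assert(std::next(it)->second == "B8");
}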
src/chain.h (outdated)
@@ -35,6 +35,9 @@ static constexpr int64_t MAX_FUTURE_BLOCK_TIME = 2 * 60 * 60;
 * MAX_FUTURE_BLOCK_TIME.
 */
static constexpr int64_t TIMESTAMP_WINDOW = MAX_FUTURE_BLOCK_TIME;
//! Init values for CBlockIndex nSequenceId when loaded from disk
Nit: The extra whitespace at the beginning should be removed.
Covered in 3869469
assert_equal(node.getbestblockhash(), blocks[2].hash)

self.log.info('Send parents B3-B4 of B8-B10 in reverse order')
peer.send_blocks_and_test([blocks[4]], node, success=False, force_send=True)
It would be an easy doc fix, something like:
- - if success is True: assert that the node's tip advances to the most recent block
- - if success is False: assert that the node's tip doesn't advance
+ - if success is True: assert that the node's tip is the last block in blocks at the end of the operation.
+ - if success is False: assert that the node's tip isn't the last block in blocks at the end of the operation

Will submit separately if it does not get picked up here.
Before this, if we had two (or more) same-work tip candidates and restarted our node, it could be the case that the block set as tip after bootstrap didn't match the one before stopping. That's because the work and `nSequenceId` of both blocks would be the same (the latter is only kept in memory), so the active chain after restart would have depended on which tip candidate was loaded first. This makes sure that we are consistent over reboots.

Make it easier to follow where the values come from without having to go over the comments, plus easier to maintain.

Adds tests to make sure we are consistent in activating the same chain over a node restart if two or more candidates have the same work when the node is shut down.
It's not true that if success=False the tip doesn't advance. It doesn't advance to the provided tip, but it can advance to a competing one.
Force-pushed from 177d07f to 3cb689a
Thanks for reviewing @TheCharlatan and @furszy. I rebased the code and addressed the outstanding comments.
This PR grabs some interesting bits from #29284 and fixes some edge cases in how block tiebreaks are dealt with.

Regarding #29284

The main functionality from that PR was dropped given it was not an issue anymore. However, reviewers pointed out that some comments were outdated (#29284 (comment)), which to my understanding may have led to thinking that there was still an issue. It also added test coverage for the aforementioned case, which was already passing on master and is useful to keep.

New functionality

While reviewing the superseded PR, it was noticed that blocks that are loaded from disk may face a similar issue (check #29284 (comment) for more context).
The issue comes from how tiebreaks for equal-work blocks are handled: if two blocks have the same amount of work, the one that is activatable first wins, that is, the one for which we have all its data (and all of its ancestors'). The variable that keeps track of this within `CBlockIndex` is `nSequenceId`, which is not persisted over restarts. This means that when a node is restarted, all blocks loaded from disk are defaulted to the same `nSequenceId`: 0.

Now, when trying to decide which chain is best on loading blocks from disk, the previous tiebreaker rule is not decisive anymore, so the `CBlockIndexWorkComparator` has to default to its last rule: whichever block was loaded first (has a smaller memory address) wins.

This means that if multiple same-work tip candidates were available before restarting the node, it could be the case that the selected chain tip after restarting does not match the one before.
Therefore, the way `nSequenceId` is initialized is changed to: