8000 assumeutxo: snapshot initialization by jamesob · Pull Request #25667 · bitcoin/bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

assumeutxo: snapshot initialization #25667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Oct 13, 2022
Merged

Conversation

jamesob
Copy link
Contributor
@jamesob jamesob commented Jul 21, 2022

This is part of the assumeutxo project (parent PR: #15606)


Half of the replacement for #24232. The original PR grew larger than expected throughout the review process.

This change adds the ability to initialize a snapshot-based chainstate during init if one is detected on disk. This is of course unused as of now (aside from in unittests) given that we haven't yet enabled actually loading snapshots.

Don't be scared! There are some big move-only commits in here.

Accompanying changes include:

  • moving the snapshot coinsdb directory from being called chainstate_[base blockhash] to chainstate_snapshot, since we only support one snapshot in use at a time. This simplifies some logic, but it necessitates writing that base blockhash out to a file within the coinsdb dir. See discussion here.
  • adding a simple fix in FlushBlockFile() that avoids a crash when attemping to flush to disk before LoadBlockIndexDB() is called, which happens when calling MaybeRebalanceCaches() during multiple chainstate init.
  • improving the unittest to allow testing with on-disk chainstates - necessary to test a simulated restart and re-initialization.

@jamesob jamesob force-pushed the 2022-07-au.init-2 branch from 38d511b to 0a482e9 Compare July 21, 2022 17:27
@DrahtBot
Copy link
Contributor
DrahtBot commented Jul 21, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26283 (p2p: Fill reconciliation sets and request reconciliation (Erlay) by naumenkogs)
  • #25977 (refactor: Replace std::optional<bilingual_str> with util::Result by ryanofsky)
  • #25862 (refactor, kernel: Remove gArgs accesses from dbwrapper and txdb by ryanofsky)
  • #25704 (refactor: Remove almost all validation option globals by MarcoFalke)
  • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)
  • #23443 (p2p: Erlay support signaling by naumenkogs)
  • #22663 (dbwrapper: properly destroy objects in case CDBWrapper ctor throws by Crypt-iQ)
  • #19888 (rpc, test: Improve getblockstats for unspendables by fjahr)
  • #17786 (refactor: Nuke policy/fees->mempool circular dependencies by hebasto)
  • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor
@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 15da925. All commits here except the last two were directly cherry-picked from previously reviewed #24232.

Copy link
@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 15da925, my comments on #24232 have been addressed.

@@ -4720,6 +4708,46 @@ const AssumeutxoData* ExpectedAssumeutxo(
return nullptr;
}

static bool DeleteCoinsDBFromDisk(const fs::path db_path, bool is_snapshot)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for reviewers: for now is_snapshot is always true as there is a single usage. This is not the case anymore with the remaining commits of the #24232 patchset.

//! snapshot that is in the process of being validated.
bool DetectSnapshotChainstate(CTxMemPool* mempool) EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

void ResetChainstates() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to reviewers: this method is used in the remaining #24232 patchset.

@ariard
Copy link
ariard commented Aug 1, 2022

@dergoegge I think you could be interested to review the assumeutxo work. In case of future hypothetical integration of libutreexo in Core, I believe there would be intersecting code paths and data structure modified. It might also prepare for future synergies between utreexo and assumeutxo.

@dongcarl
Copy link
Contributor
dongcarl commented Aug 2, 2022

Planning on diving deeply into this, thanks for splitting it up @jamesob!

Questions from going over the commits:

Q1: Is the intended layout:

  • <datadir>
    • chainstate
      • *.ldb
      • *.log
      • CURRENT
      • LOCK
      • MANIFEST-*
    • chainstate_snapshot
      • base_blockhash
      • *.ldb
      • *.log
      • CURRENT
      • LOCK
      • MANIFEST-*

Q2: The last 6 commits seem to be focused on unit tests, but it seems to me that the functionality being added here is better tested with functional tests. Is there something we need to test here that a functional test won't be able to do?

Comment on lines +4720 to +4795
bool removed = fs::remove(base_blockhash_path);
if (!removed) {
LogPrintf("[snapshot] failed to remove file %s\n",
fs::PathToString(base_blockhash_path));
}
} else {
LogPrintf("[snapshot] snapshot chainstate dir being removed lacks %s file\n",
fs::PathToString(node::SNAPSHOT_BLOCKHASH_FILENAME));
}
}

std::string path_str = fs::PathToString(db_path);
LogPrintf("Removing leveldb dir at %s\n", path_str);

// We have to destruct before this call leveldb::DB in order to release the db
// lock, otherwise `DestroyDB` will fail. See `leveldb::~DBImpl()`.
const bool destroyed = dbwrapper::DestroyDB(path_str, {}).ok();
Copy link
Contributor
@dongcarl dongcarl Aug 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to other reviewers: the call to DestroyDB will a 10000 ctually remove the entire directory, including the blockhash file if it still existed since we place the blockhash file inside what leveldb considers to be its directory, but it's probably better (more polite?) to remove it ourselves just so we can print nice errors.

env->DeleteDir(dbname); // Ignore error in case dir contains other files

@ariard
Copy link
ariard commented Aug 23, 2022

This PR needs rebase.

@jamesob
Copy link
Contributor Author
jamesob commented Sep 6, 2022

Hi all, sorry for the delay. My spirit has been weak the last few months, and I took two weeks off to try and shake the burnout.

I've addressed all feedback listed here and rebased.

@dongcarl asks:

Q1: Is the intended layout:

Yep, it is.

Q2: The last 6 commits seem to be focused on unit tests, but it seems to me that the functionality being added here is better tested with functional tests. Is there something we need to test here that a functional test won't be able to do?

We can't have functional tests to exercise this behavior until we add loadtxoutset to the RPC interface, which will effectively reveal assumeutxo to end users before all the necessary changes have been made to make it usable. Plus, I think the unittests allow us to more easily express granular testcases and assertions.

This CreateAndActivateUTXOSnapshot parameter is necessary once we
perform snapshot completion within ABC, since the existing UpdateTip
test will fail because the IBD chain that has generated the snapshot
will exceed the base of the snapshot.

Being able to test snapshots being loaded into a mostly-uninitialized
datadir allows for more realistic unittest scenarios.
Used when testing cleanup of on-disk chainstate data for snapshot
testcases. Also necessary for simulating node restart in .cpp tests.
in TestingSetup(). This is used in the following commit to test
reinitializing chainstates after snapshot validation and cleanup.

Best reviewed with `git diff --color-moved=dimmed-zebra`.
@jamesob
Copy link
Contributor Author
jamesob commented Sep 13, 2022

Thanks for the feedback @MarcoFalke, all addressed.

@naumenkogs
Copy link
Member

utACK bf95976

Copy link
@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK bf95976

Changes since last ACK:

  • CChainState -> Chainstate, after CChainState -> Chainstate #24513
  • replacing CAutoFile usage in utxo_snapshot.cpp by AutoFile
  • commenting SNAPSHOT_BLOCKHASH_FILENAME
  • a new informative log L116 in src/node/chainstate.cpp
  • replacing InitializeChainstate by ActivateExistingSnapshot in tests
  • dropping the optional snapshot_blockash argument from InitializeChainstate and adding asserts
  • commenting ActivateExistingSnapshot
  • comment, lock simplification and minor code changes in unit tests

Copy link
Contributor
@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK bf95976. Changes since last review: rebasing, switching from CAutoFile to AutoFile, adding comments, switching from BOOST_CHECK to Assert in test util, using chainman.GetMutex() in tests, destroying one ChainstateManager before creating a new one in tests

Copy link
Contributor
@fjahr fjahr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK bf95976

@@ -9,6 +9,7 @@
#include <coins.h>
#include <dbwrapper.h>
#include <sync.h>
#include <fs.h>
Copy link
Contributor
@fjahr fjahr Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can be swapped with sync.h above to have alphabetical order

@@ -76,8 +76,9 @@ original chainstate remains in use as active.

Once the snapshot chainstate is loaded and validated, it is promoted to active
chainstate and a sync to tip begins. A new chainstate directory is created in the
datadir for the snapshot chainstate called
`chainstate_[SHA256 blockhash of snapshot base block]`.
datadir for the snapshot chainstate called `chainstate_snapshot`. When this directory
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 117 in this file still mentions chainstate_[hash]

@jamesob
Copy link
Contributor Author
jamesob commented Oct 4, 2022

I'm counting 4 ACKs here - what more is needed to merge?

Copy link
Contributor
@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I also verified that all commits compile and pass unit tests.
Left some minor/nit comments.

//! @returns filesystem path to the on-disk data.
std::optional<fs::path> StoragePath() {
if (m_is_memory) {
return {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d14bebf
also missing #include <optional>

Suggested change
return {};
return std::nullopt;

@@ -524,6 +524,16 @@ void BlockManager::FlushUndoFile(int block_file, bool finalize)
void BlockManager::FlushBlockFile(bool fFinalize, bool finalize_undo)
{
LOCK(cs_LastBlockFile);

if (m_blockfile_info.size() < 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8153bd9
nit: easier to read and guaranteed constant time

Suggested change
if (m_blockfile_info.size() < 1) {
if (m_blockfile_info.empty()) {

return true;
}

std::optional<uint256> ReadSnapshotBaseBlockhash(fs::path chaindir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f9f1735
nit

Suggested change
std::optional<uint256> ReadSnapshotBaseBlockhash(fs::path chaindir)
std::optional<uint256> ReadSnapshotBaseBlockhash(const fs::path& chaindir)

const std::vector<const char*>& extra_args,
const bool coins_db_in_memory,
const bool block_tree_db_in_memory)
: TestingSetup{CBaseChainParams::REGTEST, extra_args, coins_db_in_memory, block_tree_db_in_memory}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

51fc924
I believe it should be like this?

Suggested change
: TestingSetup{CBaseChainParams::REGTEST, extra_args, coins_db_in_memory, block_tree_db_in_memory}
: TestingSetup{chain_name, extra_args, coins_db_in_memory, block_tree_db_in_memory}


// Background chainstate should be unaware of new blocks on the snapshot
// chainstate.
for (Chainstate* cs : chainman_restarted.GetAll()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e4d7995
nit

Suggested change
for (Chainstate* cs : chainman_restarted.GetAll()) {
for (const Chainstate* cs : chainman_restarted.GetAll()) {

@@ -4780,6 +4767,46 @@ const AssumeutxoData* ExpectedAssumeutxo(
return nullptr;
}

static bool DeleteCoinsDBFromDisk(const fs::path db_path, bool is_snapshot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

34d1590
nit

Suggested change
static bool DeleteCoinsDBFromDisk(const fs::path db_path, bool is_snapshot)
static bool DeleteCoinsDBFromDisk(const fs::path& db_path, bool is_snapshot)

@jamesob
Copy link
Contributor Author
jamesob commented Oct 7, 2022

I'm going to leave the nits as-is to avoid ACK churn unless maints feel they are blocking merge. I'll be very quick to review any followups that @aureleoules or @fjahr would like to do towards addressing them.

Copy link
Contributor
@aureleoules aureleoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK bf95976

@achow101 achow101 merged commit 6912a28 into bitcoin:master Oct 13, 2022
@jamesob jamesob mentioned this pull request Oct 13, 2022
18 tasks
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 23, 2022
if (fs::exists(base_blockhash_path)) {
bool removed = fs::remove(base_blockhash_path);
if (!removed) {
LogPrintf("[snapshot] failed to remove file %s\n",
Copy link
Contributor
@andrewtoth andrewtoth Dec 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this code is the intended behavior. This is redundant to the previous check 3 lines up. If fs::remove without a second parameter returns false it is only because the file does not exist. If it actually fails to remove an existing file it will throw an exception (see https://en.cppreference.com/w/cpp/filesystem/remove).

If intended behavior is to only log error and continue fs::remove should be wrapped in a try/catch block or passed a std::error_code.

See https://github.com/bitcoin/bitcoin/blob/master/src/init.cpp#L326-L332.

Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
> db: add StoragePath to CDBWrapper/CCoinsViewDB
>
> This is used in subsequent commits. It allows us to clean up UTXO
> snapshot chainstate after background validation completes.
bitcoin/bitcoin@d14bebf

> validation: rename snapshot chainstate dir
>
> This changes the snapshot's leveldb chainstate dir name from
> `chainstate_[blockhash]` to `chainstate_snapshot`. This simplifies
> later logic that loads snapshot data, and enforces the limitation
> of a single snapshot at any given time.
>
> Since we still need to persist the blockhash of the base block, we
> write that out to a file (`chainstate_snapshot/base_blockhash`) for
> later use during initialization, so that we can reinitialize the
> snapshot chainstate.
>
> Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
bitcoin/bitcoin@f9f1735

> test: allow on-disk coins and block tree dbs in tests
>
> Used when testing cleanup of on-disk chainstate data for snapshot
> testcases. Also necessary for simulating node restart in .cpp tests.
bitcoin/bitcoin@51fc924

Note: TestChain100Setup now inherits TestingSetup instead of RegTestingSetup. This would normally be done in [[bitcoin/bitcoin#22818 | core#22818]] which we didn't backport because the whole softfork machiney is not relevant to us.

This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14650
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
… dbs in tests

Summary:
> init: add utxo snapshot detection
>
> Add functionality for activating a snapshot-based chainstate if one is
> detected on-disk.
>
> Also cautiously initialize chainstate cache usages so that we don't
> somehow blow past our cache allowances during initialization, then
> rebalance at the end of init.
>
> Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
bitcoin/bitcoin@252abd1

> blockmanager: avoid undefined behavior during FlushBlockFile
>
> If we call FlushBlockFile() without having intitialized the block index
> with LoadBlockIndexDB(), we may be indexing into an empty vector.
>
> Specifically this is an issue when we call MaybeRebalanceCaches() during
> chainstate init before the block index has been loaded, which calls
> FlushBlockFile().
>
> Also add an assert to avoid undefined behavior.
>
> Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
bitcoin/bitcoin@8153bd9

This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]
Depends on D14650

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14651
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
For use in next commit.

Most easily reviewed with
`--color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change`.

This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]
bitcoin/bitcoin@3a29dfb

Depends on D14651

Test Plan: `ninja check`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14652
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
> validation: add ResetChainstates()
>
> Necessary for the following test commit.
bitcoin/bitcoin@00b357c

> test: add reset_chainstate parameter for snapshot unittests
>
> This CreateAndActivateUTXOSnapshot parameter is necessary once we
> perform snapshot completion within ABC, since the existing UpdateTip
> test will fail because the IBD chain that has generated the snapshot
> will exceed the base of the snapshot.
>
> Being able to test snapshots being loaded into a mostly-uninitialized
> datadir allows for more realistic unittest scenarios.
bitcoin/bitcoin@3c36139

Backport note: this process of resetting the chainstate without also resetting the block index would cause `CheckBlockIndex` to fail on `assert(setBlockIndexCandidates.count(pindex)));` after D4717 made `CheckBlockIndex` run on intermediate steps in `ActiveBestChain`. I disabled `CheckBlockIndex` until after the snapshot chain is activated.

This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]
Depends on D14652

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14653
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
> add utilities for deleting on-disk leveldb data
>
> Used in later commits to remove leveldb directories for
> - invalid snapshot chainstates, and
> - background-vaildation chainstates that have finished serving their
>   purpose.
bitcoin/bitcoin@34d1590

> validation: remove snapshot datadirs upon validation failure
>
> If a UTXO snapshot fails to validate, don't leave the resulting datadir
> on disk as this will confuse initialization on next startup and we'll
> get an assertion error.
bitcoin/bitcoin@ad67ff3

This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]
Depends on D14653

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14654
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
This is a partial backport of [[bitcoin/bitcoin#25667 | core#25667]]
bitcoin/bitcoin@e4d7995

Depends on D14654

Test Plan: `ninja check`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14655
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
This is a backport of [[bitcoin/bitcoin#23154 | core#23154]]
with updates from [[bitcoin/bitcoin#25667 | core#25667]] and [[bitcoin/bitcoin#25740 | core#25740]]

Depends on D14655

Test Plan: proof-reading

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14656
@bitcoin bitcoin locked and limited conversation to collaborators Dec 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0