8000 wallet: avoid mixing different `OutputTypes` during coin selection by josibake · Pull Request #24584 · bitcoin/bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

wallet: avoid mixing different OutputTypes during coin selection #24584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 28, 2022

Conversation

josibake
Copy link
Member
@josibake josibake commented Mar 16, 2022

Concept

Following #23789, Bitcoin Core wallet will now generate a change address that matches the payment address type. This improves privacy by not revealing which of the outputs is the change at the time of the transaction in scenarios where the input address types differ from the payment address type. However, information about the change can be leaked in a later transaction. This proposal attempts to address that concern.

Leaking information in a later transaction

Consider the following scenario:

mix input types(1)

  1. Alice has a wallet with bech32 type UTXOs and pays Bob, who gives her a P2SH address
  2. Alice's wallet generates a P2SH change output, preserving her privacy in txid: a
  3. Alice then pays Carol, who gives her a bech32 address
  4. Alice's wallet combines the P2SH UTXO with a bech32 UTXO and txid: b has two bech32 outputs

From a chain analysis perspective, it is reasonable to infer that the P2SH input in txid: b was the change from txid: a. To avoid leaking information in this scenario, Alice's wallet should avoid picking the P2SH output and instead fund the transaction with only bech32 Outputs. If the payment to Carol can be funded with just the P2SH output, it should be preferred over the bech32 outputs as this will convert the P2SH UTXO to bech32 UTXOs via the payment and change outputs of the new transaction.

TLDR; Avoid mixing output types, spend non-default OutputTypes when it is economical to do so.

Approach

AvailableCoins now populates a struct, which makes it easier to access coins by OutputType. Coin selection tries to find a funding solution by each output type and chooses the most economical by waste metric. If a solution can't be found without mixing, coin selection runs over the entire wallet, allowing mixing, which is the same as the current behavior.

I've also added a functional test (test/functional/wallet_avoid_mixing_output_types.py) and unit test (src/wallet/test/availablecoins_tests.cpp.

Copy link
Member
@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an argument against this change is that any privacy leaking tx could very well be a coin-join between two wallets that use different output types. Though, I don't think any such coin-join software exists today, so until then it probably makes sense to use inputs of a single type. #24362 (review) mentions that legacy inputs could preferably be used during low-fee periods to reduce the fees paid by the user. Though, if #24362 is merged, then it would probably not be worth it to optimize for fees.

@josibake josibake force-pushed the josibake-coin-selection-v2 branch from 312d934 to f900010 Compare March 16, 2022 13:48
@josibake
Copy link
Member Author

I think an argument against this change is that any privacy leaking tx could very well be a coin-join between two wallets that use different output types. Though, I don't think any such coin-join software exists today, so until then it probably makes sense to use inputs of a single type.

I agree, until mixed inputs in transactions are more common, this change should improve user privacy.

#24362 (review) mentions that legacy inputs could preferably be used during low-fee periods to reduce the fees paid by the user. Though, if #24362 is merged, then it would probably not be worth it to optimize for fees.

I prefer this approach over #24362 in order to keep the privacy benefits of matching change output addresses for all address types. I also see this as a small first step, where we can optimize for fees in a later PR if it is deemed useful/worth it.

@josibake josibake force-pushed the josibake-coin-selection-v2 branch from f900010 to fdf1fa5 Compare March 16, 2022 14:19
@murchandamus
Copy link
Contributor

Very exciting, I'm glad you're working on this. I was just discussing this idea with people yesterday.

A couple comments on the approach described in the OP:

  • I noticed that you suggest preferring the spending of older types and compare it to spending newer types preferentially. While I do agree that the old types are the ones we'd want to get rid off rather, I am not sure this would lead to the optimal outcome. I would suggest running coin selection for each of the OutputGroup sets in parallel and then choosing the input set per the waste metric. Due to how the waste metric works, I'd expect this to lead to the older larger output types getting preferentially spent at lower feerates (and in the current feerate environment, usually first). What are your thoughts on that?
  • Regarding splitting the OutputGroups: since each OutputGroup will only have one type, you could just get the available set once and subdivide it from there to run the coin selection in parallel. Would that perhaps be easier?
  • CoinSelection is pretty fast, with the exception of Knapsack, so running it on subsets of the UTXO pool shouldn't be a huge issue, running it on all subsets in parallel may be even faster than running it on the whole UTXO pool once since the complexity of each subset would be much smaller.

Ceterum censeo Knapsackinem esse delendam.

@josibake
Copy link
Member Author
josibake commented Mar 16, 2022

force pushed to fix clang-formatting and add comments per @MarcoFalke 's suggestion. can be verified with git range-diff master f900010 fdf1fa5

Comment on lines 196 to 197
} else if (type == TxoutType::SCRIPTHASH) {
output_type = OutputType::P2SH_SEGWIT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In b32b52c "wallet: avoid mixing ouput types in coin selection"

This is not guaranteed to be true. We could have a P2SH multisig and that would be LEGACY.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a cleaner way to go map a COutput to an OutputType? adding lots of if/else checks to go from a TxoutType to an OutputType seems clunky to me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to map to output type? Why not use TxoutType directly?

Copy link
Member
@achow101 achow101 Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to map to output type? Why not use TxoutType directly?

Yes, P2SH_SEGWIT is distinct from other P2SH which would be legacy.

is there a cleaner way to go map a COutput to an OutputType? adding lots of if/else checks to go from a TxoutType to an OutputType seems clunky to me

I don't think there exists one yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean multisig P2SH should not be mixed with legacy either? Mapping to OutputType can't fix this. In fact it will probably break anyway when the wallet can spend segwit v2 outputs.

Copy link
Member
@achow101 achow101 Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unlikely that multisig things would actually be mixed with single key things, but I think it would be incorrect to group P2SH with p2sh- 8000 segwit. Conceivably, the same multisig could have been p2sh, then upgraded to p2sh-segwit, and then to bech32. In that case, it would still be legacy, p2sh-segwit, and bech32.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not guaranteed to be true. We could have a P2SH multisig and that would be LEGACY.

so if a P2SH_SEWGIT and a P2SH multisig were spent together, when the UTXO is later spent it would reveal that some inputs were legacy multisig and others p2sh-segwit?

As an alternative, would it be better to map WITNESS_V0_SCRIPTHASH to P2SH and map SCRIPTHASH and PUBKEYHASH to LEGACY?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time we get determining the output type, we already know that the output either belongs to the wallet, or is watch only. Thus we can do additional solver invocations for scripthash with the redeemScript in order to figure out the true type.

For watch only outputs, we may not be able to do that because not all of the scripts will necessarily be available. However we could still use solving data provided in coinControl to try to determine that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing me in the right direction! I'll take a stab at implementing this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now also considering the reedem script when a UTXO is P2SH

src/outputtype.h Outdated
@@ -22,6 +22,7 @@ enum class OutputType {
BECH32M,
};

// Keep output types sorted by age, with older types first
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also mention the places that require it (CreateTransactionInternal), so that when those go away, the comment can go away, too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, now that it is not dependent on ordering

Comment on lines 196 to 197
} else if (type == TxoutType::SCRIPTHASH) {
output_type = OutputType::P2SH_SEGWIT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to map to output type? Why not use TxoutType directly?

@josibake
Copy link
Member Author

I would suggest running coin selection for each of the OutputGroup sets in parallel and then choosing the input set per the waste metric. Due to how the waste metric works, I'd expect this to lead to the older larger output types getting preferentially spent at lower feerates (and in the current feerate environment, usually first). What are your thoughts on that?

This makes a lot of sense. By running coin selection on each OutputGroup set you are preserving privacy by avoiding mixing the inputs, but also still choosing the best spend w.r.t to fees. Running them in parallel (as opposed to sequentially) also helps speed things up (which you also mentioned).

I'll work on refactoring to implement this.

Comment on lines 189 to 200
OutputType output_type{wallet.m_default_address_type};
std::vector<std::vector<uint8_t>> dummy;
TxoutType type{Solver(wtx.tx->vout[i].scriptPubKey, dummy)};
if (type == TxoutType::WITNESS_V1_TAPROOT) {
output_type = OutputType::BECH32M;
} else if (type == TxoutType::WITNESS_V0_KEYHASH) {
output_type = OutputType::BECH32;
} else if (type == TxoutType::SCRIPTHASH) {
output_type = OutputType::P2SH_SEGWIT;
} else if (type == TxoutType::PUBKEYHASH) {
output_type = OutputType::LEGACY;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps amend availableCoins to just return the outputs grouped by type already. E.g. availableCoins could be a struct with a vector for each an output type, and a member all that returns the four vectors concatenated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@DrahtBot
Copy link
Contributor
DrahtBot commented Mar 17, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25685 (wallet: Faster transaction creation by removing pre-set-inputs fetching responsibility from Coin Selection by furszy)
  • #25659 (wallet: simplify ListCoins implementation by furszy)
  • #25647 (wallet: return change from SelectionResult by S3RK)
  • #24814 (refactor: improve complexity of removing preselected coins by rag-hav)
  • #24699 (wallet: Improve AvailableCoins performance by reducing duplicated operations by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

// First, do coin selection on a filtered subset of coins, filtered by OutputType
// Attempt with older output types first (LEGACY > P2SH > BECH32 > BECH32M)
std::optional<SelectionResult> result = [&] {
// This takes advantange of the fact that OUTPUT_TYPES is already sorted by age (LEGACY, P2SH, BECH32, BECH32M)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes there is a strict preference for newer UTXO types. Rather, it should try to prefer using UTXOs that are not the user's preferred address type(s).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the amended approach is now that it would pick the input set with the best waste metric score from all SelectionResults of running coin selection on the separate output types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, forgot to respond to this! yes, rather than rely on a forced ordering of old to new, I'm implementing @xekyo 's suggestion of running coin selection on each output group and then picking the solution with the best waste metric. If no solution is found, then run coin selection over all coins (current behavior).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated it now to not force an ordering on OutputType.

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Mar 24, 2022
…face_zmq`

bc90b8d [move only] remove `is_wallet_compiled` checks (josibake)
0bfbf7f test: use MiniWallet in `interfaces_zmq` (josibake)

Pull request description:

  While working on bitcoin#24584 , `interface_zmq` started failing due to coin selection not running deterministically. The test doesn't actually need the wallet, so this PR migrates it to use MiniWallet

  _Note for reviewers:_ the second commit moves large chunks of code out of an if block, so it may be helpful to review with something that ignores whitespace, e.g `git diff -w master`

ACKs for top commit:
  vincenzopalazzo:
    ACK bitcoin@bc90b8d

Tree-SHA512: c618e23d00635d72dafdef28e68cbc88b9cc2030d4898fc5b7eac926fd621684c1958c075ed167192716b18308da5a0c1f1393396e31b99d0d3bde78b78fefc5
@josibake josibake force-pushed the josibake-coin-selection-v2 branch from fdf1fa5 to 072e755 Compare March 28, 2022 12:16
@josibake josibake changed the title [RFC] wallet: avoid mixing different OutputTypes during coin selection wallet: avoid mixing different OutputTypes during coin selection Mar 28, 2022
Copy link
Contributor
@murchandamus murchandamus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, I like how you've split up availableCoins and how you have structured your tests. I have some ideas on some additional tests you might want to add, and a few questions.

txC = alice.sendtoaddress(alice.getnewaddress(), Decimal("10"))
txA = self.nodes[1].sendtoaddress(alice.getnewaddress(), Decimal("10"))
txB = self.nodes[1].sendtoaddress(alice.getnewaddress(), Decimal("10"))
txC = self.nodes[1].sendtoaddress(alice.getnewaddress(), Decimal("10"))
self.sync_mempools()
self.generate(self.nodes[1], 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note: I first thought that the self.generate(self.nodes[1], 1) here would of course confirm all of the above transactions, but this obviously only happens after all three transactions get built, which means that unconfirmed change UTXOs could have been used in construction of the latter.

Note to author: This may point to an underlying problem. Since the UTXO pools divided by type are much smaller, the backoff steps allowing for unconfirmed change to be used may come into play much more often. Then, since the waste metric does not yet account for the unconfirmed inputs, we may spend unconfirmed UTXOs more often. This in turn may cause a problem especially if the follow-up transaction is aiming for a higher feerate than the transaction that created the unconfirmed input, at least while #15553 remains unsolved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, is it by design that the waste metric does not consider unconfirmed change? a few ideas to address this:

  1. have the waste metric also consider unconfirmed change
  2. consider things like unconfirmed inputs when comparing solutions

for example, if solution A is to spend all confirmed inputs with a waste metric of X and solution B spends unconfirmed change with a waste metric of Y, prefer solution A even if X > Y.

Copy link
Contributor
@murchandamus murchandamus Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, Bitcoin Core ignoring the feerate of parent transactions when using unconfirmed inputs is a bug, but one that turned out more difficult to fix than anticipated. I have started a branch (https://github.com/Xekyo/bitcoin/tree/ancestor-aware-funding) which I intend to push forward in collaboration with @glozow. We were stuck by not being able to get information on unconfirmed parent transactions in the wallet yet, and this should now be resolved by a node:wallet interface that provides mining scores which @glozow came up with.

Then, the intended goal is that we'd automatically add sufficient fees to elevate the mining score for both the new and the ancestor transactions to the target feerate and account for these additional fees in the waste metric.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better handled now by running by OutputType at each stage of back-off

@@ -293,7 +293,7 @@ def test_getbalances_used(self):

# send multiple transactions, reusing one address
for _ in range(101):
self.nodes[0].sendtoaddress(new_addr, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this change is being made. There don't seem to be any checks or asserts referring to the UTXO pool of nodes[0], and wouldn't all the coinbase outputs and change outputs be of the same type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is related to your earlier comment (#24584 (comment)) regarding unconfirmed UTXOs being spent more often. In this test, after the first tx is sent, the next 100 txs build a chain where the unconfirmed change is being spent, eventually hitting the mempool chain limit of 25 and causing an error. By explicitly setting fee_rate=1, nodes[0] starts spending its confirmed coinbase UTXOs instead.

This also demonstrates your earlier point that this change is causing unconfirmed outputs to be spent more often

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-worked this to run coin selection by OutputType at each back-off, which fixed the preference for spending unconfirmed UTXOs. as such, I was able to leave the tests as is and removed the test changes from the PR.


std::vector<COutput> AvailableCoins::all() {
std::vector<COutput> all;
all.reserve(Bech32m.size() + Bech32.size() + P2SH.size() + Legacy.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this reserve(…) not include Other.size()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, it should include Other.size()

Comment on lines 215 to 216
std::vector<std::vector<uint8_t>> dummy;
TxoutType type{Solver(wtx.tx->vout[i].scriptPubKey, dummy)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this return_value_unused.

It's not obvious what this dummy is and why it's needed. I take it that it's a return value for the script solver to put the actual solution components in, and we don't care about it here because we only want to learn the type. Perhaps you could add a comment here to elucidate that for future reviewers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -816,9 +816,40 @@ static bool CreateTransactionInternal(
// Get available coins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message in "wallet: run coin selection by OutputType" 49d802f is missing a word in the first sentence of the body.

result5->ComputeAndSetWaste(coin_selection_params.m_cost_of_change);
return result5;
}
return std::optional<SelectionResult>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this return std::nullopt; if there is no solution?

Comment on lines 122 to 125
self.log.debug(
"Make the same payment again, with a higher feerate. "
"Check that Alice's wallet prefers bech32 UTXOs."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether there is a misunderstanding here. It's not that certain types are strictly preferred at the corresponding feerate categories, but that larger input scripts will simply affect the waste score more due to their size in either direction due to their larger transaction weight. This in turn leads to a higher waste score for the same count of legacy inputs at high feerates vs more efficient types, and a more negative waste score at low feerates. Obviously, this would cause one specific type to always win if you always have the same count of inputs among the options, but wouldn't necessarily hold true if the input sets could have varying input counts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are on the same page, it's just my wording in the comments isn't correct. Initially, I wrote the comments for when I was strictly preferring older types to newer types. I'll update these comments to be more accurate.

def skip_test_if_missing_module(self):
self.skip_if_no_wallet()

def test_case_one(self, A, B):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the starting situation of the nodes. Are they both empty? Is Alice or Bob the default node that has the previous block rewards? Do they both have funds? Are the funds all of one type, or mixed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, both nodes are starting with a clean chain (setup_clean_chain = True), and then node A mines 100 blocks in run_test.

Much better would be to explain this at the beginning of the test so it's clear what the respective UTXO sets for each node are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 17 to 25
CASE 1: Alice mixed UTXOs (bech32, P2SH). She makes multiple payments
to Bob, where bech32 UTXOs are preferred at higher fee rates
and P2SH are preffered at lower fee rates.
CASE 2: Alice has mixed UTXOs (P2SH, bech32, legacy) and makes multiple
payments to Bob's bech32 address, where legacy is preferred at
low fee rates, P2SH at medium fee rates, and bech32 at high rates.
CASE 3: Alice has mixed UTXOs (legacy, P2SH, bech32) and makes multiple
payments to Bob, where legacy and P2SH are mixed at lower fee rates
and finally all UTXOs are mixed for the final payment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels that these tests are more focused on the cost efficiency of the input sets than validating that input sets of only one type are produced.

• Could you perhaps provide a bit more information on the initial setup of the wallets' UTXO pools?
• From what I gather, most of these tests end up using a single input to fund their transactions. A single input transaction will always have the same type for all inputs. 😁 Given that I understood that right, it would be more convincing if the wallets had to select from a number of UTXOs of various values and types, and needed to pick multiple UTXOs to create the input set, while you then showed that we consistently pick input sets with multiple inputs of the same type. I also feel that the inputs being of the same type would be the more important aspect of this PR rather than which type of inputs gets chosen exactly, although showing that the solution with the lower waste was picked would certainly be an appreciated secondary test scenario.
• The way these tests are implemented right now, they may be a bit too tightly coupled to the exact behavior of the coin selection algorithms we currently use, but I am hoping to change the composition of coin selection algorithms in the near term. This would be another reason why I'd prefer if the test were implemented more generically "all input have the same type" vs "exactly this type of inputs gets chosen". :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great points. I'll refactor the test to:

  • Document the initial setup for each node
  • Start with a larger, mixed UTXO set
  • Be less brittle w.r.t to coin selection

Looking at the test again, I think I am trying to test for some things that would be better to test for in a unit test. Ideally, this test should create large UTXO sets for each node, spend back and forth at different fee rates and to different address types. The assertions should check that all inputs are of the same OutputType and finally test a scenario where the wallet can fund from any single OutputType and is forced to mix OutputTypes to fund the transaction

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be addressed now. I re-worked the test to make it less dependent on specific UTXOs and transactions and added a more detailed explanation.

Copy link
Member
@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Since the UTXO pools divided by type are much smaller, the backoff steps allowing for unconfirmed change to be used may come into play much more often. Then, since the waste metric does not yet account for the unconfirmed inputs, we may spend unconfirmed UTXOs more often.

is it by design that the waste metric does not consider unconfirmed change?

Our preference for different coin selection solutions is spread out over multiple places in the code, including the waste metric and the sequence of CoinEligibilityFilters used in each AttemptSelection() call. The way we encode our preference for confirmed UTXOs is by attempting coin selection with confirmed UTXOs first, and only trying with unconfirmed UTXOs if that fails.

Approach-wise, did you consider adding output type to CoinEligibilityFilter instead, and trying selection for each output type within AttemptSelection()? This way you try to do selection for every output type for confirmed UTXOs only, before you try unconfirmed.

Another possibility is adding the logic for preferring confirmed over unconfirmed UTXOs to SelectionResult and editing the operator< to account for it.

@josibake
Copy link
Member Author
josibake commented Mar 30, 2022

thanks for the review, @glozow !

Approach-wise, did you consider adding output type to CoinEligibilityFilter instead, and trying selection for each output type within AttemptSelection()? This way you try to do selection for every output type for confirmed UTXOs only, before you try unconfirmed.

This is how I wrote the code initially (by adding OutputType to CoinEligibilityFilter) but abandoned it in favor of separating the OutputTypes into different vectors during GetAvailableCoins and then running coin selection on each vector separately and comparing the results.

Another possibility is adding the logic for preferring confirmed over unconfirmed UTXOs to SelectionResult and editing the operator< to account for it.

I like this idea. I think it keeps the logic a little cleaner.

@josibake
Copy link
Member Author

based on feedback so far (thanks, everyone!), I think next steps are:

  • Better break out P2SH, vs lumping all P2SH with P2SH_SEGWIT
  • Account for unconfirmed txs being preferred when picking the "best" solution
  • Refactor functional tests to be less tightly coupled to coin selection and better documented
  • Potentially move some tests out of functional into unit tests

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 2, 2022
…face_zmq`

bc90b8d [move only] remove `is_wallet_compiled` checks (josibake)
0bfbf7f test: use MiniWallet in `interfaces_zmq` (josibake)

Pull request description:

  While working on bitcoin#24584 , `interface_zmq` started failing due to coin selection not running deterministically. The test doesn't actually need the wallet, so this PR migrates it to use MiniWallet

  _Note for reviewers:_ the second commit moves large chunks of code out of an if block, so it may be helpful to review with something that ignores whitespace, e.g `git diff -w master`

ACKs for top commit:
  vincenzopalazzo:
    ACK bitcoin@bc90b8d

Tree-SHA512: c618e23d00635d72dafdef28e68cbc88b9cc2030d4898fc5b7eac926fd621684c1958c075ed167192716b18308da5a0c1f1393396e31b99d0d3bde78b78fefc5
@kristapsk
Copy link
Contributor

Concept ACK

@achow101
Copy link
Member

re-ACK 71d1d13

Copy link
@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach NACK

I was not able to achieve the privacy mentioned in PR description. Steps to reproduce:

  1. Create a wallet w1, send 0.001 four times from signet faucet to bech32 addresses, wait for confirmation and spend more than 0.002 to a legacy address.
  2. There will be 2 UTXO in wallet: 1 legacy and other bech32
  3. Send more than 0.001 to a bech32 address

Example:

https://mempool.space/signet/tx/6bb60f7dd4171d915b615030bc3cd2af785188834f38f391aa6b28e6a5359ecb:0

https://mempool.space/signet/tx/2cd2821390126c5407f29be8669c612a4092d0bc11224811a0752ff7fa31d43a


This is a valid argument and there can be multiple interpretations even without coinjoin: #24584 (review)

Also still curious about the two questions/solutions I had about solving this problem in #24584 (comment)

@achow101
Copy link
Member

Approach NACK

I was not able to achieve the privacy mentioned in PR description. Steps to reproduce:

1. Create a wallet w1, send 0.001 four times from signet faucet to bech32 addresses, wait for confirmation and spend more than 0.002 to a legacy address.

2. There will be 2 UTXO in wallet: 1 legacy and other bech32

3. Send more than 0.001 to a bech32 address

Example:

https://mempool.space/signet/tx/6bb60f7dd4171d915b615030bc3cd2af785188834f38f391aa6b28e6a5359ecb:0

https://mempool.space/signet/tx/2cd2821390126c5407f29be8669c612a4092d0bc11224811a0752ff7fa31d43a

I don't see how this example tests what is described in the OP? You ended up making a transaction that has mixed inputs, but the that's because those are the only funds in your wallet. The point of this PR is to avoid mixing input types, not never mix input types.

@josibake
Copy link
Member Author

hey @1440000bytes , thanks for testing! as mentioned, the goal is not to never mix; the goal is to avoid mixing if possible. I've updated the OP to explicitly state this. also, take a look at the functional test, as this demonstrates the desired behavior: creating a wallet with mixed UTXO types, funding several transactions without mixing, and finally emptying the wallet (which is impossible to do without mixing).

This is a valid argument and there can be multiple interpretations even without coinjoin: #24584 (review)

while there can be other interpretations, the most likely interpretation today is that the non-default input in the subsequent transaction is the change from the previous transaction (and in the majority of cases, this interpretation is correct).

Also still curious about the two questions/solutions I had about solving this problem in #24584 (comment)

i responded here: #24584 (comment) , please let me know which parts of your original question you feel are unaddressed.

@ghost
Copy link
ghost commented Jul 26, 2022

How about insufficient funds error message or use mixed output types for change? This could happen even when some transactions are unconfirmed and confirmed UTXOs are of different types

@murchandamus
Copy link
Contributor

How about insufficient funds error message or use mixed output types for change? This could happen even when some transactions are unconfirmed and confirmed UTXOs are of different types

An advanced user that cares so much about not mixing inputs is probably already using coin control to manually select inputs. I think this PR is a reasonable middle-path of rolling out a privacy improvement while minimizing negative impact on the UX. Adding cases in which transaction building fails while there are sufficient funds available as the default behavior would be an unreasonable UX disimprovement. I would consider the addition of options for more strict behavior to be out of scope for this PR.

Copy link
Contributor
@murchandamus murchandamus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 71d1d13 via git range-diff master 6530d19 71d1d13

BOOST_CHECK_EQUAL(available_coins.bech32.size(), 2U);

// P2SH-SEGWIT
dest = wallet->GetNewDestination(OutputType::P2SH_SEGWIT, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
dest = wallet->GetNewDestination(OutputType::P2SH_SEGWIT, "");
dest = wallet->GetNewDestination(OutputType::P2SH_SEGWIT, "");
BOOST_ASSERT(dest.HasRes());

Copy link
Contributor
@LarryRuane LarryRuane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 71d1d13
Very nice! (one small suggestion is optional)

@@ -79,6 +79,32 @@ TxSize CalculateMaximumSignedTxSize(const CTransaction &tx, const CWallet *walle
return CalculateMaximumSignedTxSize(tx, wallet, txouts, coin_control);
}

uint64_t CoinsResult::size() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit 2e67291 refactor: store by OutputType in CoinsResult
nit

Suggested change
uint64_t CoinsResult::size() const
size_t CoinsResult::size() const

@achow101 achow101 merged commit 1abbae6 into bitcoin:master Jul 28, 2022
achow101 added a commit that referenced this pull request Aug 17, 2022
8cd21bb refactor: improve readability for AttemptSelection (josibake)
f47ff71 test: only run test for descriptor wallets (josibake)
0760ce0 test: add missing BOOST_ASSERT (josibake)
db09aec wallet: switch to new shuffle, erase, push_back (josibake)
b6b50b0 scripted-diff: Uppercase function names (josibake)
3f27a2a refactor: add new helper methods (josibake)
f5649db refactor: add UNKNOWN OutputType (josibake)

Pull request description:

  This PR is to address follow-ups for #24584, specifically:

  * Remove redundant, hard-to-read code by adding a new `OutputType` and adding shuffle, erase, and push_back methods for `CoinsResult`
  * Add missing `BOOST_ASSERT` to unit test
  * Ensure functional test only runs if using descriptor wallets
  * Improve readability of `AttemptSelection` by removing triple-nested if statement

  Note for reviewers: commit `refactor: add new helper methods` should throw an "unused function warning"; the function is used in the next commit. Also, commit `wallet: switch to new shuffle, erase, push_back` will fail to compile, but this is fixed in the next commit with a scripted-diff. the commits are separate like this (code change then scripted-diff) to improve legibility.

ACKs for top commit:
  achow101:
    ACK 8cd21bb
  aureleoules:
    ACK 8cd21bb.
  LarryRuane:
    Concept, code review ACK 8cd21bb
  furszy:
    utACK 8cd21bb. Left a small, non-blocking, comment.

Tree-SHA512: a1bbc5962833e3df4f01a4895d8bd748cc4c608c3f296fd94e8afd8797b8d2e94e7bd44d598bd76fa5c9f5536864f396fcd097348fa0bb190a49a86b0917d60e
bitcoinhodler added a commit to bitcoinhodler/GlacierProtocol that referenced this pull request Oct 27, 2022
bitcoin/bitcoin#24584 changes coin selection,
which changes some of the txids for our regtest transactions.

I ran:

    t/online_regtest_wallet.py recreate-all-tests

All of these tests are failing now because the golden no longer
matches, which I will update next, but I first wanted to commit the
automated step here separately.
bitcoinhodler added a commit to bitcoinhodler/GlacierProtocol that referenced this pull request Oct 27, 2022
Created by running `make remake`.

All tests passing again on bitcoin built from
1abbae65eb9c99df5d8941008068d83ad99bf117 (where bitcoin/bitcoin#24584
was merged).
@bitcoin bitcoin locked and limited conversation to collaborators Aug 15, 2023
@josibake josibake deleted the josibake-coin-selection-v2 branch January 26, 2024 10:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0