8000 WIP: Use new bech32 crate rewrite by tcharding · Pull Request #1809 · rust-bitcoin/rust-bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

WIP: Use new bech32 crate rewrite #1809

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

tcharding
Copy link
Member
@tcharding tcharding commented Apr 23, 2023

Demonstrate usage of the proposed new rust-bech32 API.

@@ -32,7 +32,8 @@ use core::fmt;
use core::marker::PhantomData;
use core::str::FromStr;

use bech32;
use bech32::primitives::iter::{ByteIterExt, Fe32IterExt};
use bech32::{witness_version, Bech32, Bech32m, WitnessVersion};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it odd that WitnessVersion lives in bech32 now. Can you explain the decision more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't my decision, but I would guess that @tcharding figured that since we have to support witness versions at least in the form of arbitrary bech32 characters (since the bech32 spec itself treats witness versions specially, decoding the witness version character as an 8-bit byte rather than a 5-bit thing), we might as well give it a dedicated type so we could enforce that only the bech32-valid values are used.

I am of two minds on this ... on the one hand it decreases the generality of the library a bit by undermining other checksums that use a wider notion of "witness version" ... but on the other hand, AFAICT only bech32 and bech32m have a notion of "witness version" at all, and the other checksums used in Bitcoin just decode bech32 strings as strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that any checksum that did want to have their own non-bip173-like version of witness versions would be able to do this by creating another iterator adaptor, and then adding some new high-level funcitons to wrap it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems to me that the bips are pretty tightly coupled with the concept of witness version and also the same arguments apply to Network so I thought we should have types for them.

Note that any checksum that did want to have their own non-bip173-like version of witness versions would be able to do this by creating another iterator adaptor, and then adding some new high-level funcitons to wrap it.

Unless I'm missing something I think users wanting to opt out of WitnessVersion can use Parsed::new and the fist byte can have any meaning they want, right?

I am of two minds on this ... on the one hand it decreases the generality of the library a bit by undermining other checksums that use a wider notion of "witness version" ... but on the other hand, AFAICT only bech32 and bech32m have a notion of "witness version" at all, and the other checksums used in Bitcoin just decode bech32 strings as strings.

So this is my main concern as well, that is why I added a couple of new methods in primitives that allow bypassing the Network type (using a string HRP). I was steering this towards making the crate root API functions super bip-173/350 specific and really easy to use for the typical Bitcoin usecase. All other niche usecases have to go to primitives.

We hadn't really talked about the Network type @apoelstra, to discuss that was one reason for this draft.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course we don't have to remove the WitnessVersion from rust-bitcoin (I believe the abstraction is exactly the same so that's why I removed it. The Network abstraction however is not the same in rus-bitcoin as in bech32, if I'm not confused, as far as bip173/350 are concerned there is only mainnet, testnet, and regtest i.e., there is no separation between signet and testnet for segwit addresses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I know what's the correct rule: types that don't depend on anything else from bitcoin* except internals. They should operate only on Rust types. So if bitcoin is analogous to std then bitcoin-primitives is analogous to core (calling it core would be obviously confusing).

I like this. Let's try to use this rule of thumb and see if we can make it work.

primitives is fine :). core is not :P.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bitcoin-primitives is taken on crates.io already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-bitcoin-primitives - primitives types used by the 'rust-bitcoin' ecosystem?

We could take rust-bitcoin-internals while we are at it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not use a rust-bitcoin prefix. It feels a bit too redundant to have rust in the name of a Rust crate.

Other suggestions are bitcoin-basics, bitocin-data, bitcoin-essential.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd almost like bitcoin-entities if Clean Architecture was more known - because that's what they are named there.

-primitive-types is free but kinda long. -common is free, short and understandable I think.

I tried ask ChatGPT and it didn't come up with anything else I like.

I also realized the rule of thumb has a flaw: it'd exclude anything that can be parsed and has FromStr because of the associated Err type (and making it too lax would just import whole bitcoin crate). So I think it should make exception for when the type implements any std traits that have associated types then the associated type would be in the crate as well and doesn't count as dependency. Further, the rule of thumb could also include some silly types that are used in a specific submodule of bitcoin so I think the rule also needs to require that the type is shared among multiple crates.

Further, I think that maybe we're doing this a bit wrong. bech32 should be probably project-agnostic - so no dependency on bitcoin and whatever functionality we need should be added inside the address module (crate). Since presumably if people are parsing addresses they are parsing Address and if they are parsing something else they don't need that stuff. Note that this still implies wanting to have a shared crate that will be depended on by address crate, network crate and others.

@tcharding
Copy link
Member Author

I find it odd that WitnessVersion lives in bech32 now. Can you explain the decision more?

I totally turned your thread into a discussion about Network, sorry about that :)

@tcharding
Copy link
Member Author

I don't understand the comment about Network being tightly tied to the BIPs.

What I mean is that as far as segwit addresses are concerned there is a one-to-one mapping from HRP -> network (if we define network to be one of mainnet, testnet, regtest).

It's fine to add conversion functions between HRPs and rust-bitcoin Networks for use in address computation, but both BIP 173 and BIP 350 have tons of test vectors that use arbitrary HRPs.

From BIP-173: "We first describe the general checksummed base32[1] format called Bech32 and then define Segregated Witness addresses using it."

So, my current thought is that the crate root module should implement the segwit address format and the primitives module should support the "genereal checksummed base32 format called bech32".

That is what I've tried to do with the "main" API functions (all taking Network and following the rules based on witness version).

This idea has made me unsure about the name bech32::decode, should it include "segwit" in the identifier to make it obvious what we are doing?

Note that I moved some bip test vector tests into unit tests while playing with the error (based on that blogpost) but now we are going with having Eq and friends for error types I'll move them back. If all the test vectors are implemented as integration tests (ie, using the public API) then we know we are good to go, right?

In any case it seems fine for rust-bitcoin to have conversion methods to get a Network from a HRP and vice-versa.

I don't think that rust-bitcoin should have anything to do with the address string for segwit addresses other than passing them to rust-bech32, said another way I don't think that rust-bitcoin should ever consider HRPs at all.

@apoelstra
Copy link
Member

I don't think there's any value in taking a Network rather than an HRP string. It makes the library less flexible and increases the API surface, and if there's really a "one-to-one mapping" (which I dispute because it precludes users using their own HRPs for custom chains, or the lnbc prefix for BOLT11 payments) then there's no loss in expressivity.

This idea has made me unsure about the name bech32::decode, should it include "segwit" in the identifier to make it obvious what we are doing?

If it parses the witness version separately it should have some indication that it's doing so. If it's also doing length checks then yes, I think it should have segwit in the name.

@apoelstra
Copy link
Member

I don't think that rust-bitcoin should have anything to do with the address string for segwit addresses other than passing them to rust-bech32, said another way I don't think that rust-bitcoin should ever consider HRPs at all.

I strongly disagree. Only rust-bitcoin knows what HRPs are bitcoin-specific, and if this library is supposed to maintain a list of all the HRPs in use everywhere that's a pretty big scope increase. (And if it's supposed to maintain only the three in BIP-173 that have named networks associated with them, excluding the others used in unit tests and the large set tof HRPs which it describes as "valid", that would be confusing to users and would require them to read the source every time to figure out what's going on.)

@tcharding
Copy link
Member Author
tcharding commented Apr 26, 2023

I strongly disagree. Only rust-bitcoin knows what HRPs are bitcoin-specific,

Interesting, we must be thinking of rust-bech32 differently. Perhaps if I describe how I'm thinking of it you can correct me with your view. I thought rust-bech32 is solely an implementation of BIP-173 and BIP-350. Because of how those bips are worded it allows for usage on alternate networks but those bips specify the current Bitcoin networks in use and the HRP for each network.

and if this library is supposed to maintain a list of all the HRPs in use everywhere that's a pretty big scope increase. (And if it's supposed to maintain only the three in BIP-173 that have named networks associated with them, excluding the others used in unit tests and the large set tof HRPs which it describes as "valid", that would be confusing to users and would require them to read the source every time to figure out what's going on.)

If I am not mistaken in the view I posted above I had hoped that clear rustdocs in the main module would explain the view and then anyone wanting more from the lib would go straight to primitives to use it.

FTR the benefit of using Network instead of an HRP string is that it makes encoding infallible (if we have WitnessVersion also, ie., the type system enforces correctness instead of checking strings and ints).

An alternate idea would be to make the lib general purpose and make a segwit module so one could do bech32::segwit::encode(Network::Bitcoin, WitnessVersion::0, data).

@apoelstra
Copy link
Member

Interesting, we must be thinking of rust-bech32 differently. Perhaps if I describe how I'm thinking of it you can correct me with your view. I thought rust-bech32 is solely an implementation of BIP-173 and BIP-350.

The word "network" is not used in either BIP. The only text supporting the Network type is a line in the "segwit addresses" section that says that mainnet addresses will use bc and testnet tb. There is no suggestion that this list would be exhaustive, and I don't see how we can make any sense of it without importing an understanding of networks into rust-bech32, where it definitely doesn't belong. Merely having a partial copy of the rust-bitcoin Network enum is not sufficient to achieve this.

FTR the benefit of using Network instead of an HRP string is that it makes encoding infallible (if we have WitnessVersion also, ie., the type system enforces correctness instead of checking strings and ints).

We can have an Hrp type that enforces the rules for HRPs. But there is nowhere in either BIP that suggests that "must be one of bc or tb is a rule", and in fact such a thing is contradicted both by the test vectors and by current usage in BOLT11.

An alternate idea would be to make the lib general purpose and make a segwit module so one could do bech32::segwit::encode(Network::Bitcoin, WitnessVersion::0, data).

This would be reasonable, though I think we should have a better name than Network. I could get behind segwit::KnownHrp or some such thing.

From BIP 141:

> A scriptPubKey (or redeemScript as defined in BIP16/P2SH) that
> consists of a 1-byte push opcode (for 0 to 16) followed by a data push
> between 2 and 40 bytes gets a new special meaning. The value of the
> first push is called the "version byte". The following byte vector
> pushed is called the "witness program".

`WitnessVersion` and `WitnessProgram` are scriptPubkey concerns and
scriptPubkey is basically synonymous with address so in one way it makes
sense that these types are in `address` however we are in the process of
overhauling the `Address` (and `AddressInner`) types so lets move the
witness stuff to `script` and put it in individual sub-modules.

This move helps simplify the address error type also.

Note please, there are a bunch of formatting changes in here in the
error type that I cannot explain and could not remove.
The version byte appears _before_ the witness program in a segwit
scriptPubkey however our `WitnessProgram` struct includes it. Either the
struct is incorrectly named of the version byte should not be part of
it. Observe also that we typically use the version byte at the same time
as the witness program and also that it is required to do the program
length checks.

Pull the version byte out of `WitnessProgram` and inline it into the
`Payload` segwit variant after first renaming the variant from
`Payload::WitnessProgram` to `Payload::Segwit`.

This change makes the code more semantically correct. Note that it does
introduce a non-explicit assumption that the same version be passed to
`WitnessProgram::new` as is used in the `Payload` - is this acceptable
engineering?
@tcharding
Copy link
Member Author

Superseded by #1951.

@tcharding tcharding closed this Aug 21, 2023
@tcharding tcharding deleted the WIP-use-new-bech32-api branch May 22, 2024 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0