-
Notifications
You must be signed in to change notification settings - Fork 831
Introduce the small-hash
feature for bitcoin_hashes
#1990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c2698b8
to
e46a718
Compare
Can you edit |
hashes/src/sha256.rs
Outdated
#[allow(non_snake_case)] | ||
const fn Sigma1(x: u32) -> u32 { x.rotate_left(26) ^ x.rotate_left(21) ^ x.rotate_left(7) } | ||
const fn sigma0(x: u32) -> u32 { x.rotate_left(25) ^ x.rotate_left(14) ^ (x >> 3) } | ||
const fn sigma1(x: u32) -> u32 { x.rotate_left(15) ^ x.rotate_left(13) ^ (x >> 10) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In e46a718633ee897ff00fd8fbcde81a06ced3b702:
I think you could promote these to the main module. I'm doubtful that converting macros to const fn
s would have any effect at all on performance. Plausibly it could increase performance if it made the compiler more aware when we're repeating the same operations and could vectorize them. Though we should measure it. But I think it should just give the compiler more freedom, to make them functions when it's optimizing for size and to inline them otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I guess the compiler would probably normally inline them, unless you are optimizing for size.
If it ends up affecting performance we could also tell the compiler to inline them explicitly when small-hash
is not enabled with #[cfg-attr(...)]
Overall concept ACK. I'm super impressed at how small this diff is. I was a little apprehensive about reviewing this PR but it's much smaller than I expected. |
Just thinking aloud. I wonder if this should be a Cargo feature. Feels kind of weird ... as this doesn't really enable any "feature", and is more like a compilation level thing. I wonder if there are any idiomatic alternatives. Feel to me like this would be best controlled by env var during compilation. Or maybe I'm just overcomplicating. |
Yeah one of my main goals was to keep this very easy to review (and also for me to write, since I don't have much experience writing and optimizing code like this). I'm sure we could squeeze out a lot more space/performance with more effort, but this is probably a good trade-off between code readability and space/performance imho. @dpc regarding the cargo feature thing: sometimes features are used in this way, for example secp has a |
I agree that it's a little weird to use a cargo feature, but also think it's the right choice, for the same reasons as the rust-secp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the trick @afilini, why is one fast and one small, const functions vs macros, I don't see it right now. Thanks
hashes/src/sha256.rs
Outdated
pub(super) const fn round( | ||
a: u32, | ||
b: u32, | ||
c: u32, | ||
d: u32, | ||
e: u32, | ||
f: u32, | ||
g: u32, | ||
h: u32, | ||
k: u32, | ||
w: u32, | ||
) -> (u32, u32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps custom formating would be better here, something like:
#[rustfmt::skip]
pub(super) const fn round(a: u32, b: u32, c: u32, d: u32, e: u32,
f: u32, g: u32, h: u32, k: u32, w: u32) -> (u32, u32) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah hey all we have to do to have sane formatting is polish our code with some pretty annotations!
@tcharding I suspect that constfns are just as fast, but the reason that they (can be) smaller is that macros are always inlined whereas constfns might not be. Inlining is faster because it avoids function call overhead. Also +1 to custom formatting. |
So I tried to read the dragon book a couple of times before and never made it past the first chapters. Anyone got any suggestions on books/sites to learn about compiler optimisations without making my eyes bleed? |
With green CI I'm happy to ack this. Its basically just swapping macros for const functions, it did take me a while to work that out though :) |
Do we want this in the upcoming |
e46a718
to
0fcb907
Compare
When enabled this feature swaps the hash implementation of sha512, sha256 and ripemd160 for a smaller (but also slower) one. On embedded processors (Cortex-M4) it can lead to up to a 52% size reduction, from around 37KiB for just the `process_block` methods of the three hash functions to 17.8KiB.
0fcb907
to
f2c5f19
Compare
I pushed a new version with all the feedback received so far: I didn't notice any performance regression by switching to const fns in the "fast hash" implementation. Even when compiling with I also manually formatted the functions with many arguments, I tried to break them up into a few lines and I think they look pretty good now. @tcharding just a sidenote: I don't know much about compilers and optimizations either, the idea behind this PR was just to reuse common code as much as possible rather than expanding the full hash function, which ends up being huge. I think the implementation you have here is a pretty good tradeoff between speed and size: it could be made much faster by writing critical sections in assembly manually, or even using SIMD instructions on supported platforms. It could also be made much smaller by once again playing with assembly manually, but that's pretty complicated and this library isn't trying to be either a fast or super small hash functions library, so I think this is more than enough. This is all to say that somebody who really knows these stuff could definitely make big improvements, but they would all have a cost in terms of code readability/simplicity, so I'm not even sure it's worth for this lib. |
Yeah, let's do it. |
Nice! I suspect But in the interest of not making you do a ton of iterations on this, I'm happy to ACK this as is. But if you want to try this, that'd be great :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f2c5f19
Thanks for the explanation @afilini! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f2c5f19
When enabled this feature swaps the hash implementation of sha512, sha256 and ripemd160 for a smaller (but also slower) one.
On embedded processors (Cortex-M4) it can lead to up to a 52% size reduction, from around 37KiB for just the
process_block
methods of the three hash functions to 17.8KiB.The following numbers were collected on
aarch64-unknown-linux-gnu
withcargo 1.72.0-nightly
.Original
Total size: 32380 bytes
With
small-hash
enabledTotal size: 22552 bytes
Conclusion
On
aarch64
there's overall a ~30% improvement in size, although ripemd160 doesn't really shrink that much (and its performance also aren't impacted much with only a 6% slowdown). sha512 and sha256 instead are almost 40% slower withsmall-hash
enabled.I don't have performance numbers for other architectures, but in terms of size there was an even larger improvements on
thumbv7em-none-eabihf
, with a 52% size reduction overall:vs
I'm assuming this is because on more limited architectures the compiler needs to use more instructions to move data in and out of registers (especially for sha512 which ideally would benefit from 64-bit registers), so reusing the code by moving it into functions saves a lot of those instructions.
Also note that the
const_hash
method onsha256
causes the compiler to emit two independent implementations. I haven't looked into the code yet, maybe there's a way to merge them so that the non-constprocess_block
calls into the const fn.Note: commits are unverified right now because I don't have the keys available, I will sign them after addressing the review comments.