-
-
Notifications
You must be s 8000 igned in to change notification settings - Fork 1
Other fast(er) hash algorithms #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This is a fair point, might try one of these to see if the performance improves. In this specific use-case we are simply hashing 4 byte integer, hence the input is constant and we only need the mixer part. So we're technically not using the full xxh3 and this code might be simpler indeed. In fact, there's no loops to unroll in the first place. Here's what the library is using now: x := v ^ (0x1cad21f72c81017c ^ 0xdb979083e96dd4de) + seed
x ^= bits.RotateLeft64(x, 49) ^ bits.RotateLeft64(x, 24)
x *= 0x9fb21c651e98df25
x ^= (x >> 35) + 4
x *= 0x9fb21c651e98df25
x ^= (x >> 28) Most of the hashes are pretty much the same, I just looked at x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29;
x *= 0xbea225f9eb34556d;
x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29; My biggest problem is the atomic counter actually. This instruction doesn't scale with manycore systems so I used |
In fact, I just tried by swapping it with Here's results with
And these are with
|
Yep, the speed you measured is something I more or less expected (though xxh3 seems just partially implemented which is kind of an "unfair" advantage speed-wise 😉 - but that's what you said, so nothing bad about it). But I wonder why do you consider an atomic counter non-scalable? From my humble understanding it should be enough te have just a "pre-seed" as an atomic counter (to have different random values in each thread) but this shouldn't affect the PRNG at all because you can read the process-global pre-seed into thread-local storage into a "true seed" for the PRNG and since then everything works truly in parallel. But I probably misunderstood what you mean. In which case there are splittable PRNGs such as JAX. |
Hey, I'll have a look into JAX, never heard about it. For the atomic increment is still a lock, so it would not scale beyond a certain number of cores. Most of the time, you don't care as it's not called that often, but when I was working on my For the thread-local storage, this was actually my initial approach this morning. The issue is that there's no native way to use it in Go, which is a bit annoying. So I started by having an array of counters where To go around that, I decided to try out |
you can try wyrand PRNG in wyhash package |
Sure, anything "atomic" inside of a non-splittable PRNG will cause serious degradation. I'd try really hard to avoid anything shared.
This is really bad - I didn't look at Go's support for thread local storage, but if it's not there I'd definitely invest energy (even if it's a lot) into implementing my own tiny constrained support for that (at least for Go's default compiler - not for cgo nor any other for the beginning) using |
Just out of curiosity, let me mention there are faster alternatives to xxh3 or algos with about the same speed but jaw-droppingly simpler. See https://github.com/rurban/smhasher/ .
Namely fastest (for both bulk & short inputs) seems to be
wyhash
and simpliest from the fastest onesmx3
(btw. I feelmx3
could still have some space for speed optimization as I suspect some false dependencies between instructions in the pipeline... - maybe even good old manual loop unrolling could make a difference, IDK).But maybe xxh3 is better in some ways which I don't see but would like to. I can imagine something related to DBs but I don't know what exactly 😉.
The text was updated successfully, but these errors were encountered: