-
Notifications
You must be signed in to change notification settings - Fork 37.4k
checkqueue: implement a new scriptcheck worker pool with atomic variables #32791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32791. ReviewsSee the guideline for information on the review process. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
98b1db9283b5c04775309fa5dc2161fc597b355a
ac947ff
to
7d2b002
Compare
hmmm...Ok, I realized that this may only be suitable for when all the preout Coin are in CCoinsViewCache, otherwise the |
7d2b002
to
15ca020
Compare
50a5c2c
to
299d5b5
Compare
…bles TL;DR This patchset proposes a new design of scriptcheck worker pool with atomic variables to leverage more cpu resource on a machine which the current mutex version cannot. Achieve about 45% boost in performance. Main Idea: The current version of worker pool use mutex to do the synchronization between worker threads, which makes the performance downgrade after worker_threads_num >= 14. One root cause is the high contention on the queue's mutex among worker threads. Workers repeatedly sleep and be woken up in short time, leading to heavy context switching. The idea is to use atomic variables to protect the queue, workers acquire it and increase it with a batch_size(a small number). This significantly reduces context switching. Implementation To implement it, below changes are made:(enabled in atomic mode) - atomic<int> nNext is used to indicate the next available indice in the queue - atomic<int> nTodo2 is used to indicate the number of tasks left - nBatchSize is the batch size of tasks everytime a worker consumes it should be small to keep workload balance among workers - m_mutex is still used, to protect m_result and conditional vars - fAtomic to indicate which mode is enabled - about the queue: - the queue is used as queue, not stacks any more - the queue is 'add only', which means only adding tasks to it, don't remove used tasks. the above two are a must to make this idea achievable, and the second sacrifices space - start the workers only after all the tasks have been added to the queue this is a trade-off to make the implementaion simple, a worker can compare nNext + nBatchSize to queue.size() to find if there are tasks undone in the queue, otherwise another atomic<int> tail is needed, and race condition may exist in this way (need more investigation). From my test, it's not 100% to say this is a 'trade-off' since it also brings something good, pros && cons: - pros: 1. no contention among producer and consumers when adding tasks to the queue 2. the adding period is very short compared to the verification period - cons: 1. the workers start later For details, please refer to the worker handler code Tests: - Numbers - For blocks/s numbers `./build/bin/bench_bitcoin --filter=ConnectBlockMixedEcdsaSchnor` - For cpu usage numbers `NANOBENCH_ENDLESS=ConnectBlockAllEcdsa ./build/bin/bench_bitcoin --filter=ConnectBlockAllEcdsa` - Some data is missing because the machine was billed by time, and there wasn't enough time to complete all measurements. - Some cells contain a range instead of a single value due to high fluctuations - The numbers represent the average of multiple test runs. | mutex verion | atomic version nr_workers | blocks/s | cpu usage | blocks/s | cpu usage2 -----------|-----------|------------|-----------|------------- 0 | 4.22 | 100% | 4.2 | 100% 1 | 8.23 | 198% | 8 | 191.3% 2 | 12.08 | 293.7% | 11.6 | 276% 3 | 16 | 386.7% | 14.9 | 354.5% 4 | 19.8 | 476.3% | | 5 | 23.12 | 565.4% | | 6 | 27.3 | 654% | 23 | 560% 7 | 28.5 | 738% | | 8 | 33.8 | 809% | | 9 | 33.5 | 868% | 28.5 | 730% 10 | 37.4 | 910% | 30.5 | 777% 11 | 33~40 | 880%~928% | 32 | 825% 12 | 22~39 | 860%~925% | 32.5 | 830% 13 | 27.8~37 | 700%~913% | 34.8 | 928% 15 | 24.8~38.9 | 640%~719% | 38 | 1000% 19 | 18~23 | 370%~510% | 41 | 1161% 22 | 7~18 | 270%~400% | | 25 | 6.4 | 260% | 46 | 1400% 30 | 5 | 270% | 49 | 1560% 60 | 5 | 280% | 50~58 | 2100% - Env: A virtual machine on cloud Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 64 On-line CPU(s) list: 0-63 Vendor ID: GenuineIntel Model name: Intel Xeon Processor (Cascadelake) CPU family: 6 Model: 85 Thread(s) per core: 2 Core(s) per socket: 16 Socket(s): 2 Stepping: 6 BogoMIPS: 5985.93 Virtualization features: Hypervisor vendor: KVM Virtualization type: full Caches (sum of all): L1d: 2 MiB (64 instances) L1i: 2 MiB (64 instances) L2: 128 MiB (32 instances) L3: 32 MiB (2 instances) NUMA: NUMA node(s): 2 NUMA node0 CPU(s): 0-31 NUMA node1 CPU(s): 32-63 Signed-off-by: Hao Xu <hao.xu@linux.dev>
The current max number of scriptcheck threads is MAX_SCRIPTCHECK_THREADS = 14 due to the limitation of the current mechanism, release it to GetNumCores() as we have atomic mode of scriptcheck worker pool. Signed-off-by: Hao Xu <hao.xu@linux.dev>
When verification of a transaction get failed, make workers quit the work doing path and sleep immediately. Signed-off-by: Hao Xu <hao.xu@linux.dev>
GetNumCores() is now used in checkqueue related code, thus it has to be added to libbitcoinkernel to keep everything ok Signed-off-by: Hao Xu <hao.xu@linux.dev>
299d5b5
to
a222779
Compare