Simple C++ thread pool class with no external dependencies. This class can be built with any C++ version >= C++11. This thread pool is implemented using a single work queue and a fixed-size pool of worker threads. Work items (functions) are processed in FIFO order.
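To make the design concrete, here is a minimal sketch of a single-queue, fixed-size, FIFO pool. It is an illustration only: the class name `TinyPool` and its members are hypothetical and are not the code in src/thread_pool.h, which may differ in its details.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the class from src/thread_pool.h.
class TinyPool {
 public:
  explicit TinyPool(std::size_t num_threads) {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets the workers drain the remaining queue, then joins them.
  ~TinyPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    wake_.notify_all();
    for (std::thread& worker : workers_) worker.join();
  }

  // Appends a work item to the back of the single shared queue.
  void Schedule(std::function<void()> work) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(work));
    }
    wake_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> work;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        wake_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Only reachable once stopping_ is set.
        work = std::move(queue_.front());  // FIFO: oldest item first.
        queue_.pop();
      }
      work();  // Run outside the lock so other workers can dequeue.
    }
  }

  std::mutex mutex_;
  std::condition_variable wake_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

Every worker contends for the same mutex when it dequeues, which is why the cost of very short tasks is dominated by synchronization in a design like this.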
Why is this useful? Why not use std::async?
Before writing this code, I tried using std::async. However, I ran into the following problems:
- std::async(std::launch::async, ...) launches a new thread for every invocation. On Mac and Linux, no thread pool is used, so you pay the price of thread creation (about 0.5ms on my laptop) for each call.
- If you use std::async, you must carefully manage the number of in-flight threads to achieve peak performance.
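If you want to check the per-call cost on your own machine, a rough measurement can be sketched as follows. The helper name `AverageAsyncLaunchMicros` is hypothetical, and the result depends heavily on the platform, standard library, and system load, so treat it as a ballpark figure rather than a benchmark.

```cpp
#include <chrono>
#include <future>

// Hypothetical helper: average wall-clock microseconds needed to launch and
// join one empty std::async task, measured over `iterations` calls.
double AverageAsyncLaunchMicros(int iterations) {
  using Clock = std::chrono::steady_clock;
  const Clock::time_point start = Clock::now();
  for (int i = 0; i < iterations; ++i) {
    // std::launch::async forces the task to run asynchronously rather than
    // being deferred until get() is called.
    std::future<void> done = std::async(std::launch::async, [] {});
    done.get();
  }
  const std::chrono::duration<double, std::micro> elapsed =
      Clock::now() - start;
  return elapsed.count() / iterations;
}
```

Because the task body is empty, the measured time is almost entirely launch and join overhead.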
This project is licensed under the terms of the MIT license.
This simple thread pool design works well for embarrassingly parallel workloads
that don't block for long periods of time. Many graphics, image processing, and
computer vision applications fit this description. In this case, you want to set
the thread pool size to ThreadPool::GetDefaultThreadPoolSize(), which returns
the number of logical cores your machine has.
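In standard C++, the number of logical cores is usually queried with std::thread::hardware_concurrency(). The sketch below shows one way such a default could be computed; `DefaultPoolSize` is a hypothetical stand-in, and whether ThreadPool::GetDefaultThreadPoolSize() is implemented this way is an assumption.

```cpp
#include <cstddef>
#include <thread>

// Hypothetical stand-in for ThreadPool::GetDefaultThreadPoolSize().
std::size_t DefaultPoolSize() {
  const unsigned int cores = std::thread::hardware_concurrency();
  // hardware_concurrency() is only a hint and may return 0 when the value
  // cannot be determined, so fall back to a single worker in that case.
  return cores == 0 ? 1 : cores;
}
```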
Workloads that block for long durations (disk IO, holding locks for a long time, etc.) won't perform well with this thread pool design, especially if you set your thread pool size to the number of logical cores. This is because a blocked function occupies one of the threads in the pool even while it isn't doing useful work. As a workaround, you can set the number of threads to a large multiple of the core count (N * number of logical cores). However, this solution is suboptimal. A better approach is to use a thread pool that implements work stealing.
Very short tasks (a few ms) also aren't a good fit for this thread pool implementation. This is because we incur some overhead synchronizing access to the single work queue. I recommend batching the work until each task takes at least a few tens of milliseconds.
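One simple way to batch is to split the item range into contiguous chunks and schedule one task per chunk instead of one task per item. The sketch below only computes the chunks; `MakeBatches` is a hypothetical helper that is not part of this project, and the right batch size has to be found by measuring your own workload.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, item_count) into half-open [begin, end)
// ranges that each hold at most batch_size items.
std::vector<std::pair<std::size_t, std::size_t>> MakeBatches(
    std::size_t item_count, std::size_t batch_size) {
  assert(batch_size > 0);
  std::vector<std::pair<std::size_t, std::size_t>> batches;
  for (std::size_t begin = 0; begin < item_count; begin += batch_size) {
    // The last batch may be shorter than batch_size.
    batches.emplace_back(begin, std::min(begin + batch_size, item_count));
  }
  return batches;
}
```

Each returned range can then be handed to the pool as a single work item, for example through a lambda that captures the range and loops over it.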
- Install bazel
- Run unit tests and/or benchmarks if you desire:
./scripts/build_and_run_unit_tests.sh
./scripts/build_and_run_unit_benchmarks.sh
- Write your own code that uses src/thread_pool.h. If you are using bazel, this is as easy as depending on src:thread_pool.
- Write up benchmark info.
- Include thread sanitizer tests.
- Ideally, ScheduleAndGetFuture could be called with exactly the same arguments you would pass to std::async. This is true if you are compiling with C++17, but with earlier C++ versions there is a slight limitation. Without C++17, you can't "directly" invoke member functions as you would with std::async. For example, the following code is valid C++11:
#include <future>
#include <iostream>

class MyClass {
 public:
  explicit MyClass(int value) : value_(value) {}
  int ComputeSum(int a) const {
    return a + value_;
  }
 private:
  int value_;
};

int main() {
  MyClass object(12);
  // The object pointer goes right after the member function pointer.
  std::future<int> the_sum = std::async(std::launch::async,
                                        &MyClass::ComputeSum, &object, 8);
  std::cout << the_sum.get() << std::endl;  // Prints 20.
}
However, a similar piece of code using thread_pool.h won't compile with C++11, although it does work with C++17.
int main() {
  MyClass object(12);
  ThreadPool pool(4);
  // The call below will compile with C++17, but will fail to compile on older
  // C++ versions.
  std::future<int> the_sum =
      pool.ScheduleAndGetFuture(&MyClass::ComputeSum, &object, 8);
  std::cout << the_sum.get() << std::endl;  // Prints 20.
}
You can use a simple lambda to work around this issue if you can't use C++17:
int main() {
  MyClass object(12);
  ThreadPool pool(4);
  std::future<int> the_sum = pool.ScheduleAndGetFuture([&object](int a) {
    return object.ComputeSum(a);
  }, 8);
  std::cout << the_sum.get() << std::endl;  // Prints 20.
}
The root of this issue is that std::invoke isn't available before C++17. Without std::invoke, calling a member function in a generic way requires surprisingly tricky template metaprogramming.
- Include more example code.
- Remove ThreadPool::Schedule and use ThreadPool::ScheduleAndGetFuture everywhere. Users can just ignore the returned std::future if needed.
- If we ever want to handle tasks that block for long periods, we should investigate work stealing and using 1 queue per thread. The latter change is invasive, since we need to know when a thread is blocked for a "long enough" time.
- Handle exceptions properly.
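For the member-function limitation described in the list above, std::bind is another C++11 option besides a lambda, because it already knows how to call a member function pointer on an object pointer. The sketch below runs the bound callable through std::packaged_task as a stand-in for a pool worker. `ComputeSumViaBind` is a hypothetical helper, and passing such a bound callable to ScheduleAndGetFuture is assumed to work like passing any other zero-argument callable.

```cpp
#include <functional>
#include <future>

class MyClass {
 public:
  explicit MyClass(int value) : value_(value) {}
  int ComputeSum(int a) const { return a + value_; }

 private:
  int value_;
};

// Hypothetical helper: wraps a member function call into a zero-argument
// task, runs it, and returns the result delivered through the future.
int ComputeSumViaBind(const MyClass& object, int a) {
  // std::bind stores the member function pointer, the object pointer, and
  // the argument, and produces a callable that takes no arguments.
  std::packaged_task<int()> task(
      std::bind(&MyClass::ComputeSum, &object, a));
  std::future<int> result = task.get_future();
  task();  // A pool worker thread would make this call.
  return result.get();
}
```

As with the lambda workaround, the object must outlive the task, since only a pointer to it is stored.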