Cpp-Taskflow  2.1.0
C6: Manage Threads and Executor

This chapter discusses thread management and task execution in Cpp-Taskflow. We go through the concepts of threads, ownership, and executors.

Master, Workers, and Executor

Cpp-Taskflow defines a strict relationship between the master and the workers. The master is the thread that creates the taskflow object, and the workers are the threads that invoke the callable target of a task. Each taskflow object owns an executor instance that carries out the execution of tasks, for example, by threads in a shared pool. By default, Cpp-Taskflow uses std::thread::hardware_concurrency to decide the number of worker threads and std::thread::get_id to identify the ownership between the master and workers.

std::cout << std::this_thread::get_id() << std::endl; // master thread id
tf::Taskflow tf1; // create a taskflow object with the default number of workers
tf::Taskflow tf2{4}; // create a taskflow object with four workers

In the example above, the master thread owns both taskflow objects. Assuming std::thread::hardware_concurrency returns eight, the first taskflow object tf1 creates eight worker threads by default, and the second taskflow object tf2 creates four. Including the master thread, a total of 1 + 8 + 4 = 13 threads run in this program. If you create a taskflow with zero workers, the master carries out all the tasks by itself. Using one worker and using zero workers are therefore conceptually equivalent, since both end up with one thread running all tasks (see the snippet below).

tf::Taskflow tf1(0); // one master, zero workers (the master runs the tasks)
tf::Taskflow tf2(1); // one master, one worker (one thread runs the tasks)
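The thread accounting above can be written down as two small helpers. This is a standard-library sketch, not part of Cpp-Taskflow; the names threads_running_tasks, total_threads, and check_examples are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of threads that run tasks for one taskflow
// object created with `workers` worker threads. With zero workers the master
// runs every task itself, so one thread is still doing the work.
inline std::size_t threads_running_tasks(std::size_t workers) {
  return workers == 0 ? 1 : workers;
}

// Hypothetical helper: total number of threads in a program with one master
// thread and one taskflow object per entry, each with its own executor.
inline std::size_t total_threads(const std::vector<std::size_t>& workers_per_taskflow) {
  std::size_t total = 1;  // the master thread
  for (std::size_t w : workers_per_taskflow) total += w;
  return total;
}

// Usage: the two cases discussed in the text.
inline void check_examples() {
  assert(total_threads({8, 4}) == 13);     // tf1 with eight workers, tf2 with four
  assert(threads_running_tasks(0) == 1);   // the master runs the tasks
  assert(threads_running_tasks(1) == 1);   // the single worker runs the tasks
}
```

The eight in the first check is only the assumed return value of std::thread::hardware_concurrency; on another machine the default worker count, and hence the total, will differ.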

In general, the master thread is exposed to users at programming time (typically the main thread), while the worker threads are transparently maintained by the taskflow object. Each taskflow object owns an executor managed by std::shared_ptr. The default executor implements a work-stealing scheduler to carry out tasks efficiently. The Taskflow class defines a member type tf::Taskflow::Executor as an alias of the associated executor type. Users can acquire shared ownership of the executor from a taskflow object through the method tf::Taskflow::share_executor.

tf::Taskflow taskflow;
std::shared_ptr<tf::Taskflow::Executor> ptr = taskflow.share_executor(); // share the executor
assert(ptr.use_count() == 2); // the executor is owned by ptr and taskflow

The shared property allows users to create their own resource manager and construct a taskflow object on top. The executor has only one constructor that takes an unsigned integer indicating the number of worker threads to spawn.

auto ptr = std::make_shared<tf::Taskflow::Executor>(4); // create an executor of 4 workers
tf::Taskflow taskflow(ptr); // create a taskflow object on top of the executor
assert(ptr.use_count() == 2); // the executor is owned by ptr and taskflow
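The ownership counts in the two snippets above follow directly from std::shared_ptr semantics. The sketch below reproduces them with standard-library code only; ToyExecutor and ToyFlow are hypothetical stand-ins and do not reflect how Cpp-Taskflow implements its classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an executor; it only records the worker count.
struct ToyExecutor {
  explicit ToyExecutor(unsigned n) : workers(n) {}
  unsigned workers;
};

// Hypothetical stand-in for a taskflow object that co-owns its executor.
class ToyFlow {
 public:
  // Create a private executor with n workers.
  explicit ToyFlow(unsigned n) : _executor(std::make_shared<ToyExecutor>(n)) {}
  // Build on top of an executor created elsewhere.
  explicit ToyFlow(std::shared_ptr<ToyExecutor> e) : _executor(std::move(e)) {}
  // Hand out another owner of the same executor.
  std::shared_ptr<ToyExecutor> share_executor() const { return _executor; }
 private:
  std::shared_ptr<ToyExecutor> _executor;
};

// Owners while both the flow and the shared pointer are alive.
inline long owners_while_flow_alive() {
  ToyFlow flow(4);
  auto ptr = flow.share_executor();
  return ptr.use_count();
}

// Owners after the flow has been destroyed: the executor outlives the flow.
inline long owners_after_flow_destroyed() {
  std::shared_ptr<ToyExecutor> ptr;
  {
    ToyFlow flow(4);
    ptr = flow.share_executor();
  }
  return ptr.use_count();
}
```

In this sketch the executor stays alive as long as any owner remains, which is what makes it possible to keep a pool of threads around independently of the objects built on top of it.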

Share an Executor among Taskflow Objects

It is sometimes useful to share one executor among multiple taskflow objects in order to avoid thread over-subscription. Over-subscription occurs when the number of threads running in a program exceeds the number of available logical cores, which causes additional and unnecessary context switches. A context switch has nonzero cost and is especially costly when it crosses cores. For example, creating 100 taskflow objects, each with its own executor of four threads, on a machine with only four logical cores would over-subscribe the machine with 400 worker threads. The following example avoids the problem by building all 100 taskflow objects on top of a single executor of four threads.

// create 100 taskflow objects on top of the same executor
std::list<tf::Taskflow> tfs;
auto executor = std::make_shared<tf::Taskflow::Executor>(4);
for(size_t i=0; i<100; ++i) {
  assert(executor.use_count() == i + 1); // owned by executor and each taskflow created so far
  tfs.emplace_back(executor);            // create a taskflow object from the executor
}
// a total of 1 + 4 = 5 threads running in this program
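To make the difference concrete, the worker-thread counts for the two setups can be computed directly. This is a small arithmetic sketch with hypothetical helper names (workers_unique, workers_shared, threads_per_core); the four logical cores are the assumption stated in the text.

```cpp
#include <cassert>
#include <cstddef>

// Worker threads spawned when every taskflow object has its own executor.
constexpr std::size_t workers_unique(std::size_t taskflows,
                                     std::size_t workers_per_executor) {
  return taskflows * workers_per_executor;
}

// Worker threads spawned when all taskflow objects share one executor.
// The number of taskflow objects does not matter in this case.
constexpr std::size_t workers_shared(std::size_t /*taskflows*/,
                                     std::size_t workers_per_executor) {
  return workers_per_executor;
}

// Worker threads per logical core; a value above 1 means over-subscription.
// `cores` must be nonzero.
constexpr double threads_per_core(std::size_t threads, std::size_t cores) {
  return static_cast<double>(threads) / static_cast<double>(cores);
}

static_assert(workers_unique(100, 4) == 400, "100 executors of four workers each");
static_assert(workers_shared(100, 4) == 4, "one executor of four workers");
```

With four logical cores, the unique-executor setup puts 100 worker threads on each core, while the shared setup keeps exactly one worker thread per core.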

Customize Your Executor Interface

Cpp-Taskflow permits users to define their own executor and integrate it into the taskflow object being built. In most cases, the executor is implemented as a thread pool that runs the given tasks. Your executor class must provide the following interface in order to work with Cpp-Taskflow:

template <typename C>
class MyExecutor { // closure type C, callable on operator ()
  public:
    MyExecutor(unsigned);         // constructor on a number of worker threads (might be zero)
    size_t num_workers() const;   // return the number of worker threads (might be zero)
    template <typename... ArgsT>
    void emplace(ArgsT&&...);     // arguments to construct the closure C
    void batch(std::vector<C>&&); // a vector of closures for batch insertions
};
using MyTaskflow = tf::BasicTaskflow<MyExecutor>;

The executor is a class template with one parameter, the task (closure) type. The task type can be a generic polymorphic function wrapper, for instance std::function<void()>, or a callable class with a fixed memory layout. How the task is invoked is entirely up to the user. In summary, your executor class must provide:

  • a constructor that takes the number of worker threads
  • a constant method num_workers that returns the number of worker threads
  • a method emplace that dispatches a task to a thread, forwarding its arguments to the constructor of the task
  • a method batch that dispatches multiple tasks stored in a vector to threads

Cpp-Taskflow imposes few requirements on the executor class. Each taskflow object has its own internal data structure to keep track of the lifetime and execution status of a task. The executor only needs to guarantee that a thread runs each task passed in through the methods emplace and batch. We recommend reading our built-in executor implementation tf::WorkStealingThreadpool for more details.
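As an illustration of the four requirements listed above, here is a minimal sketch of an executor built from the standard library only. SimpleExecutor and run_demo are hypothetical names, the pool uses a single locked FIFO queue rather than work stealing, and running tasks on the calling thread when there are zero workers is an assumption of this sketch. It has not been checked against tf::BasicTaskflow and is not the built-in implementation.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical executor sketch: a FIFO thread pool over closures of type C.
template <typename C>
class SimpleExecutor {
 public:
  // Spawn n worker threads (n may be zero).
  explicit SimpleExecutor(unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
      _threads.emplace_back([this] { worker_loop(); });
    }
  }

  // Let the workers drain the remaining closures, then join them.
  ~SimpleExecutor() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _stop = true;
    }
    _cv.notify_all();
    for (auto& t : _threads) t.join();
  }

  SimpleExecutor(const SimpleExecutor&) = delete;
  SimpleExecutor& operator=(const SimpleExecutor&) = delete;

  std::size_t num_workers() const { return _threads.size(); }

  // Construct one closure in place from args and hand it to a worker.
  template <typename... ArgsT>
  void emplace(ArgsT&&... args) {
    if (_threads.empty()) {  // assumption: with zero workers the caller runs it
      C task(std::forward<ArgsT>(args)...);
      task();
      return;
    }
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _queue.emplace_back(std::forward<ArgsT>(args)...);
    }
    _cv.notify_one();
  }

  // Hand a whole vector of closures to the workers under one lock.
  void batch(std::vector<C>&& tasks) {
    if (_threads.empty()) {
      for (auto& task : tasks) task();
      return;
    }
    {
      std::lock_guard<std::mutex> lock(_mutex);
      for (auto& task : tasks) _queue.push_back(std::move(task));
    }
    _cv.notify_all();
  }

 private:
  void worker_loop() {
    for (;;) {
      std::unique_lock<std::mutex> lock(_mutex);
      _cv.wait(lock, [this] { return _stop || !_queue.empty(); });
      if (_queue.empty()) return;  // stop requested and nothing left to run
      C task = std::move(_queue.front());
      _queue.pop_front();
      lock.unlock();               // run the closure outside the lock
      task();
    }
  }

  std::mutex _mutex;
  std::condition_variable _cv;
  std::deque<C> _queue;
  bool _stop = false;
  std::vector<std::thread> _threads;
};

// Usage: run 10 emplaced closures and 5 batched closures on two workers.
inline int run_demo() {
  std::atomic<int> counter{0};
  {
    SimpleExecutor<std::function<void()>> executor(2);
    for (int i = 0; i < 10; ++i) executor.emplace([&counter] { ++counter; });
    std::vector<std::function<void()>> tasks(5, [&counter] { ++counter; });
    executor.batch(std::move(tasks));
  }  // the destructor returns only after every closure has run
  return counter.load();
}
```

A single locked queue keeps the sketch short but becomes a contention point under load, which is one reason a work-stealing design is attractive for a production executor.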

Thread Safety

The taskflow object is NOT thread-safe. Accessing a taskflow object from multiple threads at the same time can result in undefined behavior. Thread safety is independent of the distinction between the master and the workers: it is safe to access the taskflow object from any thread as long as only one thread accesses it at a time. However, we strongly recommend that users keep the definition of the master and the workers in mind and separate the program control flow accordingly. Clear thread ownership greatly reduces the chance of bugs and undefined behavior.
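If an object that is not thread-safe really must be touched from more than one thread, the accesses have to be serialized so that only one thread is inside at a time. The sketch below shows one common way to do that with a std::mutex. ToyGraph is a hypothetical stand-in for any non-thread-safe object, not a Cpp-Taskflow type, and keeping a single owning thread remains the simpler design.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an object that is not thread-safe.
struct ToyGraph {
  std::vector<int> tasks;
  void add_task(int id) { tasks.push_back(id); }  // unsafe if called concurrently
};

// Two threads add 1000 entries each; the mutex guarantees that only one
// thread accesses the graph at any moment.
inline std::size_t build_from_two_threads() {
  ToyGraph graph;
  std::mutex guard;
  auto producer = [&graph, &guard](int base) {
    for (int i = 0; i < 1000; ++i) {
      std::lock_guard<std::mutex> lock(guard);
      graph.add_task(base + i);
    }
  };
  std::thread t1(producer, 0);
  std::thread t2(producer, 1000);
  t1.join();
  t2.join();
  return graph.tasks.size();  // every insertion was serialized, so none is lost
}
```

Without the lock, the two concurrent push_back calls would race on the vector, which is the kind of undefined behavior described above.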

Example 1: Impact of Over-subscription

The example below demonstrates the impact of thread over-subscription. The workload is a task dependency graph of four tasks, each doing a compute-intensive matrix multiplication. We benchmark two implementations, one that shares an executor among taskflow objects and one that does not.

1: void create_task_dependency_graph(tf::Taskflow& tf) {
2:   for(size_t i=0; i<MAX_COUNT; ++i) {
3:     auto [A, B, C, D] = tf.emplace(
4:       [&] () { matrix_multiplication(); },
5:       [&] () { matrix_multiplication(); },
6:       [&] () { matrix_multiplication(); },
7:       [&] () { matrix_multiplication(); }
8:     );
9:     A.precede(B);
10:     A.precede(C);
11:     C.precede(D);
12:     B.precede(D);
13:   }
14: }
15:
16: // create multiple taskflow objects each with its own executor
17: auto unique_executor() {
18:
19:   auto beg = std::chrono::high_resolution_clock::now();
20:
21:   std::list<tf::Taskflow> tfs;
22:
23:   for(size_t i=0; i<MAX_TASKFLOW; ++i) {
24:     tf::Taskflow& tf = tfs.emplace_back(MAX_THREAD);
25:     create_task_dependency_graph(tf);
26:     assert(tf.share_executor().use_count() == 2);
27:   }
28:
29:   std::vector<std::shared_future<void>> futures;
30:   for(tf::Taskflow& tf : tfs) {
31:     futures.emplace_back(tf.dispatch());
32:   }
33:
34:   for(auto& fu : futures) {
35:     fu.get();
36:   }
37:
38:   auto end = std::chrono::high_resolution_clock::now();
39:   return std::chrono::duration_cast<std::chrono::milliseconds>(end - beg).count();
40: }
41:
42: // create multiple taskflow objects on top of the same executor
43: auto shared_executor() {
44:
45:   auto beg = std::chrono::high_resolution_clock::now();
46:
47:   std::list<tf::Taskflow> tfs;
48:
49:   auto executor = std::make_shared<tf::Taskflow::Executor>(MAX_THREAD);
50:
51:   for(size_t i=0; i<MAX_TASKFLOW; ++i) {
52:     assert(executor.use_count() == i + 1);
53:     tf::Taskflow& tf = tfs.emplace_back(executor);
54:     create_task_dependency_graph(tf);
55:   }
56:
57:   std::vector<std::shared_future<void>> futures;
58:   for(tf::Taskflow& tf : tfs) {
59:     futures.emplace_back(tf.dispatch());
60:   }
61:
62:   for(auto& fu : futures) {
63:     fu.get();
64:   }
65:
66:   auto end = std::chrono::high_resolution_clock::now();
67:   return std::chrono::duration_cast<std::chrono::milliseconds>(end - beg).count();
68: }

Debrief:

  • Lines 1-14 create a task dependency graph composed of compute-intensive matrix operations
  • Lines 16-40 create multiple independent taskflow objects that do not share their running threads
  • Lines 42-68 create multiple taskflow objects from the same executor so that they run on the same set of threads

Running the program with different numbers of taskflow objects gives the following runtimes:

# taskflows   shared (ms)   unique (ms)
          1           120           114
          2           225           229
          4           451           452
          8           908           904
         16          1791          1837
         32          3581          3782
         64          7183          7636
        128         14341         15482

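As the number of taskflow objects increases, the implementation without a shared executor incurs more context switches among threads. With up to eight taskflow objects the two implementations are within a few milliseconds of each other. From 16 taskflow objects onward the unshared version is consistently slower, reaching 15482 ms versus 14341 ms (about 8% slower) at 128 taskflow objects.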
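The relative overhead for each row of the table can be computed as (unique - shared) / shared. The sketch below does this with the measured values copied from the table; Row, kRows, overhead_percent, and print_overheads are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// One row of the runtime table above.
struct Row {
  int taskflows;
  double shared_ms;
  double unique_ms;
};

// Measured runtimes copied from the table.
constexpr Row kRows[] = {
  {1, 120.0, 114.0},     {2, 225.0, 229.0},     {4, 451.0, 452.0},
  {8, 908.0, 904.0},     {16, 1791.0, 1837.0},  {32, 3581.0, 3782.0},
  {64, 7183.0, 7636.0},  {128, 14341.0, 15482.0},
};

// Extra runtime of unique executors relative to a shared executor, in percent.
// A negative value means the unique-executor run was faster.
inline double overhead_percent(const Row& r) {
  return (r.unique_ms - r.shared_ms) / r.shared_ms * 100.0;
}

// Print the overhead of every row with one decimal place.
inline void print_overheads() {
  for (const Row& r : kRows) {
    std::printf("%3d taskflows: %+.1f%%\n", r.taskflows, overhead_percent(r));
  }
}
```

At one taskflow object the overhead is negative (the unique-executor run was slightly faster), and it grows to roughly 8% at 128 taskflow objects.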