Cpp-Taskflow
2.1.0
In this chapter, we discuss the thread management and task execution schemes of Cpp-Taskflow, covering the concepts of threads, ownership, and executors.
Cpp-Taskflow defines a strict relationship between the master and the workers. The master is the thread that creates the taskflow object, and the workers are the threads that invoke the callable target of a task. Each taskflow object owns an executor instance that implements how tasks are executed, for example, by threads in a shared pool. By default, Cpp-Taskflow uses std::thread::hardware_concurrency to decide the number of worker threads and std::thread::get_id to distinguish the master from the workers.
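The default worker count and the master/worker distinction can be sketched with the two standard-library calls named above. This is a minimal illustration, not Cpp-Taskflow's own code: `default_num_workers` and `OwnershipRecord` are hypothetical names, and falling back to one worker when `std::thread::hardware_concurrency` returns 0 (which the standard permits when the value is unknown) is our own assumption.

```cpp
#include <thread>

// Hypothetical helper: pick a default worker count via hardware_concurrency.
// The standard allows that call to return 0 when the value is unknown;
// falling back to 1 is our assumption, not necessarily Cpp-Taskflow's rule.
unsigned default_num_workers() {
  unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1u : n;
}

// Hypothetical ownership record: remember the id of the thread that created
// the object (the "master") so later calls can tell master from worker.
class OwnershipRecord {
 public:
  OwnershipRecord() : master_id_(std::this_thread::get_id()) {}

  // True only when called from the thread that constructed this object.
  bool is_master() const { return std::this_thread::get_id() == master_id_; }

 private:
  std::thread::id master_id_;
};
```

Constructing an `OwnershipRecord` on the main thread makes `is_master()` return true there and false inside any `std::thread` spawned later.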
In the above example, the master thread owns both taskflow objects. The first taskflow object, tf1, creates eight worker threads (the default, given by std::thread::hardware_concurrency), and the second taskflow object, tf2, creates four worker threads. Including the master thread, a total of 1 + 8 + 4 = 13 threads run in this program. If you create a taskflow with zero workers, the master carries out all the tasks by itself. That is, one worker and zero workers are conceptually equivalent, since both end up using a single thread to run all tasks (see the snippet below).
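To make the thread count and the zero-worker rule concrete, here is a small sketch under stated assumptions. `total_threads` and `run_tasks` are hypothetical helpers, not part of Cpp-Taskflow, and `run_tasks` deals tasks round-robin instead of using the library's scheduler. With zero workers every task runs on the calling (master) thread; with one worker every task runs on a single spawned thread. Either way, exactly one thread does the work.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// One master plus the workers of every taskflow object it owns,
// e.g. total_threads({8, 4}) == 1 + 8 + 4 == 13.
unsigned total_threads(std::initializer_list<unsigned> workers_per_taskflow) {
  unsigned total = 1;  // the master thread
  for (unsigned w : workers_per_taskflow) total += w;
  return total;
}

// Hypothetical runner: with zero workers the caller runs every task itself;
// otherwise tasks are dealt round-robin to num_workers spawned threads while
// the caller only waits. Returns the ids of the threads that executed tasks.
std::set<std::thread::id> run_tasks(const std::vector<std::function<void()>>& tasks,
                                    unsigned num_workers) {
  std::set<std::thread::id> used;
  std::mutex used_mutex;
  auto run_one = [&](std::size_t i) {
    tasks[i]();
    std::lock_guard<std::mutex> lock(used_mutex);
    used.insert(std::this_thread::get_id());
  };

  if (num_workers == 0) {
    for (std::size_t i = 0; i < tasks.size(); ++i) run_one(i);  // master runs all
    return used;
  }

  std::vector<std::thread> workers;
  for (unsigned w = 0; w < num_workers; ++w) {
    workers.emplace_back([&, w] {
      for (std::size_t i = w; i < tasks.size(); i += num_workers) run_one(i);
    });
  }
  for (auto& t : workers) t.join();  // master waits for the workers
  return used;
}
```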
In general, the master thread is visible to users at programming time (typically the main thread), while the worker threads are maintained transparently by the taskflow object. Each taskflow object owns an executor managed by std::shared_ptr. The default executor implements a work-stealing scheduler to carry out tasks efficiently. The Taskflow class defines a member type, tf::Taskflow::Executor, as an alias of the associated executor type. Users can acquire shared ownership of the executor from a taskflow object through the method tf::Taskflow::share_executor.
This shared ownership allows users to create their own resource manager and construct taskflow objects on top of it. The executor has only one constructor, which takes an unsigned integer indicating the number of worker threads to spawn.
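The ownership model can be illustrated with plain `std::shared_ptr`. The classes below are hypothetical stand-ins that only mirror the shape described above: an executor constructed from a worker count, a taskflow that either owns a private executor or builds on a shared one, and a `share_executor` accessor. They are not the real `tf::Taskflow` or its executor types, and this `Executor` only records the worker count instead of spawning threads.

```cpp
#include <memory>
#include <utility>

// Hypothetical executor stand-in: a real one would spawn num_workers threads.
class Executor {
 public:
  explicit Executor(unsigned num_workers) : num_workers_(num_workers) {}
  unsigned num_workers() const { return num_workers_; }

 private:
  unsigned num_workers_;
};

// Hypothetical taskflow stand-in that holds its executor through shared_ptr.
class MiniTaskflow {
 public:
  // Own a private executor with the given number of workers.
  explicit MiniTaskflow(unsigned num_workers)
      : executor_(std::make_shared<Executor>(num_workers)) {}

  // Build on top of an executor supplied (and co-owned) by the user.
  explicit MiniTaskflow(std::shared_ptr<Executor> executor)
      : executor_(std::move(executor)) {}

  // Hand out shared ownership, in the spirit of tf::Taskflow::share_executor.
  std::shared_ptr<Executor> share_executor() const { return executor_; }

 private:
  std::shared_ptr<Executor> executor_;
};
```

Because each `MiniTaskflow` holds a `std::shared_ptr`, the executor lives until its last owner, whether a taskflow or user code, releases it.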
It is sometimes useful to share one executor among multiple taskflow objects to avoid thread over-subscription. Over-subscription occurs when the number of threads running in a program exceeds the number of available logical cores, resulting in additional, unnecessary context switches. Context switches have a nonzero cost and are especially costly when they cross cores. The following example mimics the over-subscription problem by creating 100 taskflow objects, each with its own executor of four threads, on a machine assumed to have only four logical cores.
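The thread counts in this scenario follow the same counting as the 1 + 8 + 4 example: one master plus every worker. As a compile-time worked calculation (the constant names are ours), private executors give 401 threads on four cores, while a single shared executor gives five.

```cpp
// Parameters of the over-subscription scenario described above.
constexpr unsigned kTaskflows = 100;
constexpr unsigned kWorkersPerExecutor = 4;
constexpr unsigned kLogicalCores = 4;

// Each taskflow owns its own executor: 1 + 100 * 4 = 401 threads.
constexpr unsigned kThreadsPrivate = 1 + kTaskflows * kWorkersPerExecutor;
// All taskflows share one executor: 1 + 4 = 5 threads.
constexpr unsigned kThreadsShared = 1 + kWorkersPerExecutor;

static_assert(kThreadsPrivate == 401, "one master plus 400 workers");
static_assert(kThreadsShared == 5, "one master plus 4 workers");
// With private executors, roughly 100 threads compete for each core.
static_assert(kThreadsPrivate / kLogicalCores == 100, "about 100 threads per core");
```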
Cpp-Taskflow permits users to define their own executor and integrate it into the taskflow object being built. In most cases, the executor is implemented as a thread pool that runs the given tasks. Your executor class must obey the following concepts in order to work with Cpp-Taskflow:
The executor is a class template with one parameter, the task type. The task type can be a generic polymorphic function wrapper, for instance std::function<void()>, or a callable class with a fixed memory layout. It is completely up to users to define how to invoke the task. Your executor class must meet the following concepts:
Cpp-Taskflow imposes few requirements on the executor class. Each taskflow object has its own internal data structure to keep track of the lifetime and execution status of a task. The executor only needs to guarantee that a thread runs each task submitted through the methods emplace and batch. We recommend that users read our built-in executor implementation, tf::WorkStealingThreadpool, for more details.
The Taskflow object is NOT thread-safe. Touching a taskflow object from multiple threads concurrently can result in undefined behavior. Thread safety has nothing to do with the master or the workers: it is completely safe to access the taskflow object as long as only one thread does so at a time. However, we strongly recommend that users respect the roles of the master and the workers and separate the program control flow accordingly. Clear thread ownership greatly reduces the chance of buggy implementations and undefined behavior.
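If a non-thread-safe object truly has to be reached from several threads, one way to guarantee that only one thread touches it at a time is to serialize access with a mutex. The sketch below uses a hypothetical `GraphBuilder` as a stand-in for a taskflow object; it illustrates the one-thread-at-a-time rule, not any locking that Cpp-Taskflow performs itself.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, deliberately non-thread-safe builder standing in for a
// taskflow object: concurrent push_back calls would be a data race.
class GraphBuilder {
 public:
  void add_task(std::string name) { names_.push_back(std::move(name)); }
  std::size_t num_tasks() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};

// Wrapper that lets only one thread at a time inside the builder.
class GuardedBuilder {
 public:
  void add_task(std::string name) {
    std::lock_guard<std::mutex> lock(mutex_);
    builder_.add_task(std::move(name));
  }

  std::size_t num_tasks() {
    std::lock_guard<std::mutex> lock(mutex_);
    return builder_.num_tasks();
  }

 private:
  std::mutex mutex_;
  GraphBuilder builder_;
};
```

The simpler alternative, and the one recommended above, is to confine all graph construction to the master thread so that no lock is needed at all.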
The example below demonstrates the impact of thread over-subscription. The workload is a task dependency graph of four tasks, each doing compute-intensive matrix multiplication. We benchmarked two implementations, one with and one without a shared executor.
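As a hedged sketch of two building blocks of such a benchmark, the code below shows a naive matrix-multiplication kernel of the kind each task might run and a wall-clock timer based on `std::chrono::steady_clock`. `matmul`, `elapsed_ms`, and the `Matrix` alias are our own names; the actual benchmark's matrix sizes and timing method may differ.

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Naive O(n^3) product of an (n x k) and a (k x m) matrix: a compute-bound
// kernel of the kind each benchmark task might run.
Matrix matmul(const Matrix& a, const Matrix& b) {
  const std::size_t n = a.size();
  const std::size_t k = b.size();
  const std::size_t m = b.empty() ? 0 : b[0].size();
  Matrix c(n, std::vector<double>(m, 0.0));
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t p = 0; p < k; ++p)
      for (std::size_t j = 0; j < m; ++j)
        c[i][j] += a[i][p] * b[p][j];
  return c;
}

// Wall-clock time of one call to f, in milliseconds, using a monotonic clock.
template <typename F>
double elapsed_ms(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(f)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A steady clock is used because, unlike the system clock, it is never adjusted while the program runs.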
Debrief:
Running the program on different numbers of taskflow objects gives the following runtime values:
As we increase the number of taskflow objects, the implementation without a shared executor incurs more context switches among threads. This overhead is reflected in its slower runtime (15482 without sharing vs. 14341 with sharing, on 128 taskflow objects).