Cpp-Taskflow
2.2.0
Running a for-loop in parallel is one of the most fundamental building blocks in parallel programming. In this chapter, we demonstrate how to use Cpp-Taskflow to create a task dependency graph for a parallel for-loop.
Cpp-Taskflow provides an STL-style method tf::Taskflow::parallel_for(I beg, I end, C&& callable, size_t chunk) that takes a range of items and applies a callable to each item in parallel. The method constructs a task dependency graph representing this workload and returns a pair of tasks that serve as the two synchronization points of this task graph.
The above code generates the following task dependency graph. The label 0x56* represents an internal task node that executes the callable object. By default (chunk=0), Cpp-Taskflow evenly partitions and distributes the workload to all threads. In our example of eight tasks and four workers, each internal node is responsible for two items.
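The arithmetic behind the default split can be sketched in a few lines of standard C++. This is only an illustration of an even partition, not Cpp-Taskflow's internal code: the helper name even_partition and the ceil(n / workers) block size are assumptions made for this example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Cpp-Taskflow): split the index range
// [0, n) into contiguous half-open blocks [first, second) of
// ceil(n / workers) items each, which yields at most `workers` blocks.
std::vector<std::pair<std::size_t, std::size_t>>
even_partition(std::size_t n, std::size_t workers) {
  std::vector<std::pair<std::size_t, std::size_t>> blocks;
  if (n == 0 || workers == 0) {
    return blocks;  // nothing to split
  }
  const std::size_t size = (n + workers - 1) / workers;  // ceil(n / workers)
  for (std::size_t beg = 0; beg < n; beg += size) {
    const std::size_t end = (beg + size < n) ? beg + size : n;
    blocks.emplace_back(beg, end);
  }
  return blocks;
}
```

Calling even_partition(8, 4) produces four blocks of two items each, which matches the eight-task, four-worker case described above.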
Debrief: parallel_for returns the task pair S and T, which serve as the two synchronization points of the task graph.
Here is one possible output of this program:
By default, Cpp-Taskflow partitions the workload evenly across the workers. In some cases, it is useful to disable this feature and apply a user-specified partition. The method parallel_for takes an unsigned integer chunk as the number of items in each partition.
The above example forces each partition to run exactly one item. This can be useful when the workload is unbalanced and a finer-grained partition lets idle workers pick up the remaining items.
You can leave the chunk size at 0 to use the default partition strategy.
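To make the effect of the chunk argument concrete, the sketch below computes which items would fall into each partition. It uses only the standard library. The helper chunk_partition and its workers parameter are hypothetical, and the even-split fallback for chunk == 0 is an assumption about the default strategy rather than a copy of the library's logic.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical helper (not Cpp-Taskflow code): split [0, n) into blocks of
// `chunk` items. Following the text, chunk == 0 selects the default strategy,
// assumed here to be an even split of ceil(n / workers) items per block.
std::vector<Block> chunk_partition(std::size_t n, std::size_t chunk,
                                   std::size_t workers) {
  std::vector<Block> blocks;
  if (n == 0) {
    return blocks;
  }
  if (chunk == 0) {
    if (workers == 0) {
      workers = 1;  // guard against division by zero
    }
    chunk = (n + workers - 1) / workers;
  }
  for (std::size_t beg = 0; beg < n; beg += chunk) {
    blocks.emplace_back(beg, (beg + chunk < n) ? beg + chunk : n);
  }
  return blocks;
}
```

With eight items, a chunk of 1 gives eight single-item partitions, while a chunk of 0 and four workers gives four partitions of two items.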
You can explicitly construct a dependency graph that represents a parallel execution of a for-loop using only the basic methods tf::Taskflow::emplace and tf::Task::precede.
Cpp-Taskflow provides an overload tf::Taskflow::parallel_for(I beg, I end, I step, C&& callable, size_t chunk) to parallelize an index-based for-loop. It takes three numbers beg, end, and step of the same type I and applies callable to each index in the range [beg, end) with the step size step.
It also works in the opposite direction with a negative step size.
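The indices visited by such a stepped loop can be listed with a small standard C++ helper. This is a sketch under stated assumptions: stepped_indices is a hypothetical name, and the stopping rule for a negative step (count down while the index is greater than end) is assumed here, not quoted from the library.

```cpp
#include <vector>

// Hypothetical helper (not part of Cpp-Taskflow): list the indices that a
// stepped loop starting at beg visits. A positive step counts up while
// i < end; a negative step counts down while i > end (assumed semantics).
// A zero step yields no indices, which avoids an infinite loop.
template <typename I>
std::vector<I> stepped_indices(I beg, I end, I step) {
  std::vector<I> out;
  if (step > 0) {
    for (I i = beg; i < end; i += step) {
      out.push_back(i);
    }
  } else if (step < 0) {
    for (I i = beg; i > end; i += step) {
      out.push_back(i);
    }
  }
  return out;
}
```

For example, stepped_indices(0, 10, 2) visits 0, 2, 4, 6, 8, and stepped_indices(10, 0, -2) visits 10, 8, 6, 4, 2. In both cases end itself is excluded.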
Similarly, you can specify a chunk size to group the indices handled by each task.
If no chunk size is given, Cpp-Taskflow partitions the range evenly across the available threads.
This example demonstrates how to use tf::Taskflow::parallel_for to create a parallel map pattern. The map operator sets each item in the container to one if it is odd, or to zero if it is even.
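The same map operator can be sketched without the library, using plain std::thread, to show what each partition does to its share of the container. This is an illustration of the pattern only, not the Cpp-Taskflow listing: parallel_odd_map is a hypothetical helper, and spawning one thread per block stands in for the worker pool that the library manages.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (plain std::thread, not Cpp-Taskflow): overwrite each
// item with 1 if it is odd and 0 if it is even, processing contiguous blocks
// of `chunk` items on separate threads.
void parallel_odd_map(std::vector<int>& items, std::size_t chunk) {
  if (chunk == 0) {
    chunk = 1;  // assumption for this sketch: treat 0 as one item per block
  }
  std::vector<std::thread> threads;
  for (std::size_t beg = 0; beg < items.size(); beg += chunk) {
    const std::size_t end = std::min(beg + chunk, items.size());
    // Each thread writes to a disjoint block, so no locking is needed.
    threads.emplace_back([&items, beg, end] {
      for (std::size_t i = beg; i < end; ++i) {
        items[i] = (items[i] % 2 != 0) ? 1 : 0;
      }
    });
  }
  for (auto& t : threads) {
    t.join();  // wait for every block to finish
  }
}
```

Applied to {1, 2, 3, 4, 5, 6, 7} with a chunk of 2, the container becomes {1, 0, 1, 0, 1, 0, 1}. One thread per block is acceptable for a small demonstration but scales poorly with small chunks, which is the kind of overhead a task scheduler avoids.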
The program outputs the following:
This example demonstrates how to pipeline a parallel-for workload with other tasks.
The output of this program is:
Debrief: