Cpp-Taskflow
2.3.1
Modern scientific computing typically leverages GPU-powered parallel processing cores to speed up large-scale applications. This chapter discusses how to implement heterogeneous decomposition algorithms using CPU-GPU collaborative tasking.
Cpp-Taskflow enables concurrent CPU-GPU tasking by leveraging NVIDIA CUDA Graph. The tasking interface is referred to as cudaFlow. A tf::cudaFlow is a graph object created at runtime, similar to dynamic tasking. It manages a task node in a taskflow and associates it with a CUDA Graph. To create a cudaFlow, emplace a callable with an argument of type tf::cudaFlow. The following example implements the canonical saxpy (single-precision A·X Plus Y) task graph.
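As a rough, host-only illustration of the data flow that a saxpy task graph encodes, the sketch below runs the stages in a valid dependency order on the CPU. It is not cudaFlow code: `run_saxpy_graph` is a hypothetical helper, the "device" arrays `dx` and `dy` are ordinary `std::vector`s standing in for GPU memory, and the stage layout (two host-to-device copies, one kernel, two device-to-host copies) is an assumption about how such a graph is typically arranged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for a saxpy kernel: y[i] = a * x[i] + y[i].
// On a GPU, each index i would typically be handled by one CUDA thread.
void saxpy(std::size_t n, float a, const float* x, float* y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = a * x[i] + y[i];
  }
}

// Hypothetical helper (not part of Cpp-Taskflow): executes the stages of a
// saxpy task graph sequentially, in an order that respects the dependencies.
void run_saxpy_graph(float a, std::vector<float>& hx, std::vector<float>& hy) {
  assert(hx.size() == hy.size());
  const std::size_t n = hx.size();

  // Allocation: n * sizeof(float) bytes for each emulated device array.
  std::vector<float> dx(n), dy(n);

  // Host-to-device transfers; the two copies are independent of each other.
  std::copy(hx.begin(), hx.end(), dx.begin());
  std::copy(hy.begin(), hy.end(), dy.begin());

  // Kernel stage; it depends on both host-to-device transfers.
  saxpy(n, a, dx.data(), dy.data());

  // Device-to-host transfers; both depend on the kernel stage.
  std::copy(dx.begin(), dx.end(), hx.begin());
  std::copy(dy.begin(), dy.end(), hy.begin());
}
```

Calling `run_saxpy_graph(2.0f, hx, hy)` with `hx` filled with 1.0f and `hy` filled with 2.0f leaves `hx` unchanged and sets every element of `hy` to 4.0f. In a real cudaFlow the independent stages could run concurrently on the GPU, which this sequential sketch does not attempt to model.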
Debrief: the example works with two host arrays, hx and hy, and two device arrays, dx and dy. The arrays dx and dy are allocated on device, each of N*sizeof(float) bytes.
Cpp-Taskflow does not expend unnecessary effort on kernel programming but focuses on tasking CUDA operations with CPU work. We give users full privileges to craft a CUDA kernel that is commensurate with their domain knowledge. Users focus on developing high-performance kernels using a native CUDA toolkit, while leaving difficult task parallelism to Cpp-Taskflow.