A General-purpose Task-parallel Programming System using Modern C++; Tsung-Wei Huang (2018).
Installation
Install taskflow.cxx from npm:
```bash
$ npm i taskflow.cxx
```
And then include taskflow.hpp as follows:
```cxx
// main.cxx
#include <taskflow/taskflow.hpp>
int main() { /* ... */ }
```
Finally, compile with the path node_modules/taskflow.cxx added to your compiler's include paths:
```bash
$ clang++ -I./node_modules/taskflow.cxx main.cxx  # or use g++
$ g++ -I./node_modules/taskflow.cxx main.cxx
```
A simpler approach is to use the cpoach tool, which automatically adds the include paths of all installed dependencies in your project:
```bash
$ cpoach clang++ main.cxx  # or use g++
$ cpoach g++ main.cxx
```
Start Your First Taskflow Program
The following program (simple.cpp) creates a taskflow of four tasks
A, B, C, and D, where A runs before B and C, and D
runs after B and C.
When A finishes, B and C can run in parallel.
Try it live on Compiler Explorer (godbolt)!
```cpp
#include <taskflow/taskflow.hpp>  // Taskflow is header-only
int main(){
  tf::Executor executor;
  tf::Taskflow taskflow;
  auto [A, B, C, D] = taskflow.emplace(  // create four tasks
    [] () { std::cout << "TaskA\n"; },
    [] () { std::cout << "TaskB\n"; },
    [] () { std::cout << "TaskC\n"; },
    [] () { std::cout << "TaskD\n"; }
  );
  A.precede(B, C);  // A runs before B and C
  D.succeed(B, C);  // D runs after B and C
  executor.run(taskflow).wait();
  return 0;
}
```
Taskflow is header-only, so there is no installation to wrangle with.
To compile the program, clone the Taskflow project and
tell the compiler to include the headers.
```bash
~$ git clone https://github.com/taskflow/taskflow.git  # clone it only once
~$ g++ -std=c++20 examples/simple.cpp -I. -O2 -pthread -o simple
~$ ./simple
TaskA
TaskC
TaskB
TaskD
```
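Since B and C have no dependency between each other, they may run in parallel and print in either order (the sample output above happens to show TaskC before TaskB); only A is guaranteed to print first and D last.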
Visualize Your First Taskflow Program
Taskflow comes with a built-in profiler,
TFProf,
for you to profile and visualize taskflow programs
in an easy-to-use web-based interface.

```bash
# run the program with the environment variable TF_ENABLE_PROFILER enabled
~$ TF_ENABLE_PROFILER=simple.json ./simple
~$ cat simple.json
[
{"executor":"0","data":[{"worker":0,"level":0,"data":[{"span":[172,186],"name":"0_0","type":"static"},{"span":[187,189],"name":"0_1","type":"static"}]},{"worker":2,"level":0,"data":[{"span":[93,164],"name":"2_0","type":"static"},{"span":[170,179],"name":"2_1","type":"static"}]}]}
]
# paste the profiling json data to https://taskflow.github.io/tfprof/
```
In addition to the execution diagram, you can dump the graph to a DOT format
and visualize it using a number of free GraphViz tools.
```cpp
// dump the taskflow graph to a DOT format through std::cout
taskflow.dump(std::cout);
```
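For example, assuming you redirect the dumped output to a file such as graph.dot, you can render it locally with GraphViz (e.g., `dot -Tpng graph.dot -o graph.png`) or paste it into an online GraphViz viewer.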

Express Task Graph Parallelism
Taskflow empowers users with both static and dynamic task graph constructions
to express end-to-end parallelism in a task graph that
embeds in-graph control flow.
- Create a Subflow Graph
- Integrate Control Flow to a Task Graph
- Offload a Task to a GPU
- Compose Task Graphs
- Launch Asynchronous Tasks
- Execute a Taskflow
- Leverage Standard Parallel Algorithms
Create a Subflow Graph
Taskflow supports dynamic tasking for you to create a subflow
graph from the execution of a task to perform dynamic parallelism.
The following program spawns a task dependency graph parented at task B.
```cpp
tf::Task A = taskflow.emplace([](){}).name("A");
tf::Task C = taskflow.emplace([](){}).name("C");
tf::Task D = taskflow.emplace([](){}).name("D");
tf::Task B = taskflow.emplace([] (tf::Subflow& subflow) {
  tf::Task B1 = subflow.emplace([](){}).name("B1");
  tf::Task B2 = subflow.emplace([](){}).name("B2");
  tf::Task B3 = subflow.emplace([](){}).name("B3");
  B3.succeed(B1, B2);  // B3 runs after B1 and B2
}).name("B");
A.precede(B, C);  // A runs before B and C
D.succeed(B, C);  // D runs after B and C
```
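Note that the subflow is built at runtime, when B executes, and by default it joins its parent task: B does not complete until B1, B2, and B3 finish, so D still waits for the entire subflow.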

Integrate Control Flow to a Task Graph
Taskflow supports conditional tasking for you to make rapid
control-flow decisions across dependent tasks to implement cycles
and conditions in an end-to-end task graph.
```cpp
tf::Task init = taskflow.emplace([](){}).name("init");
tf::Task stop = taskflow.emplace([](){}).name("stop");
// creates a condition task that returns a random binary
tf::Task cond = taskflow.emplace(
  [](){ return std::rand() % 2; }
).name("cond");
init.precede(cond);
// creates a feedback loop {0: cond, 1: stop}
cond.precede(cond, stop);
```
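The integer returned by a condition task selects which successor runs next: returning 0 jumps back to the first successor (cond itself, forming the loop), while returning 1 proceeds to the second successor (stop).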

Offload a Task to a GPU
Taskflow supports GPU tasking for you to accelerate a wide range of scientific computing applications by harnessing the power of CPU-GPU collaborative computing using Nvidia CUDA Graph.
```cpp
__global__ void saxpy(size_t N, float alpha, float* dx, float* dy) {
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < N) {
    dy[i] = alpha*dx[i] + dy[i];
  }
}

// create a CUDA Graph task
tf::Task cudaflow = taskflow.emplace([&]() {
  tf::cudaGraph cg;

  // copy data between host and device
  tf::cudaTask h2d_x = cg.copy(dx, hx.data(), N);
  tf::cudaTask h2d_y = cg.copy(dy, hy.data(), N);
  tf::cudaTask d2h_x = cg.copy(hx.data(), dx, N);
  tf::cudaTask d2h_y = cg.copy(hy.data(), dy, N);

  // launch the saxpy kernel with 256 threads per block
  tf::cudaTask kernel = cg.kernel((N+255)/256, 256, 0, saxpy, N, 2.0f, dx, dy);

  kernel.succeed(h2d_x, h2d_y)
        .precede(d2h_x, d2h_y);

  // instantiate an executable CUDA graph and run it through a stream
  tf::cudaGraphExec exec(cg);
  tf::cudaStream stream;
  stream.run(exec).synchronize();
}).name("CUDA Graph Task");
```
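In this snippet, hx and hy are assumed to be pre-allocated host buffers (e.g., std::vector<float> of size N) and dx and dy device pointers allocated beforehand (e.g., with cudaMalloc); the surrounding lambda captures them by reference.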

Compose Task Graphs
Taskflow is composable.
You can create large parallel graphs through composition of modular
and reusable blocks that are easier to optimize at an individual scope.
```cpp
tf::Taskflow f1, f2;

// create taskflow f1 of two tasks
tf::Task f1A = f1.emplace([]() { std::cout << "Task f1A\n"; })
                 .name("f1A");
tf::Task f1B = f1.emplace([]() { std::cout << "Task f1B\n"; })
                 .name("f1B");

// create taskflow f2 with one module task composed of f1
tf::Task f2A = f2.emplace([]() { std::cout << "Task f2A\n"; })
                 .name("f2A");
tf::Task f2B = f2.emplace([]() { std::cout << "Task f2B\n"; })
                 .name("f2B");
tf::Task f2C = f2.emplace([]() { std::cout << "Task f2C\n"; })
                 .name("f2C");

tf::Task f1_module_task = f2.composed_of(f1)
                            .name("module");
f1_module_task.succeed(f2A, f2B)
              .precede(f2C);
```
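When f2 runs, the module task executes the entire f1 graph (f1A and f1B) after f2A and f2B finish and before f2C starts.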

Launch Asynchronous Tasks
Taskflow supports asynchronous tasking.
You can launch tasks asynchronously to dynamically explore task graph parallelism.
```cpp
tf::Executor executor;

// create asynchronous tasks directly from an executor
std::future<int> future = executor.async([](){
  std::cout << "async task returns 1\n";
  return 1;
});
executor.silent_async([](){ std::cout << "async task does not return\n"; });

// create asynchronous tasks with dynamic dependencies
tf::AsyncTask A = executor.silent_dependent_async([](){ printf("A\n"); });
tf::AsyncTask B = executor.silent_dependent_async([](){ printf("B\n"); }, A);
tf::AsyncTask C = executor.silent_dependent_async([](){ printf("C\n"); }, A);
tf::AsyncTask D = executor.silent_dependent_async([](){ printf("D\n"); }, B, C);

executor.wait_for_all();
```
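The std::future returned by executor.async lets you retrieve the task's result later (here, future.get() yields 1), whereas the silent variants do not return a result.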
Execute a Taskflow
The executor provides several thread-safe methods to run a taskflow.
You can run a taskflow once, multiple times, or until a stopping criterion is met.
These methods are non-blocking and return a tf::Future<void>
that lets you query the execution status.
```cpp
// runs the taskflow once
tf::Future<void> run_once = executor.run(taskflow);
// wait on this run to finish
run_once.get();
// run the taskflow four times
executor.run_n(taskflow, 4);
// runs the taskflow five times
executor.run_until(taskflow, [counter=5]() mutable { return --counter == 0; });
// block the executor until all submitted taskflows complete
executor.wait_for_all();
```
Leverage Standard Parallel Algorithms
Taskflow defines algorithms for you to quickly express common parallel
patterns using standard C++ syntax,
such as parallel iterations, parallel reductions, and parallel sort.
```cpp
tf::Task task1 = taskflow.for_each( // assign each element to 100 in parallel
  first, last, [] (auto& i) { i = 100; }
);
tf::Task task2 = taskflow.reduce(   // reduce a range of items in parallel
  first, last, init, [] (auto a, auto b) { return a + b; }
);
tf::Task task3 = taskflow.sort(     // sort a range of items in parallel
  first, last, [] (auto a, auto b) { return a < b; }
);
```
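To make the iterator arguments concrete, below is a minimal, self-contained sketch (the container name data and the result variable sum are illustrative) that fills a vector in parallel and then reduces it, chaining the two algorithm tasks with a dependency:
```cpp
#include <taskflow/taskflow.hpp>
#include <vector>
#include <iostream>

int main() {
  tf::Executor executor;
  tf::Taskflow taskflow;

  std::vector<int> data(1000);   // range to iterate over
  int sum = 0;                   // reduction result

  // assign each element to 100 in parallel
  tf::Task fill = taskflow.for_each(
    data.begin(), data.end(), [](int& i){ i = 100; }
  );

  // sum all elements in parallel, accumulating into `sum`
  tf::Task total = taskflow.reduce(
    data.begin(), data.end(), sum, [](int a, int b){ return a + b; }
  );

  fill.precede(total);  // the reduction must see the filled values

  executor.run(taskflow).wait();
  std::cout << "sum = " << sum << '\n';  // prints 100000
}
```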
Additionally, Taskflow provides composable graph building blocks for you to
efficiently implement common parallel algorithms, such as parallel pipelines.
```cpp
// create a pipeline to propagate five tokens through three serial stages
tf::Pipeline pl(num_parallel_lines,
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    // stage 1: stop the pipeline after five tokens
    if(pf.token() == 5) {
      pf.stop();
    }
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 2: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }},
  tf::Pipe{tf::PipeType::SERIAL, [](tf::Pipeflow& pf) {
    printf("stage 3: input buffer[%zu] = %d\n", pf.line(), buffer[pf.line()]);
  }}
);
taskflow.composed_of(pl);
executor.run(taskflow).wait();
```