Parallel Algorithms

Parallel algorithms are a basic building block for implementing advanced kernels. To simplify the implementation of such kernels, we provide a set of standard algorithms. The implementation and the corresponding unit tests can be found in the following files:

  • include/NeoFOAM/core/parallelAlgorithms.hpp

  • test/core/parallelAlgorithms.cpp

Currently, the following algorithms are provided:

  • parallelFor

  • parallelReduce

The following code block shows the implementation of parallelFor for fields:

template<typename Executor, typename ValueType, parallelForFieldKernel<ValueType> Kernel>
void parallelFor([[maybe_unused]] const Executor& exec, Field<ValueType>& field, Kernel kernel)
{
    auto span = field.span();
    if constexpr (std::is_same<std::remove_reference_t<Executor>, SerialExecutor>::value)
    {
        for (size_t i = 0; i < field.size(); i++)
        {
            span[i] = kernel(i);
        }
    }
    else
    {
        using runOn = typename Executor::exec;
        Kokkos::parallel_for(
            "parallelFor",
            Kokkos::RangePolicy<runOn>(0, field.size()),
            KOKKOS_LAMBDA(const size_t i) { span[i] = kernel(i); }
        );
    }
}

Based on the Executor type, the kernel function is either run directly in a for loop (for a SerialExecutor) or dispatched to Kokkos::parallel_for (for all other executors). The executor type determines the Kokkos::RangePolicy<runOn> and thus dispatches to the GPU if a GPUExecutor is used. Additionally, the kernel is named "parallelFor" to improve its visibility in profiling tools such as nsys. Finally, a KOKKOS_LAMBDA is dispatched that assigns the result of the given kernel function to the corresponding entry of the field's span. Here, the span holds pointers to the device data and defines the begin and end of that data. Several overloads of parallelFor exist to simplify running it on fields and spans, with and without an explicitly defined data range.
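The dispatch described above can be illustrated without Kokkos. The following is a minimal sketch, not the library's implementation: std::vector stands in for Field, and the names parallelForSketch, MockParallelExecutor and mockParallelDispatch are hypothetical and exist only for this example. It shows the compile-time branch on the executor type and how an overload without an explicit range can forward to one with a range.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the executor types (assumption, not the library's types).
struct SerialExecutor {};
struct MockParallelExecutor {};

// Stand-in for Kokkos::parallel_for: calls body(i) for every i in [begin, end).
// A real backend may run iterations concurrently and in any order; iterating
// backwards here makes it visible that the body must not rely on ordering.
template<typename Body>
void mockParallelDispatch(std::size_t begin, std::size_t end, Body body)
{
    for (std::size_t i = end; i > begin; --i)
    {
        body(i - 1);
    }
}

// Range overload: writes kernel(i) into data[i] for i in [begin, end).
template<typename Executor, typename ValueType, typename Kernel>
void parallelForSketch(
    const Executor&, std::vector<ValueType>& data,
    std::size_t begin, std::size_t end, Kernel kernel
)
{
    assert(begin <= end && end <= data.size());
    ValueType* ptr = data.data(); // plays the role of the span
    if constexpr (std::is_same_v<Executor, SerialExecutor>)
    {
        // Serial path: a plain loop on the calling thread.
        for (std::size_t i = begin; i < end; ++i)
        {
            ptr[i] = kernel(i);
        }
    }
    else
    {
        // Every other executor: hand the loop body to the backend.
        mockParallelDispatch(begin, end, [=](std::size_t i) { ptr[i] = kernel(i); });
    }
}

// Overload without an explicit range: forwards the full index range.
template<typename Executor, typename ValueType, typename Kernel>
void parallelForSketch(const Executor& exec, std::vector<ValueType>& data, Kernel kernel)
{
    parallelForSketch(exec, data, std::size_t {0}, data.size(), kernel);
}
```

Because the branch is resolved with if constexpr, only the code path for the given executor type is instantiated. Calling parallelForSketch(SerialExecutor {}, values, kernel) and parallelForSketch(MockParallelExecutor {}, values, kernel) with the same kernel should therefore fill values identically, as long as the kernel depends only on its index.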

To learn more about how to use the algorithms, we recommend checking the corresponding unit tests in test/core/parallelAlgorithms.cpp.

Further details on parallelFor.

Currently, the following free functions are implemented: