Reductions

The reductions API provides functions for reducing data from a higher rank to a lower rank.

template<typename TensorType, typename TensorIndexType, typename InType, typename ReduceOp, std::enable_if_t<is_matx_reduction_v<ReduceOp>, bool> = true>
inline void matx::reduce(TensorType dest, [[maybe_unused]] TensorIndexType idest, InType in, ReduceOp op, cudaStream_t stream = 0, bool init = true)

Perform a reduction and preserve indices

Performs a reduction from tensor “in” into values tensor “dest” and index tensor “idest” using reduction operation ReduceOp. The rank of the output tensor dictates which elements the reduction is performed over. Reductions are performed over the innermost dimensions of the input, and the number of reduced dimensions is the difference between the input and output tensor ranks. For a 0D (scalar) output tensor, the reduction is performed over the entire input tensor. For example, if the input is a 4D tensor and the output is a 3D tensor, the reduction is performed across the innermost dimension of the input. If the output is a 2D tensor, the reduction is performed across the two innermost dimensions of the input, and so on.

Template Parameters
  • TensorType – Output data type

  • TensorIndexType – Output index type

  • InType – Input data type

  • ReduceOp – Reduction operator to apply

Parameters
  • dest – Destination view of values reduced

  • idest – Destination view of indices

  • in – Input data to reduce

  • op – Reduction operator

  • stream – CUDA stream

  • init – If true, dest is initialized with ReduceOp::Init(); otherwise, the values already in dest are included in the reduction.
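The rank rule described above can be modeled in plain C++. The sketch below is not MatX code and says nothing about how MatX implements reductions; `sum_innermost` is a hypothetical helper that works on a row-major host buffer and assumes a sum as the reduction operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MatX: sums over the innermost
// (in_shape.size() - out_rank) dimensions of a row-major buffer.
std::vector<double> sum_innermost(const std::vector<double>& data,
                                  const std::vector<std::size_t>& in_shape,
                                  std::size_t out_rank) {
  assert(out_rank <= in_shape.size());

  // Number of output elements: product of the outermost `out_rank` extents.
  std::size_t outer = 1;
  for (std::size_t d = 0; d < out_rank; ++d) outer *= in_shape[d];

  // Number of input elements folded into each output element.
  std::size_t inner = 1;
  for (std::size_t d = out_rank; d < in_shape.size(); ++d) inner *= in_shape[d];
  assert(data.size() == outer * inner);

  std::vector<double> out(outer, 0.0);
  for (std::size_t o = 0; o < outer; ++o)
    for (std::size_t i = 0; i < inner; ++i)
      out[o] += data[o * inner + i];
  return out;
}
```

With a 2x3x4 input, `out_rank = 0` yields a single value, `out_rank = 1` yields one value per outermost index (12 inputs each), and `out_rank = 2` yields six values (4 inputs each).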

template<typename TensorType, typename InType, typename ReduceOp>
inline void matx::reduce(TensorType &dest, const InType &in, ReduceOp op, cudaStream_t stream = 0, bool init = true)

Perform a reduction

Performs a reduction from tensor “in” into tensor “dest” using reduction operation ReduceOp. The rank of the output tensor dictates which elements the reduction is performed over. Reductions are performed over the innermost dimensions of the input, and the number of reduced dimensions is the difference between the input and output tensor ranks. For a 0D (scalar) output tensor, the reduction is performed over the entire input tensor. For example, if the input is a 4D tensor and the output is a 3D tensor, the reduction is performed across the innermost dimension of the input. If the output is a 2D tensor, the reduction is performed across the two innermost dimensions of the input, and so on.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

  • ReduceOp – Reduction operator to apply

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • op – Reduction operator

  • stream – CUDA stream

  • init – If true, dest is initialized with ReduceOp::Init(); otherwise, the values already in dest are included in the reduction.
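The effect of the init flag can be illustrated with a small host-side model. This is a sketch under stated assumptions, not MatX code: `SumOp` and `reduce_all` are hypothetical, and only the name `Init()` is taken from the parameter description above; `Reduce()` is an assumed name for the combining step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operator type. Init() mirrors ReduceOp::Init() from the text;
// Reduce() is an assumed name for combining two values.
struct SumOp {
  static double Init() { return 0.0; }
  static double Reduce(double a, double b) { return a + b; }
};

// Hypothetical model of a full (rank-0) reduction with an init flag.
template <typename Op>
void reduce_all(double& dest, const std::vector<double>& in, bool init = true) {
  if (init) dest = Op::Init();  // discard whatever dest held before
  for (double v : in) dest = Op::Reduce(dest, v);
}
```

Calling `reduce_all<SumOp>(dest, in)` overwrites `dest`, while `reduce_all<SumOp>(dest, in, false)` accumulates into the value `dest` already holds, which can be useful when a reduction is split across several calls.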

template<typename TensorType, typename InType>
inline void matx::any(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Find if any value is != 0

Returns a boolean value indicating whether any value in the set of inputs is non-zero. As with reduce, the ranks of the input and output tensors determine which dimensions are reduced.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream

template<typename TensorType, typename InType>
inline void matx::all(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Find if all values are != 0

Returns a boolean value indicating whether all values in the set of inputs are non-zero. As with reduce, the ranks of the input and output tensors determine which dimensions are reduced.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream
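The semantics of any and all can be sketched on the host for the rank-2 to rank-1 case. The helpers `any_rows` and `all_rows` below are hypothetical and are not MatX functions; they return 1 or 0 per row of a row-major buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 1 if at least one element of the row is non-zero.
std::vector<int> any_rows(const std::vector<float>& in,
                          std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols);
  std::vector<int> out(rows, 0);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      if (in[r * cols + c] != 0.0f) out[r] = 1;
  return out;
}

// Hypothetical helper: 1 if every element of the row is non-zero.
std::vector<int> all_rows(const std::vector<float>& in,
                          std::size_t rows, std::size_t cols) {
  assert(in.size() == rows * cols);
  std::vector<int> out(rows, 1);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      if (in[r * cols + c] == 0.0f) out[r] = 0;
  return out;
}
```

A row of all zeros fails both tests, a row with a single non-zero entry passes only any, and a row without zeros passes both.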

template<typename TensorType, typename InType>
inline void matx::rmin(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Compute min reduction of a tensor

Returns a tensor containing the minimum of the values in each reduction.

Note

This function uses the name rmin instead of min to avoid colliding with the element-wise operator min.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream

template<typename TensorType, typename InType>
inline void matx::rmax(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Compute max reduction of a tensor

Returns a tensor containing the maximum of the values in each reduction.

Note

This function uses the name rmax instead of max to avoid colliding with the element-wise operator max.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream
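As a rough host-side illustration of the min and max reductions for the rank-2 to rank-1 case, the sketch below computes one minimum and one maximum per row. `RowMinMax` and `minmax_rows` are hypothetical names, and the code is not taken from MatX.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: one minimum and one maximum per row.
struct RowMinMax {
  std::vector<double> mins;
  std::vector<double> maxs;
};

// Hypothetical helper: reduces each row of a row-major rows x cols buffer.
RowMinMax minmax_rows(const std::vector<double>& in,
                      std::size_t rows, std::size_t cols) {
  assert(cols > 0 && in.size() == rows * cols);
  RowMinMax out;
  for (std::size_t r = 0; r < rows; ++r) {
    const double* first = in.data() + r * cols;
    auto mm = std::minmax_element(first, first + cols);
    out.mins.push_back(*mm.first);
    out.maxs.push_back(*mm.second);
  }
  return out;
}
```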

template<typename TensorType, typename InType>
inline void matx::sum(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Compute sum of numbers

Returns a tensor containing the sum of the values in each reduction.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream

template<typename TensorType, typename InType>
inline void matx::mean(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Calculate the mean of values in a tensor

Performs a sum reduction from tensor “in” into tensor “dest”, followed by a division by the number of elements in the reduction. As with the reduce function, the type of reduction depends on the rank of the output tensor. A 0D output denotes a reduction over the entire input, a 1D output of a 2D input denotes a reduction over each row independently, etc.

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream
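The sum-then-divide description above can be written out as a short host-side sketch for the rank-2 to rank-1 case. `mean_rows` is a hypothetical helper and does not reflect how MatX schedules the computation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean of each row of a row-major rows x cols buffer.
std::vector<double> mean_rows(const std::vector<double>& in,
                              std::size_t rows, std::size_t cols) {
  assert(cols > 0 && in.size() == rows * cols);
  std::vector<double> out(rows, 0.0);
  for (std::size_t r = 0; r < rows; ++r) {
    // Step 1: sum reduction over the row.
    for (std::size_t c = 0; c < cols; ++c) out[r] += in[r * cols + c];
    // Step 2: divide by the number of elements that were reduced.
    out[r] /= static_cast<double>(cols);
  }
  return out;
}
```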

template<typename TensorType, typename TensorInType>
inline void matx::median(TensorType &dest, const TensorInType &in, cudaStream_t stream = 0)

Calculate the median of values in a tensor

Calculates the median of rows in a tensor. The median is computed by sorting the data into a temporary tensor, then picking the middle element of each row. For an even number of items, the mean of the two middle elements is used. This function currently works only on tensor views as input because it uses CUB sorting as a backend, and the reduction must be from rank 2 to rank 1 or from rank 1 to rank 0.

Template Parameters
  • TensorType – Output data type

  • TensorInType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream
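The sort-and-pick procedure described above can be sketched on the host as follows. `median_rows` is a hypothetical helper that uses `std::sort` on a temporary copy of each row; it stands in for the CUB-based sort mentioned in the text and is not MatX code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of each row of a row-major rows x cols buffer.
std::vector<double> median_rows(const std::vector<double>& in,
                                std::size_t rows, std::size_t cols) {
  assert(cols > 0 && in.size() == rows * cols);
  std::vector<double> out(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    // Sort a temporary copy so the input is left untouched.
    const double* first = in.data() + r * cols;
    std::vector<double> tmp(first, first + cols);
    std::sort(tmp.begin(), tmp.end());
    // Odd count: middle element. Even count: mean of the two middle elements.
    out[r] = (cols % 2 == 1)
                 ? tmp[cols / 2]
                 : 0.5 * (tmp[cols / 2 - 1] + tmp[cols / 2]);
  }
  return out;
}
```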

template<typename TensorType, typename InType>
inline void matx::var(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Compute a variance reduction

Computes the variance of the input according to the output tensor rank and size

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream

template<typename TensorType, typename InType>
inline void matx::stdd(TensorType &dest, const InType &in, cudaStream_t stream = 0)

Compute a standard deviation reduction

Computes the standard deviation of the input according to the output tensor rank and size

Template Parameters
  • TensorType – Output data type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction

  • in – Input data to reduce

  • stream – CUDA stream
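The relationship between the variance and standard deviation reductions can be shown with a minimal host-side sketch for the rank-1 to rank-0 case. The text above does not state whether the result is normalized by N or by N - 1, so the hypothetical helpers below take that choice as an explicit `ddof` argument (an assumption of this sketch, not a MatX parameter).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: variance of a 1D buffer, normalized by (N - ddof).
// ddof is an assumption of this sketch: 0 divides by N, 1 divides by N - 1.
double variance(const std::vector<double>& in, std::size_t ddof) {
  assert(in.size() > ddof);
  double mean = 0.0;
  for (double v : in) mean += v;
  mean /= static_cast<double>(in.size());

  double sum_sq = 0.0;  // sum of squared deviations from the mean
  for (double v : in) sum_sq += (v - mean) * (v - mean);
  return sum_sq / static_cast<double>(in.size() - ddof);
}

// Hypothetical helper: standard deviation as the square root of the variance.
double standard_deviation(const std::vector<double>& in, std::size_t ddof) {
  return std::sqrt(variance(in, ddof));
}
```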

template<typename TensorType, typename TensorIndexType, typename InType>
inline void matx::argmin(TensorType &dest, TensorIndexType &idest, const InType &in, cudaStream_t stream = 0)

Compute the min reduction of a tensor and return both value and index

Returns a tensor with minimums and indices

Template Parameters
  • TensorType – Output data type

  • TensorIndexType – Output index type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction values

  • idest – Destination view of reduction indices

  • in – Input data to reduce

  • stream – CUDA stream

template<typename TensorType, typename TensorIndexType, typename InType>
inline void matx::argmax(TensorType &dest, TensorIndexType &idest, const InType &in, cudaStream_t stream = 0)

Compute the max reduction of a tensor and return both value and index

Returns a tensor with maximums and indices

Template Parameters
  • TensorType – Output data type

  • TensorIndexType – Output index type

  • InType – Input data type

Parameters
  • dest – Destination view of reduction values

  • idest – Destination view of reduction indices

  • in – Input data to reduce

  • stream – CUDA stream
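A value-plus-index reduction can be modeled on the host as shown below for the rank-2 to rank-1 case. This is only a sketch: `ValuesAndIndices`, `arg_reduce_rows`, `argmin_rows`, and `argmax_rows` are hypothetical names. Two further assumptions are not confirmed by the text above: indices are reported relative to the start of each row, and ties resolve to the first occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: one value and one index per row.
struct ValuesAndIndices {
  std::vector<double> values;
  std::vector<std::size_t> indices;
};

// Hypothetical helper: keeps the element for which `better` prefers it over
// the current best. The strict comparison means ties keep the earliest index.
template <typename Compare>
ValuesAndIndices arg_reduce_rows(const std::vector<double>& in,
                                 std::size_t rows, std::size_t cols,
                                 Compare better) {
  assert(cols > 0 && in.size() == rows * cols);
  ValuesAndIndices out;
  for (std::size_t r = 0; r < rows; ++r) {
    const double* row = in.data() + r * cols;
    std::size_t best = 0;
    for (std::size_t c = 1; c < cols; ++c)
      if (better(row[c], row[best])) best = c;
    out.values.push_back(row[best]);
    out.indices.push_back(best);  // index within the row (assumed convention)
  }
  return out;
}

ValuesAndIndices argmin_rows(const std::vector<double>& in,
                             std::size_t rows, std::size_t cols) {
  return arg_reduce_rows(in, rows, cols, std::less<double>{});
}

ValuesAndIndices argmax_rows(const std::vector<double>& in,
                             std::size_t rows, std::size_t cols) {
  return arg_reduce_rows(in, rows, cols, std::greater<double>{});
}
```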