CUB
Classes

class cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH >
    The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp.

class cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH >
    The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp.
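The sketch below shows one typical way to instantiate both collectives inside a kernel: each 32-thread warp gets its own TempStorage, and Sum / InclusiveSum are the simplest entry points. The kernel name, the 128-thread block size, and the output arrays are illustrative assumptions, not part of this reference.

    #include <cub/cub.cuh>

    // Minimal sketch: per-warp reduction and inclusive prefix sum.
    // Assumes a 128-thread block (4 warps of 32 threads).
    __global__ void WarpCollectivesSketch(const int *d_in, int *d_scans, int *d_sums)
    {
        typedef cub::WarpReduce<int> WarpReduceT;
        typedef cub::WarpScan<int>   WarpScanT;

        // One TempStorage instance per warp
        __shared__ typename WarpReduceT::TempStorage reduce_storage[4];
        __shared__ typename WarpScanT::TempStorage   scan_storage[4];

        int tid     = blockIdx.x * blockDim.x + threadIdx.x;
        int warp_id = threadIdx.x / 32;
        int lane_id = threadIdx.x % 32;

        int thread_data = d_in[tid];

        // Warp-wide sum; the aggregate is returned to lane 0 of each warp
        int warp_sum = WarpReduceT(reduce_storage[warp_id]).Sum(thread_data);
        if (lane_id == 0)
            d_sums[tid / 32] = warp_sum;

        // Warp-wide inclusive prefix sum; every lane receives its running total
        int inclusive;
        WarpScanT(scan_storage[warp_id]).InclusiveSum(thread_data, inclusive);
        d_scans[tid] = inclusive;
    }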
Functions

template<typename T>
__device__ __forceinline__ T cub::ShuffleUp (T input, int src_offset, int first_lane = 0)
    Shuffle-up for any data type. Each warp lane i obtains the value input contributed by warp lane i - src_offset. For thread lanes i < src_offset, the thread's own input is returned to the thread.

template<typename T>
__device__ __forceinline__ T cub::ShuffleDown (T input, int src_offset, int last_lane = CUB_PTX_WARP_THREADS - 1)
    Shuffle-down for any data type. Each warp lane i obtains the value input contributed by warp lane i + src_offset. For thread lanes i >= WARP_THREADS, the thread's own input is returned to the thread.

template<typename T>
__device__ __forceinline__ T cub::ShuffleIndex (T input, int src_lane)
    Shuffle-broadcast for any data type. Each warp lane i obtains the value input contributed by warp lane src_lane. For src_lane < 0 or src_lane >= WARP_THREADS, the thread's own input is returned to the thread.

__device__ __forceinline__ int cub::WarpAll (int cond)
    Portable implementation of __all.

__device__ __forceinline__ int cub::WarpAny (int cond)
    Portable implementation of __any.
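WarpAll and WarpAny wrap the warp vote intrinsics behind a portable interface. A minimal sketch, assuming a per-thread predicate derived from an illustrative d_flags array:

    #include <cub/cub.cuh>

    // Minimal sketch: warp-wide vote over a per-thread predicate (one warp shown).
    __global__ void WarpVoteSketch(const int *d_flags, int *d_result)
    {
        int cond = (d_flags[threadIdx.x] > 0);  // per-thread predicate

        int all_set = cub::WarpAll(cond);       // nonzero iff every lane's cond is nonzero
        int any_set = cub::WarpAny(cond);       // nonzero iff at least one lane's cond is nonzero

        if (threadIdx.x == 0)
        {
            d_result[0] = all_set;
            d_result[1] = any_set;
        }
    }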
__device__ __forceinline__ T cub::ShuffleUp (T input, int src_offset, int first_lane = 0)

Shuffle-up for any data type. Each warp lane i obtains the value input contributed by warp lane i - src_offset. For thread lanes i < src_offset, the thread's own input is returned to the thread.

For example, suppose each thread obtains a double value from the predecessor of its predecessor (src_offset = 2). If thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}, the corresponding output peer_data will be {1.0, 2.0, 1.0, 2.0, 3.0, ..., 30.0}.

Parameters
    [in]  input       The value to broadcast
    [in]  src_offset  The relative down-offset of the peer to read from
    [in]  first_lane  Index of first lane in segment
Definition at line 498 of file util_ptx.cuh.
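A minimal sketch of the example above, assuming the signature documented here (the kernel wrapper and the d_data array are illustrative):

    #include <cub/cub.cuh>

    // Minimal sketch: each lane reads the value held two lanes below it.
    __global__ void ShuffleUpSketch(double *d_data)
    {
        // Obtain one input item per thread, e.g. 1.0, 2.0, ..., 32.0 across the first warp
        double thread_data = d_data[threadIdx.x];

        // Obtain the item from two ranks below; lanes 0 and 1 keep their own value
        double peer_data = cub::ShuffleUp(thread_data, 2);

        d_data[threadIdx.x] = peer_data;
    }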
__device__ __forceinline__ T cub::ShuffleDown (T input, int src_offset, int last_lane = CUB_PTX_WARP_THREADS - 1)

Shuffle-down for any data type. Each warp lane i obtains the value input contributed by warp lane i + src_offset. For thread lanes i >= WARP_THREADS, the thread's own input is returned to the thread.

For example, suppose each thread obtains a double value from the successor of its successor (src_offset = 2). If thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}, the corresponding output peer_data will be {3.0, 4.0, 5.0, 6.0, 7.0, ..., 32.0}.

Parameters
    [in]  input       The value to broadcast
    [in]  src_offset  The relative up-offset of the peer to read from
    [in]  last_lane   Index of last lane in segment
Definition at line 559 of file util_ptx.cuh.
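A minimal sketch of the example above, again assuming the documented signature (kernel wrapper and d_data are illustrative):

    #include <cub/cub.cuh>

    // Minimal sketch: each lane reads the value held two lanes above it.
    __global__ void ShuffleDownSketch(double *d_data)
    {
        // Obtain one input item per thread, e.g. 1.0, 2.0, ..., 32.0 across the first warp
        double thread_data = d_data[threadIdx.x];

        // Obtain the item from two ranks above; the uppermost lanes keep their own value
        double peer_data = cub::ShuffleDown(thread_data, 2);

        d_data[threadIdx.x] = peer_data;
    }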
__device__ __forceinline__ T cub::ShuffleIndex (T input, int src_lane)

Shuffle-broadcast for any data type. Each warp lane i obtains the value input contributed by warp lane src_lane. For src_lane < 0 or src_lane >= WARP_THREADS, the thread's own input is returned to the thread.

For example, suppose each thread obtains a double value broadcast from warp lane 0 (src_lane = 0). If thread_data across the first warp of threads is {1.0, 2.0, 3.0, 4.0, 5.0, ..., 32.0}, the corresponding output peer_data will be {1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0}.

Parameters
    [in]  input     The value to broadcast
    [in]  src_lane  Which warp lane is to do the broadcasting
Definition at line 663 of file util_ptx.cuh.
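A minimal sketch of the broadcast example above (kernel wrapper and d_data are illustrative):

    #include <cub/cub.cuh>

    // Minimal sketch: every lane receives the value held by lane 0.
    __global__ void ShuffleIndexSketch(double *d_data)
    {
        // Obtain one input item per thread, e.g. 1.0, 2.0, ..., 32.0 across the first warp
        double thread_data = d_data[threadIdx.x];

        // Broadcast lane 0's value to all lanes of the warp
        double peer_data = cub::ShuffleIndex(thread_data, 0);

        d_data[threadIdx.x] = peer_data;
    }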