CUB
Classes | |
class | cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > |
The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block.
class | cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > |
The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory.
Enumerations | |
enum | cub::CacheLoadModifier { cub::LOAD_DEFAULT, cub::LOAD_CA, cub::LOAD_CG, cub::LOAD_CS, cub::LOAD_CV, cub::LOAD_LDG, cub::LOAD_VOLATILE } |
Enumeration of cache modifiers for memory load operations. More... | |
enum | cub::CacheStoreModifier { cub::STORE_DEFAULT, cub::STORE_WB, cub::STORE_CG, cub::STORE_CS, cub::STORE_WT, cub::STORE_VOLATILE } |
Enumeration of cache modifiers for memory store operations. More... | |
Thread I/O (cache modified) | |
template<CacheLoadModifier MODIFIER, typename InputIteratorT > | |
__device__ __forceinline__ std::iterator_traits < InputIteratorT >::value_type | cub::ThreadLoad (InputIteratorT itr) |
Thread utility for reading memory using cub::CacheLoadModifier cache modifiers. Can be used to load any data type. More... | |
template<CacheStoreModifier MODIFIER, typename OutputIteratorT , typename T > | |
__device__ __forceinline__ void | cub::ThreadStore (OutputIteratorT itr, T val) |
Thread utility for writing memory using cub::CacheStoreModifier cache modifiers. Can be used to store any data type. More... | |
Blocked arrangement I/O (direct) | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectBlocked (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Load a linear segment of items into a blocked arrangement across the thread block. More... | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectBlocked (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Load a linear segment of items into a blocked arrangement across the thread block, guarded by range. More... | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectBlocked (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items, T oob_default) |
Load a linear segment of items into a blocked arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements. More... | |
template<typename T , int ITEMS_PER_THREAD> | |
__device__ __forceinline__ void | cub::LoadDirectBlockedVectorized (int linear_tid, T *block_ptr, T(&items)[ITEMS_PER_THREAD]) |
Load a linear segment of items into a blocked arrangement across the thread block. More... | |
template<typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectBlocked (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Store a blocked arrangement of items across a thread block into a linear segment of items. More... | |
template<typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectBlocked (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Store a blocked arrangement of items across a thread block into a linear segment of items, guarded by range. More... | |
template<typename T , int ITEMS_PER_THREAD> | |
__device__ __forceinline__ void | cub::StoreDirectBlockedVectorized (int linear_tid, T *block_ptr, T(&items)[ITEMS_PER_THREAD]) |
Store a blocked arrangement of items across a thread block into a linear segment of items. More... | |
Striped arrangement I/O (direct) | |
template<int BLOCK_THREADS, typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Load a linear segment of items into a striped arrangement across the thread block. More... | |
template<int BLOCK_THREADS, typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Load a linear segment of items into a striped arrangement across the thread block, guarded by range. More... | |
template<int BLOCK_THREADS, typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items, T oob_default) |
Load a linear segment of items into a striped arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements. More... | |
template<int BLOCK_THREADS, typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectStriped (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Store a striped arrangement of data across the thread block into a linear segment of items. More... | |
template<int BLOCK_THREADS, typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectStriped (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Store a striped arrangement of data across the thread block into a linear segment of items, guarded by range. More... | |
Warp-striped arrangement I/O (direct) | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectWarpStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Load a linear segment of items into a warp-striped arrangement across the thread block. More... | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectWarpStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Load a linear segment of items into a warp-striped arrangement across the thread block, guarded by range. More... | |
template<typename T , int ITEMS_PER_THREAD, typename InputIteratorT > | |
__device__ __forceinline__ void | cub::LoadDirectWarpStriped (int linear_tid, InputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items, T oob_default) |
Load a linear segment of items into a warp-striped arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements. More... | |
template<typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectWarpStriped (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD]) |
Store a warp-striped arrangement of data across the thread block into a linear segment of items. More... | |
template<typename T , int ITEMS_PER_THREAD, typename OutputIteratorT > | |
__device__ __forceinline__ void | cub::StoreDirectWarpStriped (int linear_tid, OutputIteratorT block_itr, T(&items)[ITEMS_PER_THREAD], int valid_items) |
Store a warp-striped arrangement of data across the thread block into a linear segment of items, guarded by range. More... | |
enum cub::CacheLoadModifier
Enumeration of cache modifiers for memory load operations.
Definition at line 62 of file thread_load.cuh.
enum cub::CacheStoreModifier
Enumeration of cache modifiers for memory store operations.
Definition at line 61 of file thread_store.cuh.
__device__ __forceinline__ std::iterator_traits<InputIteratorT>::value_type cub::ThreadLoad | ( | InputIteratorT | itr) |
Thread utility for reading memory using cub::CacheLoadModifier cache modifiers. Can be used to load any data type.
MODIFIER | [inferred] CacheLoadModifier enumeration |
InputIteratorT | [inferred] Input iterator type (may be a simple pointer type) |
__device__ __forceinline__ void cub::ThreadStore | ( | OutputIteratorT | itr, |
T | val | ||
) |
Thread utility for writing memory using cub::CacheStoreModifier cache modifiers. Can be used to store any data type.
MODIFIER | [inferred] CacheStoreModifier enumeration |
OutputIteratorT | [inferred] Output iterator type (may be a simple pointer type) |
T | [inferred] Data type of output value |
__device__ __forceinline__ void cub::LoadDirectBlocked | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Load a linear segment of items into a blocked arrangement across the thread block.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
Definition at line 76 of file block_load.cuh.
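The blocked ownership rule above can be illustrated with a host-side sketch. This is plain C++ that emulates the index mapping only; it is not CUB's implementation, and the name `load_blocked_host` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation (not CUB code): returns the items that the thread with
// rank `linear_tid` would hold after an unguarded blocked load.
// `load_blocked_host` is a hypothetical name used only for this sketch.
template <int ITEMS_PER_THREAD, typename T>
std::vector<T> load_blocked_host(int linear_tid, const std::vector<T>& block_data)
{
    std::vector<T> items(ITEMS_PER_THREAD);
    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        // Thread i owns the contiguous range [i * IPT, (i + 1) * IPT).
        std::size_t idx =
            static_cast<std::size_t>(linear_tid) * ITEMS_PER_THREAD + item;
        assert(idx < block_data.size());  // unguarded: a full tile is required
        items[item] = block_data[idx];
    }
    return items;
}
```

With 4 threads, 2 items per thread, and input 0..7, thread 1 would hold items 2 and 3.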
__device__ __forceinline__ void cub::LoadDirectBlocked | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Load a linear segment of items into a blocked arrangement across the thread block, guarded by range.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
Definition at line 104 of file block_load.cuh.
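The linear_tid argument is the calling thread's rank under the row-major ordering mentioned above. A minimal host-side sketch of that rank computation follows; `row_major_linear_tid` is a hypothetical helper, not a CUB function, and in a kernel its arguments would come from threadIdx and blockDim.

```cpp
#include <cassert>

// Hypothetical helper for illustration; CUB does not export this function.
// Row-major rank of a thread: x varies fastest, then y, then z.
inline int row_major_linear_tid(int tid_x, int tid_y, int tid_z,
                                int dim_x, int dim_y)
{
    assert(tid_x >= 0 && tid_x < dim_x);
    assert(tid_y >= 0 && tid_y < dim_y);
    assert(tid_z >= 0);
    return (tid_z * dim_y + tid_y) * dim_x + tid_x;
}
```

For a 1D block the expression reduces to tid_x, and for a 2D block to (tid_y * dim_x) + tid_x.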
__device__ __forceinline__ void cub::LoadDirectBlocked | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items, | ||
T | oob_default | ||
) |
Load a linear segment of items into a blocked arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
[in] | oob_default | Default value to assign out-of-bound items |
Definition at line 134 of file block_load.cuh.
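The guarded behaviour with a fall-back value can be sketched on the host as follows. This emulates the semantics described above under the assumption that only indices below valid_items are read; `load_blocked_guarded_host` is a hypothetical name and the code is not CUB's implementation.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation (not CUB code) of a range-guarded blocked load.
// Slots whose source index is >= valid_items keep `oob_default`.
template <int ITEMS_PER_THREAD, typename T>
std::vector<T> load_blocked_guarded_host(int linear_tid,
                                         const std::vector<T>& block_data,
                                         int valid_items,
                                         T oob_default)
{
    assert(valid_items >= 0 &&
           valid_items <= static_cast<int>(block_data.size()));

    // Pre-fill every slot with the fall-back value.
    std::vector<T> items(ITEMS_PER_THREAD, oob_default);

    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        int idx = linear_tid * ITEMS_PER_THREAD + item;
        if (idx < valid_items)  // guard: never read past the valid range
        {
            items[item] = block_data[idx];
        }
    }
    return items;
}
```

With 5 valid items and 2 items per thread, thread 2 would receive one real item followed by the default, and thread 3 would receive only defaults.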
__device__ __forceinline__ void cub::LoadDirectBlockedVectorized | ( | int | linear_tid, |
T * | block_ptr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Load a linear segment of items into a blocked arrangement across the thread block.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
The input offset (block_ptr + block_offset) must be quad-item aligned.
The following conditions will prevent vectorization, and loading will fall back to cub::BLOCK_LOAD_DIRECT:
ITEMS_PER_THREAD is odd
T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_ptr | Input pointer for loading from |
[out] | items | Data to load |
Definition at line 228 of file block_load.cuh.
__device__ __forceinline__ void cub::LoadDirectStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Load a linear segment of items into a striped arrangement across the thread block.
Assumes a striped arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns items (i), (i + block-threads), ..., (i + (block-threads*(items-per-thread-1))). For multi-dimensional thread blocks, a row-major thread ordering is assumed.
BLOCK_THREADS | The thread block size in threads |
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
Definition at line 286 of file block_load.cuh.
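The striped ownership rule above differs from the blocked one only in its index expression. A host-side sketch of that mapping follows; it is an emulation rather than CUB code, and `load_striped_host` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation (not CUB code): returns the items that the thread with
// rank `linear_tid` would hold after an unguarded striped load.
template <int BLOCK_THREADS, int ITEMS_PER_THREAD, typename T>
std::vector<T> load_striped_host(int linear_tid,
                                 const std::vector<T>& block_data)
{
    assert(linear_tid >= 0 && linear_tid < BLOCK_THREADS);

    std::vector<T> items(ITEMS_PER_THREAD);
    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        // Thread i owns items i, i + BLOCK_THREADS, i + 2 * BLOCK_THREADS, ...
        int idx = linear_tid + item * BLOCK_THREADS;
        assert(idx < static_cast<int>(block_data.size()));
        items[item] = block_data[idx];
    }
    return items;
}
```

With 4 threads, 2 items per thread, and input 0..7, thread 1 would hold items 1 and 5, so adjacent threads read adjacent addresses on each iteration.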
__device__ __forceinline__ void cub::LoadDirectStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Load a linear segment of items into a striped arrangement across the thread block, guarded by range.
Assumes a striped arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns items (i), (i + block-threads), ..., (i + (block-threads*(items-per-thread-1))). For multi-dimensional thread blocks, a row-major thread ordering is assumed.
BLOCK_THREADS | The thread block size in threads |
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
Definition at line 317 of file block_load.cuh.
__device__ __forceinline__ void cub::LoadDirectStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items, | ||
T | oob_default | ||
) |
Load a linear segment of items into a striped arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements.
Assumes a striped arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns items (i), (i + block-threads), ..., (i + (block-threads*(items-per-thread-1))). For multi-dimensional thread blocks, a row-major thread ordering is assumed.
BLOCK_THREADS | The thread block size in threads |
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
[in] | oob_default | Default value to assign out-of-bound items |
Definition at line 349 of file block_load.cuh.
__device__ __forceinline__ void cub::LoadDirectWarpStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Load a linear segment of items into a warp-striped arrangement across the thread block.
Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads*items-per-thread) contiguous items, and, within that range, the thread with lane index j owns items (j), (j + warp-threads), ..., (j + (warp-threads*(items-per-thread-1))).
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
Definition at line 392 of file block_load.cuh.
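The warp-striped rule combines a blocked split across warps with a striped split inside each warp. The host-side sketch below emulates that mapping; it is not CUB code, `load_warp_striped_host` is a hypothetical name, and WARP_THREADS is exposed as a template parameter only so the example can use a small warp (on current NVIDIA hardware a warp has 32 threads).

```cpp
#include <cassert>
#include <vector>

// Host-side emulation (not CUB code) of an unguarded warp-striped load.
template <int WARP_THREADS, int ITEMS_PER_THREAD, typename T>
std::vector<T> load_warp_striped_host(int linear_tid,
                                      const std::vector<T>& block_data)
{
    assert(linear_tid >= 0);

    int warp_id = linear_tid / WARP_THREADS;  // which warp the thread is in
    int lane_id = linear_tid % WARP_THREADS;  // rank within that warp

    // Each warp owns a contiguous range of WARP_THREADS * IPT items.
    int warp_offset = warp_id * WARP_THREADS * ITEMS_PER_THREAD;

    std::vector<T> items(ITEMS_PER_THREAD);
    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        // Inside the warp's range, the lane strides by WARP_THREADS.
        int idx = warp_offset + lane_id + item * WARP_THREADS;
        assert(idx < static_cast<int>(block_data.size()));
        items[item] = block_data[idx];
    }
    return items;
}
```

With a 4-thread warp, 2 items per thread, and input 0..15, thread 5 (warp 1, lane 1) would hold items 9 and 13.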
__device__ __forceinline__ void cub::LoadDirectWarpStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Load a linear segment of items into a warp-striped arrangement across the thread block, guarded by range.
Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads*items-per-thread) contiguous items, and, within that range, the thread with lane index j owns items (j), (j + warp-threads), ..., (j + (warp-threads*(items-per-thread-1))).
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
Definition at line 427 of file block_load.cuh.
__device__ __forceinline__ void cub::LoadDirectWarpStriped | ( | int | linear_tid, |
InputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items, | ||
T | oob_default | ||
) |
Load a linear segment of items into a warp-striped arrangement across the thread block, guarded by range, with a fall-back assignment of out-of-bound elements.
Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads*items-per-thread) contiguous items, and, within that range, the thread with lane index j owns items (j), (j + warp-threads), ..., (j + (warp-threads*(items-per-thread-1))).
T | [inferred] The data type to load. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
InputIteratorT | [inferred] The random-access iterator type for input (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base input iterator for loading from |
[out] | items | Data to load |
[in] | valid_items | Number of valid items to load |
[in] | oob_default | Default value to assign out-of-bound items |
Definition at line 465 of file block_load.cuh.
__device__ __forceinline__ void cub::StoreDirectBlocked | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Store a blocked arrangement of items across a thread block into a linear segment of items.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
Definition at line 74 of file block_store.cuh.
__device__ __forceinline__ void cub::StoreDirectBlocked | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Store a blocked arrangement of items across a thread block into a linear segment of items, guarded by range.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
[in] | valid_items | Number of valid items to write |
Definition at line 101 of file block_store.cuh.
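The range guard on the store side can be sketched on the host as well. The code below emulates the described behaviour under the assumption that items whose destination index is at or beyond valid_items are simply skipped; `store_blocked_guarded_host` is a hypothetical name and this is not CUB's implementation.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation (not CUB code) of a range-guarded blocked store.
// Only destination indices below valid_items are written.
template <typename T, int ITEMS_PER_THREAD>
void store_blocked_guarded_host(int linear_tid,
                                std::vector<T>& block_out,
                                const T (&items)[ITEMS_PER_THREAD],
                                int valid_items)
{
    assert(valid_items >= 0 &&
           valid_items <= static_cast<int>(block_out.size()));

    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        int idx = linear_tid * ITEMS_PER_THREAD + item;
        if (idx < valid_items)  // guard: never write past the valid range
        {
            block_out[idx] = items[item];
        }
    }
}
```

With 5 valid outputs and 2 items per thread, thread 2 would write only its first item (to index 4) and drop the second.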
__device__ __forceinline__ void cub::StoreDirectBlockedVectorized | ( | int | linear_tid, |
T * | block_ptr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Store a blocked arrangement of items across a thread block into a linear segment of items.
Assumes a blocked arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns the i-th range of items-per-thread contiguous items. For multi-dimensional thread blocks, a row-major thread ordering is assumed.
The output offset (block_ptr + block_offset) must be quad-item aligned, which is the default starting offset returned by cudaMalloc().
The following conditions will prevent vectorization, and storing will fall back to cub::BLOCK_STORE_DIRECT:
ITEMS_PER_THREAD is odd
T is not a built-in primitive or CUDA vector type (e.g., short, int2, double, float2, etc.)
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_ptr | Output pointer for storing to |
[in] | items | Data to store |
Definition at line 139 of file block_store.cuh.
__device__ __forceinline__ void cub::StoreDirectStriped | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Store a striped arrangement of data across the thread block into a linear segment of items.
Assumes a striped arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns items (i), (i + block-threads), ..., (i + (block-threads*(items-per-thread-1))). For multi-dimensional thread blocks, a row-major thread ordering is assumed.
BLOCK_THREADS | The thread block size in threads |
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
Definition at line 202 of file block_store.cuh.
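A striped store uses the same index expression as the striped load, applied to the destination. A host-side sketch follows; it is an emulation rather than CUB code, and `store_striped_host` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation (not CUB code) of an unguarded striped store.
template <int BLOCK_THREADS, typename T, int ITEMS_PER_THREAD>
void store_striped_host(int linear_tid,
                        std::vector<T>& block_out,
                        const T (&items)[ITEMS_PER_THREAD])
{
    assert(linear_tid >= 0 && linear_tid < BLOCK_THREADS);

    for (int item = 0; item < ITEMS_PER_THREAD; ++item)
    {
        // Thread i writes to i, i + BLOCK_THREADS, i + 2 * BLOCK_THREADS, ...
        int idx = linear_tid + item * BLOCK_THREADS;
        assert(idx < static_cast<int>(block_out.size()));
        block_out[idx] = items[item];
    }
}
```

With 4 threads and 2 items per thread, thread 1 would write its items to output positions 1 and 5.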
__device__ __forceinline__ void cub::StoreDirectStriped | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Store a striped arrangement of data across the thread block into a linear segment of items, guarded by range.
Assumes a striped arrangement of (block-threads*items-per-thread) items across the thread block, where thread i owns items (i), (i + block-threads), ..., (i + (block-threads*(items-per-thread-1))). For multi-dimensional thread blocks, a row-major thread ordering is assumed.
BLOCK_THREADS | The thread block size in threads |
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
[in] | valid_items | Number of valid items to write |
Definition at line 231 of file block_store.cuh.
__device__ __forceinline__ void cub::StoreDirectWarpStriped | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD] | ||
) |
Store a warp-striped arrangement of data across the thread block into a linear segment of items.
Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads*items-per-thread) contiguous items, and, within that range, the thread with lane index j owns items (j), (j + warp-threads), ..., (j + (warp-threads*(items-per-thread-1))).
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
Definition at line 273 of file block_store.cuh.
__device__ __forceinline__ void cub::StoreDirectWarpStriped | ( | int | linear_tid, |
OutputIteratorT | block_itr, | ||
T(&) | items[ITEMS_PER_THREAD], | ||
int | valid_items | ||
) |
Store a warp-striped arrangement of data across the thread block into a linear segment of items, guarded by range.
Assumes a warp-striped arrangement of elements across threads, where warp i owns the i-th range of (warp-threads*items-per-thread) contiguous items, and, within that range, the thread with lane index j owns items (j), (j + warp-threads), ..., (j + (warp-threads*(items-per-thread-1))).
T | [inferred] The data type to store. |
ITEMS_PER_THREAD | [inferred] The number of consecutive items partitioned onto each thread. |
OutputIteratorT | [inferred] The random-access iterator type for output (may be a simple pointer type). |
[in] | linear_tid | A suitable 1D thread-identifier for the calling thread (e.g., (threadIdx.y * blockDim.x) + threadIdx.x for 2D thread blocks) |
[in] | block_itr | The thread block's base output iterator for storing to |
[in] | items | Data to store |
[in] | valid_items | Number of valid items to write |
Definition at line 307 of file block_store.cuh.