 cub::ArgIndexInputIterator< InputIteratorT, OffsetT > | A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming KeyValuePair tuples) |
 cub::ArgMax | Arg max functor (keeps the value and offset of the first occurrence of the larger item) |
 cub::ArgMin | Arg min functor (keeps the value and offset of the first occurrence of the smallest item) |
 cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block |
 cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block |
 cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockHistogram class provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block |
 cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block |
 cub::BlockRadixSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockRadixSort class provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method |
 cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block (a usage sketch appears after this table) |
 cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockScan class provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block |
 cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory |
 cub::CacheModifiedInputIterator< MODIFIER, ValueType, OffsetT > | A random-access input wrapper for dereferencing array values using a PTX cache load modifier |
 cub::CacheModifiedOutputIterator< MODIFIER, ValueType, OffsetT > | A random-access output wrapper for storing array values using a PTX cache-modifier |
 cub::CachingDeviceAllocator | A simple caching allocator for device memory allocations |
 cub::Cast< B > | Default cast functor |
 cub::ConstantInputIterator< ValueType, OffsetT > | A random-access input generator for dereferencing a sequence of homogeneous values |
 cub::CountingInputIterator< ValueType, OffsetT > | A random-access input generator for dereferencing a sequence of incrementing integer values |
 cub::DeviceHistogram | DeviceHistogram provides device-wide parallel operations for constructing histogram(s) from a sequence of data samples residing within device-accessible memory |
 cub::DevicePartition | DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within device-accessible memory |
 cub::DeviceRadixSort | DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory |
 cub::DeviceReduce | DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within device-accessible memory (a usage sketch appears after this table) |
 cub::DeviceRunLengthEncode | DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory |
 cub::DeviceScan | DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory |
 cub::DeviceSegmentedRadixSort | DeviceSegmentedRadixSort provides device-wide, parallel operations for computing a batched radix sort across multiple, non-overlapping sequences of data items residing within device-accessible memory |
 cub::DeviceSegmentedReduce | DeviceSegmentedReduce provides device-wide, parallel operations for computing a reduction across multiple sequences of data items residing within device-accessible memory |
 cub::DeviceSelect | DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory |
 cub::DeviceSpmv | DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV) |
 cub::Equality | Default equality functor |
 cub::Equals< A, B > | Type equality test |
 cub::If< IF, ThenType, ElseType > | Type selection (IF ? ThenType : ElseType) |
 cub::Inequality | Default inequality functor |
 cub::InequalityWrapper< EqualityOp > | Inequality functor (wraps equality functor) |
 cub::IsPointer< Tp > | Pointer vs. iterator |
 cub::IsVolatile< Tp > | Volatile modifier test |
 cub::Log2< N, CURRENT_VAL, COUNT > | Statically determine log2(N), rounded up |
 cub::Max | Default max functor |
 cub::Min | Default min functor |
 cub::PowerOfTwo< N > | Statically determine if N is a power-of-two |
 cub::ReduceByKeyOp< ReductionOpT > | Reduce-by-key functor (wraps the binary reduction operator applied to values) |
 cub::ReduceBySegmentOp< ReductionOpT > | Reduce-by-segment functor |
 cub::RemoveQualifiers< Tp, Up > | Removes const and volatile qualifiers from type Tp |
 cub::Sum | Default sum functor |
 cub::SwizzleScanOp< ScanOp > | Binary operator wrapper for switching non-commutative scan arguments |
 cub::TexObjInputIterator< T, OffsetT > | A random-access input wrapper for dereferencing array values through texture cache. Uses newer Kepler-style texture objects |
 cub::TexRefInputIterator< T, UNIQUE_ID, OffsetT > | A random-access input wrapper for dereferencing array values through texture cache. Uses older Tesla/Fermi-style texture references |
 cub::TransformInputIterator< ValueType, ConversionOp, InputIteratorT, OffsetT > | A random-access input wrapper for transforming dereferenced values (a usage sketch appears after this table) |
 cub::Uninitialized< T > | A storage-backing wrapper that allows types with non-trivial constructors to be aliased in unions |
  cub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockDiscontinuity require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockExchange< T, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockExchange require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockHistogram require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::LoadInternal< BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockLoad< InputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockLoad require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockRadixSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockRadixSort require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse (a storage-reuse sketch appears after this table) |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_WARP_TRANSPOSE, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::StoreInternal< BLOCK_STORE_WARP_TRANSPOSE_TIMESLICED, DUMMY >::TempStorage | Alias wrapper allowing storage to be unioned |
  cub::BlockStore< OutputIteratorT, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH >::TempStorage | The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
  cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH >::TempStorage | The operations exposed by WarpScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
 cub::WarpReduce< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp |
 cub::WarpScan< T, LOGICAL_WARP_THREADS, PTX_ARCH > | The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp |
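
The block-scoped collectives listed above (BlockLoad, BlockReduce, BlockScan, BlockStore, and the rest) share one usage pattern: specialize the template for the block's thread count, reserve the nested TempStorage in shared memory, and have every thread of the block call the collective. The sketch below shows that pattern for BlockReduce; the kernel name BlockSumKernel and the tile shape (128 threads, 4 items per thread) are assumptions made for this example only.

#include <cub/cub.cuh>

// Hypothetical kernel: each 128-thread block sums its tile of 128 * 4 ints.
__global__ void BlockSumKernel(const int *d_in, int *d_block_sums)
{
    // Specialize BlockReduce for a 1D block of 128 threads owning int items
    typedef cub::BlockReduce<int, 128> BlockReduce;

    // Opaque shared-memory storage required by the collective (the TempStorage type listed above)
    __shared__ typename BlockReduce::TempStorage temp_storage;

    // Each thread loads 4 consecutive items of the block's tile (a "blocked arrangement")
    int thread_items[4];
    cub::LoadDirectBlocked(threadIdx.x, d_in + blockIdx.x * 128 * 4, thread_items);

    // Cooperative block-wide sum; the aggregate is only guaranteed valid in thread 0
    int block_sum = BlockReduce(temp_storage).Sum(thread_items);

    if (threadIdx.x == 0)
        d_block_sums[blockIdx.x] = block_sum;
}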
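
The TempStorage entries above note that a collective's storage can be union'd with other storage types to facilitate memory reuse. Below is a minimal sketch of that storage-reuse pattern, assuming a 128-thread block and a hypothetical two-phase kernel (ScanThenReduceKernel) that scans and then reduces one item per thread.

#include <cub/cub.cuh>

__global__ void ScanThenReduceKernel(const int *d_in, int *d_out)
{
    typedef cub::BlockScan<int, 128>   BlockScan;
    typedef cub::BlockReduce<int, 128> BlockReduce;

    // One shared-memory allocation backs both collectives, which run in separate phases
    __shared__ union
    {
        typename BlockScan::TempStorage   scan;
        typename BlockReduce::TempStorage reduce;
    } temp_storage;

    int item = d_in[blockIdx.x * 128 + threadIdx.x];

    // Phase 1: exclusive prefix sum across the block
    int prefix;
    BlockScan(temp_storage.scan).ExclusiveSum(item, prefix);

    // The union's storage is about to be reused, so wait for all threads to finish phase 1
    __syncthreads();

    // Phase 2: block-wide sum of the scanned values (aggregate valid in thread 0)
    int total = BlockReduce(temp_storage.reduce).Sum(prefix);
    if (threadIdx.x == 0)
        d_out[blockIdx.x] = total;
}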
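
The device-wide interfaces above (DeviceHistogram, DeviceReduce, DeviceScan, DeviceSelect, and so on) are called from host code and follow a two-call convention: the first call, made with a NULL temporary-storage pointer, only writes the required temporary-storage size; the second call does the work. Below is a host-side sketch for DeviceReduce::Sum; the wrapper name SumOnDevice is hypothetical and error checking is omitted.

#include <cub/cub.cuh>
#include <cuda_runtime.h>

// d_in points to num_items ints in device memory; d_out points to a single int result slot
void SumOnDevice(const int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage     = NULL;
    size_t  temp_storage_bytes = 0;

    // First call: no work is done, only the required temporary storage size is reported
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    // Allocate the temporary storage (cub::CachingDeviceAllocator could serve here instead)
    cudaMalloc(&d_temp_storage, temp_storage_bytes);

    // Second call: perform the device-wide reduction
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    cudaFree(d_temp_storage);
}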
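
The iterator wrappers above (ConstantInputIterator, CountingInputIterator, TransformInputIterator, and the cache-modified and texture variants) can stand in for raw pointers in the device-wide interfaces, so values are generated or transformed on the fly rather than materialized first. The sketch below pairs TransformInputIterator with DeviceReduce::Sum to compute a sum of squares; the Square functor and the SumOfSquares wrapper are names invented for this example.

#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Hypothetical transformation functor: squares its argument
struct Square
{
    __host__ __device__ __forceinline__
    int operator()(const int &x) const { return x * x; }
};

// Sums the squares of num_items ints in d_in without storing the squared values
void SumOfSquares(int *d_in, int *d_out, int num_items)
{
    // Wrap the raw pointer; dereferencing applies Square() to each loaded value
    cub::TransformInputIterator<int, Square, int*> squared_in(d_in, Square());

    void   *d_temp_storage     = NULL;
    size_t  temp_storage_bytes = 0;

    // Two-call convention: size query, then the actual reduction
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, squared_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, squared_in, d_out, num_items);

    cudaFree(d_temp_storage);
}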