 cub | Optional outer namespace(s) |
  CachingDeviceAllocator | A simple caching allocator for device memory allocations |
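A minimal usage sketch (variable names are illustrative): allocations are checked out of the allocator's cache, and frees return blocks to the cache rather than to the driver, avoiding repeated cudaMalloc/cudaFree calls.

```cuda
#include <cub/util_allocator.cuh>

cub::CachingDeviceAllocator g_allocator;  // caches freed device blocks for reuse

void Example(int num_items)
{
    int *d_data = NULL;
    // Checked out from the cache (or cudaMalloc'd on a cache miss)
    g_allocator.DeviceAllocate((void**)&d_data, sizeof(int) * num_items);
    // ... launch kernels on d_data ...
    // Returned to the cache rather than freed back to the driver
    g_allocator.DeviceFree(d_data);
}
```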
  If | Type selection (IF ? ThenType : ElseType) |
  Equals | Type equality test |
  Log2 | Statically determine log2(N), rounded up |
  PowerOfTwo | Statically determine if N is a power-of-two |
  IsPointer | Pointer vs. iterator |
  IsVolatile | Volatile modifier test |
  RemoveQualifiers | Removes const and volatile qualifiers from type Tp |
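These are compile-time metafunctions. A small sketch of how they might be exercised (the static_asserts are illustrative only):

```cuda
#include <cub/util_type.cuh>

// All of the following is evaluated entirely at compile time
typedef cub::If<(sizeof(long) == 8), long, long long>::Type Int64T;   // type selection
static_assert(cub::Equals<Int64T, Int64T>::VALUE, "type equality test");
static_assert(cub::Log2<1024>::VALUE == 10, "log2(1024), rounded up");
static_assert(cub::PowerOfTwo<256>::VALUE, "256 is a power of two");
static_assert(!cub::IsPointer<int>::VALUE, "int is not a pointer type");
```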
  ArgIndexInputIterator | A random-access input wrapper for pairing dereferenced values with their corresponding indices (forming KeyValuePair tuples) |
  CacheModifiedInputIterator | A random-access input wrapper for dereferencing array values using a PTX cache load modifier |
  CacheModifiedOutputIterator | A random-access output wrapper for storing array values using a PTX cache-modifier |
  ConstantInputIterator | A random-access input generator for dereferencing a sequence of homogeneous values |
  CountingInputIterator | A random-access input generator for dereferencing a sequence of incrementing integer values |
  TexObjInputIterator | A random-access input wrapper for dereferencing array values through texture cache. Uses newer Kepler-style texture objects |
  TexRefInputIterator | A random-access input wrapper for dereferencing array values through texture cache. Uses older Tesla/Fermi-style texture references |
  TransformInputIterator | A random-access input wrapper for transforming dereferenced values |
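As a sketch of how the wrappers compose, the following pairs CountingInputIterator with TransformInputIterator to generate the sequence of squares on the fly (the Square functor is a hypothetical example, not part of CUB):

```cuda
#include <cub/iterator/counting_input_iterator.cuh>
#include <cub/iterator/transform_input_iterator.cuh>

// Hypothetical transformation functor
struct Square
{
    __host__ __device__ __forceinline__
    int operator()(const int &x) const { return x * x; }
};

// Dereferences to 0, 1, 2, ... without touching memory
cub::CountingInputIterator<int> counting_itr(0);

// Dereferences to 0, 1, 4, 9, ... by applying Square on the fly;
// such iterators can be fed directly to the device-wide primitives below
cub::TransformInputIterator<int, Square, cub::CountingInputIterator<int> >
    squares_itr(counting_itr, Square());
```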
  Equality | Default equality functor |
  Inequality | Default inequality functor |
  InequalityWrapper | Inequality functor (wraps equality functor) |
  Sum | Default sum functor |
  Max | Default max functor |
  ArgMax | Arg max functor (keeps the value and offset of the first occurrence of the larger item) |
  Min | Default min functor |
  ArgMin | Arg min functor (keeps the value and offset of the first occurrence of the smaller item) |
  Cast | Default cast functor |
  SwizzleScanOp | Binary operator wrapper for switching non-commutative scan arguments |
  ReduceBySegmentOp | Reduce-by-segment functor |
  ReduceByKeyOp | Reduce-by-key functor (wraps a binary reduction operator to apply to values) |
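The functors are ordinary host/device function objects, normally passed to the reduction and scan entry points. A small illustrative sketch of Max and ArgMax (values are arbitrary):

```cuda
#include <cub/thread/thread_operators.cuh>
#include <cub/util_type.cuh>

__host__ __device__ void FunctorDemo()
{
    cub::Max max_op;
    int larger = max_op(3, 7);  // 7

    // ArgMax reduces KeyValuePair operands, keeping the offset (key)
    // of the first occurrence of the larger value
    cub::KeyValuePair<int, float> a(0, 2.5f), b(1, 9.0f);
    cub::ArgMax argmax_op;
    cub::KeyValuePair<int, float> winner = argmax_op(a, b);  // {1, 9.0f}
    (void)larger; (void)winner;
}
```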
  BlockDiscontinuity | The BlockDiscontinuity class provides collective methods for flagging discontinuities within an ordered set of items partitioned across a CUDA thread block |
   TempStorage | The operations exposed by BlockDiscontinuity require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
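A minimal kernel sketch (block size, items per thread, and the elided loading are illustrative assumptions):

```cuda
#include <cub/block/block_discontinuity.cuh>

__global__ void FlagHeadsKernel(int *d_data, int *d_flags)
{
    // Specialize for a 128-thread block operating on int items
    typedef cub::BlockDiscontinuity<int, 128> BlockDiscontinuity;
    __shared__ typename BlockDiscontinuity::TempStorage temp_storage;

    int thread_data[4];
    // ... load 4 consecutive items per thread into thread_data ...

    // Set head_flags[i] = 1 for the first item of each run of equal values
    int head_flags[4];
    BlockDiscontinuity(temp_storage).FlagHeads(head_flags, thread_data, cub::Inequality());
}
```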
  BlockExchange | The BlockExchange class provides collective methods for rearranging data partitioned across a CUDA thread block |
   TempStorage | The operations exposed by BlockExchange require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
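A sketch converting a striped (coalesced) arrangement into a blocked one, using the single-array in-place overload (exact overloads vary across CUB versions):

```cuda
#include <cub/block/block_exchange.cuh>
#include <cub/block/block_load.cuh>

__global__ void StripedToBlockedKernel(int *d_data)
{
    // 128 threads, 4 items per thread
    typedef cub::BlockExchange<int, 128, 4> BlockExchange;
    __shared__ typename BlockExchange::TempStorage temp_storage;

    // Load a striped arrangement (coalesced across the block)
    int thread_data[4];
    cub::LoadDirectStriped<128>(threadIdx.x, d_data, thread_data);

    // Rearrange to a blocked arrangement for per-thread sequential work
    BlockExchange(temp_storage).StripedToBlocked(thread_data);
}
```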
  BlockHistogram | The BlockHistogram class provides collective methods for constructing block-wide histograms from data samples partitioned across a CUDA thread block |
   TempStorage | The operations exposed by BlockHistogram require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
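A sketch building a 256-bin histogram of unsigned char samples (block size and samples per thread are assumptions):

```cuda
#include <cub/block/block_histogram.cuh>

__global__ void HistogramKernel(unsigned char *d_samples)
{
    // 128 threads, 4 samples per thread, 256 bins
    typedef cub::BlockHistogram<unsigned char, 128, 4, 256> BlockHistogram;
    __shared__ typename BlockHistogram::TempStorage temp_storage;
    __shared__ unsigned int smem_histogram[256];

    unsigned char data[4];
    // ... load 4 samples per thread into data ...

    // Zero-initialize and accumulate the block-wide shared-memory histogram
    BlockHistogram(temp_storage).Histogram(data, smem_histogram);
}
```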
  BlockLoad | The BlockLoad class provides collective data movement methods for loading a linear segment of items from memory into a blocked arrangement across a CUDA thread block |
   TempStorage | The operations exposed by BlockLoad require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
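A sketch of a cooperative, vectorized load of one 512-item tile per block (note: in older CUB releases the first template parameter is an iterator type such as int* rather than the item type):

```cuda
#include <cub/block/block_load.cuh>

__global__ void LoadKernel(int *d_in)
{
    // 128 threads x 4 items per thread = one 512-item tile per block
    typedef cub::BlockLoad<int, 128, 4, cub::BLOCK_LOAD_VECTORIZE> BlockLoad;
    __shared__ typename BlockLoad::TempStorage temp_storage;

    int thread_data[4];
    BlockLoad(temp_storage).Load(d_in + blockIdx.x * 512, thread_data);
}
```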
  BlockRadixSort | The BlockRadixSort class provides collective methods for sorting items partitioned across a CUDA thread block using a radix sorting method |
   TempStorage | The operations exposed by BlockRadixSort require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
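A sketch sorting one 512-key tile per block (sizes are assumptions):

```cuda
#include <cub/block/block_radix_sort.cuh>

__global__ void SortKernel(int *d_keys)
{
    // 128 threads, 4 keys per thread => 512 keys sorted per block
    typedef cub::BlockRadixSort<int, 128, 4> BlockRadixSort;
    __shared__ typename BlockRadixSort::TempStorage temp_storage;

    int thread_keys[4];
    // ... load 4 keys per thread (blocked arrangement) ...

    BlockRadixSort(temp_storage).Sort(thread_keys);
}
```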
  BlockReduce | The BlockReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread block |
   TempStorage | The operations exposed by BlockReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
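A sketch of a block-wide sum; note that the aggregate is only valid in thread 0:

```cuda
#include <cub/block/block_reduce.cuh>

__global__ void SumKernel(int *d_in, int *d_block_sums)
{
    typedef cub::BlockReduce<int, 128> BlockReduce;
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int thread_data = d_in[blockIdx.x * 128 + threadIdx.x];
    // Block-wide sum; the return value is only valid in thread 0
    int aggregate = BlockReduce(temp_storage).Sum(thread_data);
    if (threadIdx.x == 0)
        d_block_sums[blockIdx.x] = aggregate;
}
```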
  BlockScan | The BlockScan class provides collective methods for computing a parallel prefix sum/scan of items partitioned across a CUDA thread block |
   TempStorage | The operations exposed by BlockScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
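A sketch of an in-place exclusive prefix sum across one 128-item block:

```cuda
#include <cub/block/block_scan.cuh>

__global__ void PrefixSumKernel(int *d_data)
{
    typedef cub::BlockScan<int, 128> BlockScan;
    __shared__ typename BlockScan::TempStorage temp_storage;

    int thread_data = d_data[threadIdx.x];
    // Exclusive prefix sum across the block, written back in place
    BlockScan(temp_storage).ExclusiveSum(thread_data, thread_data);
    d_data[threadIdx.x] = thread_data;
}
```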
  BlockStore | The BlockStore class provides collective data movement methods for writing a blocked arrangement of items partitioned across a CUDA thread block to a linear segment of memory |
   TempStorage | The operations exposed by BlockStore require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
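A sketch writing a blocked arrangement back to global memory (same template-parameter caveat as BlockLoad for older CUB releases):

```cuda
#include <cub/block/block_store.cuh>

__global__ void StoreKernel(int *d_out)
{
    // 128 threads, 4 items per thread
    typedef cub::BlockStore<int, 128, 4, cub::BLOCK_STORE_WARP_TRANSPOSE> BlockStore;
    __shared__ typename BlockStore::TempStorage temp_storage;

    int thread_data[4];
    // ... compute thread_data ...

    // Cooperatively write one 512-item tile per block
    BlockStore(temp_storage).Store(d_out + blockIdx.x * 512, thread_data);
}
```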
  DeviceHistogram | DeviceHistogram provides device-wide, parallel operations for constructing histogram(s) from a sequence of data samples residing within device-accessible memory |
  DevicePartition | DevicePartition provides device-wide, parallel operations for partitioning sequences of data items residing within device-accessible memory |
  DeviceRadixSort | DeviceRadixSort provides device-wide, parallel operations for computing a radix sort across a sequence of data items residing within device-accessible memory |
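All of the device-wide primitives share a two-phase calling convention: a first call with a NULL temporary-storage pointer only writes the required scratch size, and a second call does the work. A sketch for sorting key-value pairs (error handling omitted; names are illustrative):

```cuda
#include <cuda_runtime.h>
#include <cub/device/device_radix_sort.cuh>

void SortPairsExample(int *d_keys_in, int *d_keys_out,
                      int *d_values_in, int *d_values_out, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t temp_storage_bytes = 0;

    // Phase 1: query required temporary storage size
    cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items);

    // Phase 2: allocate the scratch space and sort
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceRadixSort::SortPairs(d_temp_storage, temp_storage_bytes,
        d_keys_in, d_keys_out, d_values_in, d_values_out, num_items);
    cudaFree(d_temp_storage);
}
```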
  DeviceReduce | DeviceReduce provides device-wide, parallel operations for computing a reduction across a sequence of data items residing within device-accessible memory |
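A sketch of a device-wide sum using the same two-phase pattern:

```cuda
#include <cuda_runtime.h>
#include <cub/device/device_reduce.cuh>

void SumExample(int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t temp_storage_bytes = 0;
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaFree(d_temp_storage);  // d_out[0] now holds the sum
}
```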
  DeviceRunLengthEncode | DeviceRunLengthEncode provides device-wide, parallel operations for demarcating "runs" of same-valued items within a sequence residing within device-accessible memory |
  DeviceScan | DeviceScan provides device-wide, parallel operations for computing a prefix scan across a sequence of data items residing within device-accessible memory |
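A sketch of a device-wide exclusive prefix sum (two-phase pattern again):

```cuda
#include <cuda_runtime.h>
#include <cub/device/device_scan.cuh>

void ExclusiveSumExample(int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t temp_storage_bytes = 0;
    cub::DeviceScan::ExclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceScan::ExclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
    cudaFree(d_temp_storage);
}
```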
  DeviceSegmentedRadixSort | DeviceSegmentedRadixSort provides device-wide, parallel operations for computing a batched radix sort across multiple, non-overlapping sequences of data items residing within device-accessible memory |
  DeviceSegmentedReduce | DeviceSegmentedReduce provides device-wide, parallel operations for computing a reduction across multiple sequences of data items residing within device-accessible memory |
  DeviceSelect | DeviceSelect provides device-wide, parallel operations for compacting selected items from sequences of data items residing within device-accessible memory |
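A sketch of DeviceSelect::If with a hypothetical selection functor:

```cuda
#include <cuda_runtime.h>
#include <cub/device/device_select.cuh>

// Hypothetical selection criterion: keep items below a threshold
struct LessThan
{
    int compare;
    __host__ __device__ LessThan(int compare) : compare(compare) {}
    __host__ __device__ bool operator()(const int &a) const { return a < compare; }
};

void SelectExample(int *d_in, int *d_out, int *d_num_selected_out, int num_items)
{
    void   *d_temp_storage = NULL;
    size_t temp_storage_bytes = 0;
    cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes,
        d_in, d_out, d_num_selected_out, num_items, LessThan(7));
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes,
        d_in, d_out, d_num_selected_out, num_items, LessThan(7));
    cudaFree(d_temp_storage);
}
```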
  DeviceSpmv | DeviceSpmv provides device-wide parallel operations for performing sparse-matrix * dense-vector multiplication (SpMV) |
  WarpScan | The WarpScan class provides collective methods for computing a parallel prefix scan of items partitioned across a CUDA thread warp |
   TempStorage | The operations exposed by WarpScan require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
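A sketch of per-warp inclusive prefix sums in a 128-thread block (one TempStorage per warp):

```cuda
#include <cub/warp/warp_scan.cuh>

__global__ void WarpPrefixSumKernel(int *d_data)
{
    typedef cub::WarpScan<int> WarpScan;
    __shared__ typename WarpScan::TempStorage temp_storage[4];  // 4 warps per block

    int warp_id = threadIdx.x / 32;
    int thread_data = d_data[threadIdx.x];
    // Inclusive prefix sum within each warp, in place
    WarpScan(temp_storage[warp_id]).InclusiveSum(thread_data, thread_data);
    d_data[threadIdx.x] = thread_data;
}
```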
  WarpReduce | The WarpReduce class provides collective methods for computing a parallel reduction of items partitioned across a CUDA thread warp |
   TempStorage | The operations exposed by WarpReduce require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword. Alternatively, it can be aliased to externally allocated memory (shared or global) or union'd with other storage allocation types to facilitate memory reuse |
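A sketch of per-warp sums; each warp's aggregate is only valid in its lane 0:

```cuda
#include <cub/warp/warp_reduce.cuh>

__global__ void WarpSumKernel(int *d_in, int *d_warp_sums)
{
    typedef cub::WarpReduce<int> WarpReduce;
    __shared__ typename WarpReduce::TempStorage temp_storage[4];  // 4 warps per block

    int warp_id = threadIdx.x / 32;
    int thread_data = d_in[blockIdx.x * 128 + threadIdx.x];
    // Warp-wide sum; only lane 0 of each warp holds the result
    int aggregate = WarpReduce(temp_storage[warp_id]).Sum(thread_data);
    if (threadIdx.x % 32 == 0)
        d_warp_sums[blockIdx.x * 4 + warp_id] = aggregate;
}
```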