src/computesim

Description

runComputeOnCpu simulates a GPU-like compute environment on the CPU. It organizes work into workgroups and invocations, similar to how compute shaders operate on GPUs.

Warning: The thread pool size must be at least MaxConcurrentWorkGroups * (ceilDiv(workgroupSizeX * workgroupSizeY * workgroupSizeZ, SubgroupSize) + 1). Compile with: -d:ThreadPoolSize=N where N meets this requirement.
Warning: Using barrier() within conditional branches may lead to undefined behavior. The emulator is modeled using a single barrier that must be accessible from all threads within a workgroup.
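
The thread pool requirement in the first warning can be worked out ahead of time. The following is a minimal sketch and not part of computesim: requiredThreadPoolSize and ceilDivInt are hypothetical helpers, and the values passed for MaxConcurrentWorkGroups (2) and SubgroupSize (8) are assumptions chosen only for illustration, so check the constants your build actually uses.

```nim
# Hypothetical helper: integer division rounded up (for positive inputs).
proc ceilDivInt(a, b: int): int =
  (a + b - 1) div b

# Hypothetical helper mirroring the formula in the warning:
# MaxConcurrentWorkGroups * (ceilDiv(x * y * z, SubgroupSize) + 1)
proc requiredThreadPoolSize(maxConcurrentWorkGroups, subgroupSize: int,
                            sizeX, sizeY, sizeZ: int): int =
  let invocations = sizeX * sizeY * sizeZ
  let subgroupsPerWorkGroup = ceilDivInt(invocations, subgroupSize)
  maxConcurrentWorkGroups * (subgroupsPerWorkGroup + 1)

# Assumed values: 2 concurrent workgroups, subgroups of 8 invocations,
# and a 256 x 1 x 1 workgroup.
echo requiredThreadPoolSize(2, 8, 256, 1, 1)  # 66
```

Under these assumed values, compiling with -d:ThreadPoolSize=66 or larger would satisfy the requirement.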

Parameters

  • numWorkGroups (UVec3): The number of workgroups in each dimension (x, y, z).
  • workGroupSize (UVec3): The size of each workgroup in each dimension (x, y, z).
  • compute (ThreadGenerator[A, B, C]): The compute shader procedure to execute.
  • ssbo (A): Storage buffer object(s) containing the data to process.
  • smem (B): Shared memory for each workgroup.
  • args (C): Additional arguments passed to the compute shader.

Compute Function Signature

The compute shader procedure can be written in two ways:

  1. With shared memory:

proc computeFunction[A, B, C](
  buffers: A,     # Storage buffer (typically ptr T)
  shared: ptr B,  # Shared memory for workgroup-local data
  args: C         # Additional arguments
) {.computeShader.}

  2. Without shared memory:

proc computeFunction[A, C](
  buffers: A,     # Storage buffer (typically ptr T)
  args: C         # Additional arguments
) {.computeShader.}

Example

type
  Buffers = object
    input, output: seq[float32]
  Shared = seq[float32]
  Args = object
    factor: int32

proc myComputeShader(
    buffers: ptr Buffers,
    shared: ptr Shared,
    args: Args) {.computeShader.} =
  # Computation logic here
  discard

let numWorkGroups = uvec3(4, 1, 1)
let workGroupSize = uvec3(256, 1, 1)
var buffers: Buffers
let coarseFactor = 4'i32

runComputeOnCpu(
  numWorkGroups, workGroupSize,
  myComputeShader,
  addr buffers,
  newSeq[float32](workGroupSize.x),
  Args(factor: coarseFactor)
)

GLSL Built-in Variables

GLSL Constant            Type    Description
gl_WorkGroupID           UVec3   ID of the current workgroup [0..gl_NumWorkGroups)
gl_WorkGroupSize         UVec3   Size of the workgroup (x, y, z)
gl_NumWorkGroups         UVec3   Total number of workgroups (x, y, z)
gl_NumSubgroups          uint32  Number of subgroups in the workgroup
gl_SubgroupID            uint32  ID of the current subgroup [0..gl_NumSubgroups)
gl_GlobalInvocationID    UVec3   Global ID of the current invocation [0..gl_NumWorkGroups * gl_WorkGroupSize)
gl_LocalInvocationID     UVec3   Local ID within the workgroup [0..gl_WorkGroupSize)
gl_SubgroupSize          uint32  Size of subgroups (constant across all subgroups)
gl_SubgroupInvocationID  uint32  ID of the invocation within the subgroup [0..gl_SubgroupSize)
gl_SubgroupEqMask        UVec4   Mask with bit set only at current invocation's index
gl_SubgroupGeMask        UVec4   Mask with bits set at and above current invocation's index
gl_SubgroupGtMask        UVec4   Mask with bits set above current invocation's index
gl_SubgroupLeMask        UVec4   Mask with bits set at and below current invocation's index
gl_SubgroupLtMask        UVec4   Mask with bits set below current invocation's index
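
To make the five subgroup masks concrete, here is a small sketch that computes them for one invocation. It is not the emulator's implementation: subgroupMasks is a hypothetical helper, the masks are held in a single uint32 rather than a UVec4 (so it only covers subgroup sizes up to 32), and clipping the Ge and Gt masks to the subgroup size is an assumption based on how GLSL defines them.

```nim
# Hypothetical helper: masks for one invocation, packed into a uint32.
# Assumes invocation < subgroupSize and subgroupSize <= 32.
proc subgroupMasks(invocation, subgroupSize: uint32):
    tuple[eq, ge, gt, le, lt: uint32] =
  # Bits for every invocation that exists in the subgroup.
  let active =
    if subgroupSize >= 32: high(uint32)
    else: (1'u32 shl subgroupSize) - 1
  let eq = 1'u32 shl invocation   # only the current invocation's bit
  let lt = eq - 1                 # all bits below the current index
  let le = lt or eq               # bits at and below the current index
  result = (eq: eq,
            ge: (not lt) and active,   # bits at and above, clipped
            gt: (not le) and active,   # bits strictly above, clipped
            le: le,
            lt: lt)

# Invocation 3 in an assumed subgroup of 8 invocations.
let m = subgroupMasks(3, 8)
echo m.eq  # 8   (0b00001000)
echo m.ge  # 248 (0b11111000)
echo m.gt  # 240 (0b11110000)
echo m.le  # 15  (0b00001111)
echo m.lt  # 7   (0b00000111)
```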

CUDA to GLSL Translation Table

CUDA Concept                     GLSL Equivalent        Description
blockDim                         gl_WorkGroupSize       The size of a thread block (CUDA) or work group (GLSL)
gridDim                          gl_NumWorkGroups       The size of the grid (CUDA) or the number of work groups (GLSL)
blockIdx                         gl_WorkGroupID         The index of the current block (CUDA) or work group (GLSL)
threadIdx                        gl_LocalInvocationID   The index of the current thread within its block (CUDA) or work group (GLSL)
blockIdx * blockDim + threadIdx  gl_GlobalInvocationID  The global index of the current thread (CUDA) or invocation (GLSL)
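
The last row of the table is a per-component formula, which the following sketch spells out. Vec3u and globalInvocationId are hypothetical stand-ins written for this illustration; the library's own UVec3 type and built-in variables are not used here.

```nim
# Hypothetical stand-in for UVec3, so the sketch needs no dependencies.
type Vec3u = object
  x, y, z: uint32

# gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
# (CUDA: blockIdx * blockDim + threadIdx), applied per component.
proc globalInvocationId(workGroupId, workGroupSize, localId: Vec3u): Vec3u =
  Vec3u(x: workGroupId.x * workGroupSize.x + localId.x,
        y: workGroupId.y * workGroupSize.y + localId.y,
        z: workGroupId.z * workGroupSize.z + localId.z)

# Invocation 5 of workgroup 2, with 256 x 1 x 1 invocations per workgroup.
let gid = globalInvocationId(Vec3u(x: 2, y: 0, z: 0),
                             Vec3u(x: 256, y: 1, z: 1),
                             Vec3u(x: 5, y: 0, z: 0))
echo gid.x  # 517
```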

Types

ThreadGenerator[A; B; C] = proc (buffers: A; shared: ptr B; args: C): ThreadClosure {.
    nimcall.}

Templates

template runComputeOnCpu(numWorkGroups, workGroupSize: UVec3;
                         compute, ssbo, smem, args: typed)
template runComputeOnCpu(numWorkGroups, workGroupSize: UVec3;
                         compute, ssbo, args: typed)