Description
runComputeOnCpu simulates a GPU-like compute environment on the CPU. It organizes work into workgroups and invocations, similar to how compute shaders operate on GPUs.
Warning:
The thread pool size must be at least MaxConcurrentWorkGroups * (ceilDiv(workgroupSizeX * workgroupSizeY * workgroupSizeZ, SubgroupSize) + 1). Compile with: -d:ThreadPoolSize=N where N meets this requirement.
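As a worked instance of the formula above, the following minimal sketch computes the smallest acceptable value of N. The helper names (ceilDivInt, requiredThreadPoolSize) and the sample values for MaxConcurrentWorkGroups and SubgroupSize are assumptions chosen for illustration; check the values your build actually uses.

```nim
# Integer division rounding up; written out here so the sketch needs no imports.
func ceilDivInt(a, b: int): int =
  (a + b - 1) div b

# Minimum thread pool size according to the formula in the warning above.
func requiredThreadPoolSize(maxConcurrentWorkGroups, subgroupSize: int;
                            sizeX, sizeY, sizeZ: int): int =
  let invocations = sizeX * sizeY * sizeZ
  maxConcurrentWorkGroups * (ceilDivInt(invocations, subgroupSize) + 1)

when isMainModule:
  # Hypothetical configuration: 2 concurrent workgroups, subgroups of 8,
  # and a 256 x 1 x 1 workgroup.
  echo requiredThreadPoolSize(2, 8, 256, 1, 1)  # 2 * (32 + 1) = 66
```

Under these assumed values the program would be compiled with -d:ThreadPoolSize=66 or larger.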
Warning:
Using barrier() within conditional branches may lead to undefined behavior. The emulator is modeled using a single barrier that must be accessible from all threads within a workgroup.
Parameters
- numWorkGroups (UVec3): The number of workgroups in each dimension (x, y, z).
- workGroupSize (UVec3): The size of each workgroup in each dimension (x, y, z).
- compute (ThreadGenerator[A, B, C]): The compute shader procedure to execute.
- ssbo (A): Storage buffer object(s) containing the data to process.
- smem (B): Shared memory for each workgroup.
- args (C): Additional arguments passed to the compute shader.
Compute Function Signature
The compute shader procedure can be written in two ways:
- With shared memory:
    proc computeFunction[A, B, C](
      buffers: A,      # Storage buffer (typically ptr T)
      shared: ptr B,   # Shared memory for workgroup-local data
      args: C          # Additional arguments
    ) {.computeShader.}
- Without shared memory:
    proc computeFunction[A, C](
      buffers: A,      # Storage buffer (typically ptr T)
      args: C          # Additional arguments
    ) {.computeShader.}
Example
    type
      Buffers = object
        input, output: seq[float32]
      Shared = seq[float32]
      Args = object
        factor: int32

    proc myComputeShader(
        buffers: ptr Buffers,
        shared: ptr Shared,
        args: Args) {.computeShader.} =
      # Computation logic here
      discard  # placeholder so the empty body is valid Nim

    let numWorkGroups = uvec3(4, 1, 1)
    let workGroupSize = uvec3(256, 1, 1)
    var buffers: Buffers
    let coarseFactor = 4'i32

    runComputeOnCpu(
      numWorkGroups, workGroupSize,
      myComputeShader,
      addr buffers,
      newSeq[float32](workGroupSize.x),
      Args(factor: coarseFactor)
    )
GLSL Built-in Variables
GLSL Constant | Type | Description |
---|---|---|
gl_WorkGroupID | UVec3 | ID of the current workgroup [0..gl_NumWorkGroups) |
gl_WorkGroupSize | UVec3 | Size of the workgroup (x, y, z) |
gl_NumWorkGroups | UVec3 | Total number of workgroups (x, y, z) |
gl_NumSubgroups | uint32 | Number of subgroups in the workgroup |
gl_SubgroupID | uint32 | ID of the current subgroup [0..gl_NumSubgroups) |
gl_GlobalInvocationID | UVec3 | Global ID of the current invocation [0..gl_NumWorkGroups * gl_WorkGroupSize) |
gl_LocalInvocationID | UVec3 | Local ID within the workgroup [0..gl_WorkGroupSize) |
gl_SubgroupSize | uint32 | Size of subgroups (constant across all subgroups) |
gl_SubgroupInvocationID | uint32 | ID of the invocation within the subgroup [0..gl_SubgroupSize) |
gl_SubgroupEqMask | UVec4 | Mask with bit set only at current invocation's index |
gl_SubgroupGeMask | UVec4 | Mask with bits set at and above current invocation's index |
gl_SubgroupGtMask | UVec4 | Mask with bits set above current invocation's index |
gl_SubgroupLeMask | UVec4 | Mask with bits set at and below current invocation's index |
gl_SubgroupLtMask | UVec4 | Mask with bits set below current invocation's index |
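To make the five mask rows concrete, here is a minimal sketch that builds each mask for one invocation. It makes two assumptions that the table does not state: only the first 32-bit component of the UVec4 is modelled (so the subgroup size is at most 32), and the Gt/Ge masks are limited to bits that belong to the subgroup. All function names are hypothetical and are not part of the emulator's API.

```nim
# Bits that correspond to real invocations in a subgroup of the given size.
func validMask(subgroupSize: uint32): uint32 =
  if subgroupSize >= 32'u32: high(uint32)
  else: (1'u32 shl subgroupSize) - 1'u32

# Only the bit at the invocation's own index.
func eqMask(id: uint32): uint32 = 1'u32 shl id

# All bits strictly below the invocation's index.
func ltMask(id: uint32): uint32 = (1'u32 shl id) - 1'u32

# All bits at and below the invocation's index.
func leMask(id: uint32): uint32 = ltMask(id) or eqMask(id)

# All in-subgroup bits strictly above the invocation's index.
func gtMask(id, subgroupSize: uint32): uint32 =
  (not leMask(id)) and validMask(subgroupSize)

# All in-subgroup bits at and above the invocation's index.
func geMask(id, subgroupSize: uint32): uint32 =
  (not ltMask(id)) and validMask(subgroupSize)

when isMainModule:
  # Invocation 3 in a subgroup of 8 invocations.
  echo eqMask(3)     # 8   (binary 00001000)
  echo ltMask(3)     # 7   (binary 00000111)
  echo leMask(3)     # 15  (binary 00001111)
  echo gtMask(3, 8)  # 240 (binary 11110000)
  echo geMask(3, 8)  # 248 (binary 11111000)
```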
CUDA to GLSL Translation Table
CUDA Concept | GLSL Equivalent | Description |
---|---|---|
blockDim | gl_WorkGroupSize | The size of a thread block (CUDA) or work group (GLSL) |
gridDim | gl_NumWorkGroups | The size of the grid (CUDA) or the number of work groups (GLSL) |
blockIdx | gl_WorkGroupID | The index of the current block (CUDA) or work group (GLSL) |
threadIdx | gl_LocalInvocationID | The index of the current thread within its block (CUDA) or work group (GLSL) |
blockIdx * blockDim + threadIdx | gl_GlobalInvocationID | The global index of the current thread (CUDA) or invocation (GLSL) |
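The last row of the table can be sketched per component as follows. Vec3 and vec3 here are hypothetical stand-ins for the library's UVec3 type and constructor, used only to keep the example self-contained; the arithmetic mirrors blockIdx * blockDim + threadIdx.

```nim
# Hypothetical stand-in for UVec3 so the sketch runs without the library.
type Vec3 = tuple[x, y, z: uint32]

func vec3(x, y, z: uint32): Vec3 = (x: x, y: y, z: z)

# gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID,
# applied independently to each dimension.
func globalInvocationId(workGroupId, workGroupSize, localId: Vec3): Vec3 =
  vec3(workGroupId.x * workGroupSize.x + localId.x,
       workGroupId.y * workGroupSize.y + localId.y,
       workGroupId.z * workGroupSize.z + localId.z)

when isMainModule:
  # Workgroup 2 of size 256, local invocation 5.
  let gid = globalInvocationId(vec3(2, 0, 0), vec3(256, 1, 1), vec3(5, 0, 0))
  echo gid.x  # 2 * 256 + 5 = 517
```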
Types
ThreadGenerator[A; B; C] = proc (buffers: A; shared: ptr B; args: C): ThreadClosure {.nimcall.}
Templates
template runComputeOnCpu(numWorkGroups, workGroupSize: UVec3; compute, ssbo, smem, args: typed)
template runComputeOnCpu(numWorkGroups, workGroupSize: UVec3; compute, ssbo, args: typed)