GPULib 1.6.2 API

Tech-X Corporation


lib/

gpuexecutefunction.pro


Routines

gpuexecutefunction

gpuexecutefunction, kernel, kernel_arg1 [, kernel_arg2] [, GRID=int/intarr] [, BLOCKS=int/intarr] [, SHARED_MEMORY=int] [, ERROR=integer]

Executes a kernel function loaded via GPULOADMODULE and GPULOADFUNCTION.

Parameters

kernel in required type=ulong64

kernel function as returned from GPULOADFUNCTION

kernel_arg1 in required type=scalar

either a GPUVariable or a scalar, as indicated by the calling signature of the kernel

kernel_arg2 in optional type=scalar

either a GPUVariable or a scalar, as indicated by the calling signature of the kernel; pass kernel_arg3, kernel_arg4, etc. as needed

Keywords

GRID in optional type=int/intarr default=[1, 1, 1]

size of the grid in blocks; may be a scalar or an array of up to 3 elements

BLOCKS in optional type=int/intarr default=[1, 1, 1]

size of each block in threads; may be a scalar or an array of up to 3 elements

SHARED_MEMORY in optional type=int default=0

bytes of shared memory needed by kernel

ERROR out optional type=integer

error status
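
The GRID and BLOCKS keywords accept a scalar or an array of up to 3 elements, while a CUDA launch always has three dimensions. The following C sketch is not part of GPULib; it illustrates one plausible reading, assuming that unspecified trailing dimensions are treated as 1. The default of [1, 1, 1] suggests this, but this page does not state it. The names dim3_spec, expand_dims and total_threads are hypothetical.

```c
/* Three-dimensional launch size, mirroring the [x, y, z] form of GRID and BLOCKS. */
typedef struct {
    long x, y, z;
} dim3_spec;

/*
 * Expand a specification of 1 to 3 elements into three dimensions.
 * ASSUMPTION: missing trailing elements default to 1; GPULib's actual
 * handling is not documented on this page.
 */
dim3_spec expand_dims(const long *spec, int n_elements)
{
    dim3_spec d = {1, 1, 1};
    if (n_elements > 0) d.x = spec[0];
    if (n_elements > 1) d.y = spec[1];
    if (n_elements > 2) d.z = spec[2];
    return d;
}

/* Total number of threads in a launch: blocks per grid times threads per block. */
long total_threads(dim3_spec grid, dim3_spec blocks)
{
    long n_blocks = grid.x * grid.y * grid.z;
    long n_threads_per_block = blocks.x * blocks.y * blocks.z;
    return n_blocks * n_threads_per_block;
}
```

Under that assumption, a grid given as the scalar 4 with blocks given as [16, 16] expands to a grid of [4, 1, 1] and blocks of [16, 16, 1], for 1024 threads in total.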

Examples

GPUEXECUTEFUNCTION can be used to execute a pre-compiled CUDA kernel. For example, suppose the following kernel is in the file vectorAdd_kernel.cu:

    extern "C" __global__ void VecAdd_kernel(const float* A, const float* B, float* C, int N)
    {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < N)
            C[i] = A[i] + B[i];
    }

Note that C linkage (extern "C") is required so that the kernel name is not mangled in the PTX code.

Compile this with:

$ nvcc --ptx vectorAdd_kernel.cu
This produces vectorAdd_kernel.ptx, which can be loaded and executed with:

    ; load the PTX source and look up the kernel by its unmangled name
    ptx_source = gpu_read_ptxfile('vectorAdd_kernel.ptx')
    module = gpuLoadModule(ptx_source, error=err)
    kernel = gpuLoadFunction(module, 'VecAdd_kernel', error=err)

    ; create the two inputs and the output on the GPU
    n = 20L
    dx = gpuFindgen(n)
    dy = gpuFindgen(n)
    dz = gpuFltarr(n)

    ; ceiling division, so that every element is covered by a thread
    nThreadsPerBlock = 256L
    nBlocksPerGrid = (n + nThreadsPerBlock - 1L) / nThreadsPerBlock

    gpuExecuteFunction, kernel, $
                        dx, $
                        dy, $
                        dz, $
                        n, $
                        GRID=[nBlocksPerGrid], $
                        BLOCKS=[nThreadsPerBlock], $
                        ERROR=err
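
The launch geometry used in the example above can be checked on the CPU. The following C sketch is not part of GPULib; it emulates how the blocks and threads of a one-dimensional launch map to array indices, using the same ceiling division as the IDL code and the same i < N guard as the kernel. The function names blocks_per_grid and emulate_vec_add are hypothetical.

```c
/* Ceiling division: the smallest number of blocks whose threads cover n elements. */
long blocks_per_grid(long n, long threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

/*
 * CPU emulation of a one-dimensional launch of VecAdd_kernel.
 * Each (block, thread) pair computes the same global index as the kernel,
 * i = blockDim.x * blockIdx.x + threadIdx.x, and applies the same guard.
 * Returns the number of threads that fail the guard and do no work.
 */
long emulate_vec_add(const float *a, const float *b, float *c,
                     long n, long threads_per_block)
{
    long n_blocks = blocks_per_grid(n, threads_per_block);
    long idle = 0;

    for (long block = 0; block < n_blocks; block++) {
        for (long thread = 0; thread < threads_per_block; thread++) {
            long i = threads_per_block * block + thread;
            if (i < n)
                c[i] = a[i] + b[i];   /* thread is inside the array */
            else
                idle++;               /* thread is past the end of the array */
        }
    }
    return idle;
}
```

With n = 20 and 256 threads per block, as in the example, blocks_per_grid returns 1 and 236 of the 256 threads fail the guard. With n = 1000 it returns 4, and 24 of the 1024 threads are idle. Without the guard, those extra threads would read and write past the end of the arrays.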

File attributes

Modification date: Fri Aug 30 11:27:59 2013
Lines: 9
Docformat: rst rst

GPULib and related documentation copyright 2007-2012, Tech-X Corporation, Boulder, CO. Tech-X is a registered trademark of Tech-X Corporation. GPULib is a trademark of Tech-X Corporation. All other company, product and brand names are the property of their respective owners.