File buffer.h¶
This file contains the interface definition for the backends.
For normal use you should not call the functions defined in this file directly.
- See
- array.h For managing buffers
- See
- kernel.h For using kernels
Defines
-
GA_CTX_SCHED_AUTO
¶ Automatic scheduling, decide what to do depending on the workload, number of cores in the computer and other relevant factors. (default)
-
GA_CTX_SCHED_SINGLE
¶ Single-work scheduling. Optimize for speed in a single process, with a single thread. This is the fastest mode, but it may keep the CPU busy more than necessary.
-
GA_CTX_SCHED_MULTI
¶ Multi-work scheduling. Try not to keep the CPU busy more than necessary and give other threads a chance at some CPU time. This may increase the latency when waiting for GPU operations.
-
GA_BUFFER_READ_WRITE
¶ The buffer is available for reading and writing from kernels.
This is the default (0) value.
-
GA_BUFFER_DEV
¶ Allocate the buffer in device-only memory.
This is the default (0) value.
-
GA_BUFFER_READ_ONLY
¶ Signal that the memory in this buffer will only be read by kernels.
You can use gpudata_write() to set the contents.
You may not call gpudata_memset() with the resulting buffer as the destination.
-
GA_BUFFER_WRITE_ONLY
¶ Signal that the memory in this buffer will only be written by kernels (i.e. it is an output buffer).
You can read the contents with gpudata_read().
-
GA_BUFFER_INIT
¶ Initialize the contents of the buffer with the user-supplied host buffer (data). This buffer must be at least sz bytes large.
-
GA_BUFFER_HOST
¶ Allocate the buffer in host-reachable memory enabling you to retrieve a pointer to the contents as the
GA_BUFFER_PROP_HOSTPOINTER
property.
-
GA_BUFFER_MASK
¶
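The allocation flags above are meant to be OR-ed together into a single int. The following is a minimal sketch of that idea, not the library's own code. Only the value 0 for READ_WRITE and DEV comes from the text; every other numeric value, the EX_ names, and the helper ex_alloc_args_ok() are hypothetical stand-ins, and real code should use the macros from buffer.h.

```c
#include <stddef.h>

/* Illustrative stand-ins only. The text states that READ_WRITE and DEV
 * are 0; every other value below is an assumption. */
enum {
  EX_BUFFER_READ_WRITE = 0x00,
  EX_BUFFER_DEV        = 0x00,
  EX_BUFFER_READ_ONLY  = 0x01, /* assumed value */
  EX_BUFFER_WRITE_ONLY = 0x02, /* assumed value */
  EX_BUFFER_INIT       = 0x04, /* assumed value */
  EX_BUFFER_HOST       = 0x08  /* assumed value */
};

/* Hypothetical helper: returns 1 if the flag/data combination is
 * consistent with the flag descriptions, 0 otherwise. */
static int ex_alloc_args_ok(int flags, const void *data) {
  /* INIT copies from the host buffer, so one must be supplied. */
  if ((flags & EX_BUFFER_INIT) && data == NULL)
    return 0;
  return 1;
}
```

For instance, a read-only buffer filled from host memory would combine the read-only and init flags and pass a non-NULL data pointer.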
-
GA_CTX_PROP_DEVNAME
¶ Get the device name for the context.
Type:
char [256]
-
GA_CTX_PROP_LMEMSIZE
¶ Get the local memory size available for a call in the context.
Type:
size_t
-
GA_CTX_PROP_NUMPROCS
¶ Number of compute units in this context.
Compute units times local size is more or less the expected parallelism available on the device, but this is a very rough estimate.
Type:
unsigned int
-
GA_CTX_PROP_BIN_ID
¶ Get the compatibility ID for the binaries generated with this context.
Those binaries should work with any context which has the same ID.
Type:
const char *
-
GA_CTX_PROP_ERRBUF
¶ Get a pre-allocated 8 byte buffer for kernel ops.
This buffer is initialized to 0 on allocation and must always be returned to that state after using it.
This exists only to avoid the overhead of an allocation when calling a kernel that may error out. It does not preclude the need for synchronization and transfers.
Type:
gpudata *
-
GA_CTX_PROP_TOTAL_GMEM
¶ Get the total size of global memory on the device.
Type:
size_t
-
GA_CTX_PROP_FREE_GMEM
¶ Get the size of free global memory on the device.
Type:
size_t
-
GA_CTX_PROP_NATIVE_FLOAT16
¶ Get the status of native float16 support on the device.
Type:
int
-
GA_CTX_PROP_MAXGSIZE0
¶ Get the maximum global size for dimension 0.
Type:
size_t
-
GA_CTX_PROP_MAXGSIZE1
¶ Get the maximum global size for dimension 1.
Type:
size_t
-
GA_CTX_PROP_MAXGSIZE2
¶ Get the maximum global size for dimension 2.
Type:
size_t
-
GA_CTX_PROP_MAXLSIZE0
¶ Get the maximum local size for dimension 0.
Type:
size_t
-
GA_CTX_PROP_MAXLSIZE1
¶ Get the maximum local size for dimension 1.
Type:
size_t
-
GA_CTX_PROP_MAXLSIZE2
¶ Get the maximum local size for dimension 2.
Type:
size_t
-
GA_CTX_PROP_UNIQUE_ID
¶ Get a unique ID for the device behind the context.
Type:
char [16]
-
GA_CTX_PROP_LARGEST_MEMBLOCK
¶ Get the largest single block of memory that can be allocated.
Type:
size_t
-
GA_BUFFER_PROP_START
¶
-
GA_BUFFER_PROP_CTX
¶ Get the context in which this buffer was allocated.
Type:
gpucontext *
-
GA_BUFFER_PROP_REFCNT
¶ The reference count of the buffer. Use only for debugging purposes.
Type:
unsigned int
-
GA_BUFFER_PROP_SIZE
¶ Size of the buffer on the device.
This may be larger than the requested allocation size due to a number of factors.
Type:
size_t
-
GA_KERNEL_PROP_START
¶
-
GA_KERNEL_PROP_CTX
¶ Get the context for which this kernel was compiled.
Type:
gpucontext *
-
GA_KERNEL_PROP_MAXLSIZE
¶ Get the maximum block size (also known as local size) for a call of this kernel.
Type:
size_t
-
GA_KERNEL_PROP_PREFLSIZE
¶ Get the preferred multiple of the block size for a call to this kernel.
Type:
size_t
-
GA_KERNEL_PROP_NUMARGS
¶ Get the number of kernel arguments.
Type:
unsigned int
-
GA_KERNEL_PROP_TYPES
¶ Get the list of argument types for a kernel.
This list is the same length as the number of arguments to the kernel. Do not modify the returned list.
Type:
const int *
Typedefs
-
typedef struct _gpucontext
gpucontext
¶ Opaque struct for context data.
-
typedef struct _gpucontext_props
gpucontext_props
¶ Opaque structure that holds properties for the context.
Enums
-
ga_usefl
¶ Flags for gpukernel_init().
It is important to specify these properly as the compilation machinery will ensure that the proper configuration is made to support the requested features or error out if the demands cannot be met.
- Warning
- Failure to properly specify the feature flags will in most cases result in silent data corruption (especially on ATI cards).
Values:
-
0x02
¶ The kernel makes use of small (size is smaller than 4 bytes) types.
-
0x04
¶ The kernel makes use of double or complex doubles.
-
0x08
¶ The kernel makes use of complex or complex doubles.
-
0x10
¶ The kernel makes use of half-floats (also known as float16).
-
0x2000
¶ The kernel is made of CUDA code.
-
0x4000
¶ The kernel is made of OpenCL code.
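Since these values are bit flags, the flags argument of gpukernel_init() is presumably built by OR-ing together every feature the kernel source uses. The sketch below illustrates that under stated assumptions: the numeric values are the ones listed above, while the EX_ names and the helper ex_kernel_flags() are hypothetical and not part of the header.

```c
/* Hypothetical names; the numeric values are the ones listed above. */
enum {
  EX_USE_SMALL   = 0x02,   /* types smaller than 4 bytes */
  EX_USE_DOUBLE  = 0x04,   /* double or complex double */
  EX_USE_COMPLEX = 0x08,   /* complex or complex double */
  EX_USE_HALF    = 0x10,   /* float16 */
  EX_USE_CUDA    = 0x2000, /* kernel source is CUDA */
  EX_USE_OPENCL  = 0x4000  /* kernel source is OpenCL */
};

/* Hypothetical helper: start from the source-language flag and add one
 * bit per feature the kernel actually uses. */
static int ex_kernel_flags(int lang_flag, int uses_small, int uses_double,
                           int uses_complex, int uses_half) {
  int flags = lang_flag;
  if (uses_small)   flags |= EX_USE_SMALL;
  if (uses_double)  flags |= EX_USE_DOUBLE;
  if (uses_complex) flags |= EX_USE_COMPLEX;
  if (uses_half)    flags |= EX_USE_HALF;
  return flags;
}
```

Given the warning above, it seems safer to over-declare a feature than to leave out one the kernel really uses.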
Functions
-
int
gpu_get_platform_count
(const char * name, unsigned int * platcount)¶ Gets information about the number of available platforms for the backend specified in name.
- Return
- GA_NO_ERROR, if success
- Parameters
name
: the backend name
platcount
: will contain the number of compatible platforms on the host
-
int
gpu_get_device_count
(const char * name, unsigned int platform, unsigned int * devcount)¶ Gets information about the number of compatible devices on a specific host platform for the backend specified in name.
- Return
- GA_NO_ERROR, if success
- Parameters
name
: the backend name
platform
: number of a platform on the host
devcount
: will contain the number of compatible devices in platform
-
int
gpucontext_props_new
(gpucontext_props ** res)¶ Allocate and initialize an instance of gpucontext_props.
Initialization is done with default values.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
res
: pointer to storage space for the created object
-
int
gpucontext_props_cuda_dev
(gpucontext_props * p, int devno)¶ Set the device number for a CUDA device.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
devno
: device number
-
int
gpucontext_props_opencl_dev
(gpucontext_props * p, int platno, int devno)¶ Set the platform and device for OpenCL.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
platno
: platform number
devno
: device number
-
int
gpucontext_props_sched
(gpucontext_props * p, int sched)¶ Set the scheduling mode for the device.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
sched
: scheduling mode. One of the GA_CTX_SCHED_* values.
-
int
gpucontext_props_set_single_stream
(gpucontext_props * p)¶ Set single-stream mode.
All operations on the device will be serialized on a single stream. This will also disable most of the interlocking normally done between multiple streams to keep everything in order.
This mode can be faster if you don’t have a lot of device-level parallelism in your workload.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
-
int
gpucontext_props_kernel_cache
(gpucontext_props * p, const char * path)¶ Set the path for the kernel cache.
The cache can be shared with other running instances, even on shared drives.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
path
: desired location of the kernel cache
-
int
gpucontext_props_alloc_cache
(gpucontext_props * p, size_t initial, size_t max)¶ Configure the allocation cache.
The maximum size is also a limit on the total amount of memory allocated on the device.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
p
: properties object
initial
: initial size of the cache
max
: maximum size of the cache
-
void
gpucontext_props_del
(gpucontext_props * p)¶ Free a properties object.
This should not be called on a properties object that has been passed to gpucontext_init().
- Parameters
p
: properties object
-
int
gpucontext_init
(gpucontext ** res, const char * name, gpucontext_props * props)¶ Create a context on the specified device.
The passed-in properties pointer will be managed by this function and need not be freed. This means that you shouldn’t touch the properties object after passing it to this function.
- Warning
- This function is not thread-safe.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
res
: a pointer to a location that will receive the newly allocated context
name
: the backend name.
props
: a properties object for the context. Can be NULL for defaults.
-
void
gpucontext_deref
(gpucontext * ctx)¶ Dereference a context.
This removes a reference to the context and as soon as the reference count drops to zero the context is destroyed. The context can stay alive after you call this function because some objects keep a reference to their context.
- Parameters
ctx
: a valid context pointer.
-
int
gpucontext_property
(gpucontext * ctx, int prop_id, void * res)¶ Fetch a context property.
The property must be a context property. The currently defined properties and their type are defined in Properties.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
ctx
: context
prop_id
: property id (from Properties)
res
: pointer to the return space of the appropriate type
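Because res is an untyped pointer, the caller has to supply storage that matches the "Type:" listed for the requested property. The toy getter below only models that calling convention. The EX_PROP_ identifiers, the function ex_property(), and the returned values are all made up for illustration and are not the library's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical property IDs for this sketch only. */
enum { EX_PROP_DEVNAME = 1, EX_PROP_LMEMSIZE = 2 };

/* Toy getter: writes through res using the type documented for each
 * property. Returns 0 on success, -1 for an unknown property. */
static int ex_property(int prop_id, void *res) {
  switch (prop_id) {
  case EX_PROP_DEVNAME:
    /* Documented type is char [256]: res must point to 256 bytes. */
    strncpy((char *)res, "toy-device", 256);
    return 0;
  case EX_PROP_LMEMSIZE:
    /* Documented type is size_t: res must point to a size_t. */
    *(size_t *)res = 49152; /* made-up value */
    return 0;
  default:
    return -1;
  }
}
```

A caller would declare `char name[256];` or `size_t lmem;` and pass `name` or `&lmem` respectively. Passing storage of the wrong size is undefined behaviour in this model, and presumably with the real call as well.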
-
const char*
gpucontext_error
(gpucontext * ctx, int err)¶ Get a string describing err.
If you need to get a description of an error that occurred during context creation, call this function using NULL as the context. This version of the call is not thread-safe.
- Return
- string description of error
- Parameters
ctx
: the context in which the error occurred
err
: error code
-
gpudata*
gpudata_alloc
(gpucontext * ctx, size_t sz, void * data, int flags, int * ret)¶ Allocates a buffer of size sz in context ctx.
Buffers are reference counted internally and start with a reference count of 1.
- Return
- A non-NULL pointer to a gpudata structure. This structure is intentionally opaque as its content may change according to the backend used.
- Parameters
ctx
: a context pointer
sz
: the requested size
flags
: see Allocation flags
data
: optional pointer to host buffer
ret
: error return pointer
-
void
gpudata_retain
(gpudata * b)¶ Increase the reference count to the passed buffer by 1.
- Parameters
b
: a buffer
-
void
gpudata_release
(gpudata * b)¶ Release a buffer.
This will decrement the reference count of the buffer by 1. If that count reaches 0 all associated resources will be released.
Even if your application does not have any references left to a buffer it may still hang around if it is in use by internal mechanisms (kernel call, …).
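The retain/release rules can be pictured with a small host-only model. This is a sketch of generic reference counting, not the library's implementation. The type ex_buf and the functions ex_alloc(), ex_retain() and ex_release() are hypothetical, and the freed flag exists only so the example can observe when the object is destroyed.

```c
#include <stdlib.h>

/* Toy reference-counted object standing in for a buffer. */
typedef struct {
  unsigned int refcnt;
  int *freed_flag; /* set to 1 when the object is destroyed */
} ex_buf;

/* A new object starts with a reference count of 1. */
static ex_buf *ex_alloc(int *freed_flag) {
  ex_buf *b = malloc(sizeof *b);
  if (b == NULL)
    return NULL;
  b->refcnt = 1;
  b->freed_flag = freed_flag;
  *freed_flag = 0;
  return b;
}

/* Take one more reference. */
static void ex_retain(ex_buf *b) { b->refcnt++; }

/* Drop one reference; destroy the object when none remain. */
static void ex_release(ex_buf *b) {
  if (--b->refcnt == 0) {
    *b->freed_flag = 1;
    free(b);
  }
}
```

Each retain must eventually be paired with one release, plus one release for the reference handed out at allocation.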
Check if two buffers may overlap.
Both buffers must have been created with the same backend.
- Parameters
a
: first buffer
b
: second buffer
ret
: error return pointer
- Return Value
1
: The buffers may overlap.
0
: The buffers do not overlap.
-1
: An error was encountered; ret contains a detailed error code if not NULL.
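To make "may overlap" concrete, here is the usual test for two half-open byte ranges. It is only an illustration of the concept: how each backend actually decides this is not described here, and the function ex_ranges_overlap() is hypothetical. The sketch assumes that off + len does not overflow.

```c
#include <stddef.h>

/* Returns 1 if [a_off, a_off + a_len) and [b_off, b_off + b_len)
 * share at least one byte, 0 otherwise. Empty ranges never overlap. */
static int ex_ranges_overlap(size_t a_off, size_t a_len,
                             size_t b_off, size_t b_len) {
  if (a_len == 0 || b_len == 0)
    return 0;
  return a_off < b_off + b_len && b_off < a_off + a_len;
}
```

Ranges that merely touch, such as bytes 0..7 and 8..15, do not overlap under this definition.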
-
int
gpudata_move
(gpudata * dst, size_t dstoff, gpudata * src, size_t srcoff, size_t sz)¶ Copy the content of a buffer to another.
Both buffers must be in the same context and contiguous. Additionally, the buffers must not overlap, otherwise the content of the destination buffer is not defined.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
dst
: destination buffer
dstoff
: offset inside the destination buffer
src
: source buffer
srcoff
: offset inside the source buffer
sz
: size of data to copy (in bytes)
-
int
gpudata_transfer
(gpudata * dst, size_t dstoff, gpudata * src, size_t srcoff, size_t sz)¶ Transfer the content of buffer across contexts.
If possible, it will try to do the transfer in an efficient way using backend-specific tricks. If those fail or can’t be used, it will fall back to a copy through the host.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
dst
: buffer to transfer to
dstoff
: offset in the destination buffer
src
: buffer to transfer from
srcoff
: offset in the source buffer
sz
: size of the region to transfer
-
int
gpudata_read
(void * dst, gpudata * src, size_t srcoff, size_t sz)¶ Transfer data from a buffer to memory.
The buffer and the memory region must be contiguous.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
dst
: destination in memory
src
: source buffer
srcoff
: offset inside the source buffer
sz
: size of data to copy (in bytes)
-
int
gpudata_write
(gpudata * dst, size_t dstoff, const void * src, size_t sz)¶ Transfer data from memory to a buffer.
The buffer and the memory region must be contiguous.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
dst
: destination buffer
dstoff
: offset inside the destination buffer
src
: source in memory
sz
: size of data to copy (in bytes)
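Both gpudata_read() and gpudata_write() take an offset and a size, so the addressed region presumably has to lie inside the buffer. The text does not spell this out, so treat it as an assumption. The hypothetical helper below shows one overflow-safe way a caller might validate such a pair before issuing the call.

```c
#include <stddef.h>

/* Returns 1 if the byte range [off, off + sz) fits inside a buffer of
 * bufsz bytes. Written so that off + sz is never computed, which keeps
 * the check safe from size_t wrap-around. */
static int ex_range_fits(size_t bufsz, size_t off, size_t sz) {
  return off <= bufsz && sz <= bufsz - off;
}
```

The buffer size could come from the GA_BUFFER_PROP_SIZE property, keeping in mind that it may be larger than the size originally requested.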
-
int
gpudata_memset
(gpudata * dst, size_t dstoff, int data)¶ Set a buffer to a byte pattern.
This function acts like the C function memset() for device buffers.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
dst
: destination buffer
dstoff
: offset into the destination buffer
data
: byte value to write into the destination.
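Since the call is described as acting like C memset(), it may help to recall how memset() treats its int argument: the value is converted to unsigned char, so only the low byte is used. The host-only sketch below demonstrates that with the standard library. The wrapper ex_fill() is hypothetical, and the example says nothing about how many bytes the device call fills, which this page does not state.

```c
#include <string.h>

/* Fill n bytes at dst with the low byte of data, exactly as the
 * standard memset() does. */
static void ex_fill(unsigned char *dst, size_t n, int data) {
  memset(dst, data, n);
}
```

Passing 0x1FF therefore writes 0xFF into every byte, so multi-byte patterns cannot be expressed this way.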
-
int
gpudata_sync
(gpudata * b)¶ Synchronize a buffer.
Waits for all previous reads, writes, copies and kernel calls involving this buffer to be finished.
This call is not required for normal use of the library as all exposed operations will properly synchronize amongst themselves. This call may be useful in a performance timing context to ensure that the work is really done, or before interaction with another library to wait for pending operations.
-
int
gpudata_property
(gpudata * buf, int prop_id, void * res)¶ Fetch a buffer property.
Can be used for buffer properties and context properties. Context properties will fetch the value for the context associated with the buffer. The currently defined properties and their type are defined in Properties.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
buf
: buffer
prop_id
: property id (from Properties)
res
: pointer to the return space of the appropriate type
-
gpucontext*
gpudata_context
(gpudata * b)¶
-
gpukernel*
gpukernel_init
(gpucontext * ctx, unsigned int count, const char ** strings, const size_t * lengths, const char * fname, unsigned int numargs, const int * typecodes, int flags, int * ret, char ** err_str)¶ Compile a kernel.
Compile the kernel composed of the concatenated strings in strings and return a callable kernel. If lengths is NULL then all the strings must be NUL-terminated. Otherwise, NUL termination is not required (but the lengths, if provided, must not include the final NUL byte).
If *err_str is not NULL on return, the caller must call free(*err_str) after use.
- Parameters
ctx
: context to work in
count
: number of input strings
strings
: table of string pointers
lengths
: (optional) length for each string in the table
fname
: name of the kernel function (as defined in the code)
numargs
: number of kernel arguments
typecodes
: the type of each argument
flags
: flags for compilation (see ga_usefl)
ret
: error return pointer
err_str
: returns a pointer to a debug message from the GPU backend (if a non-NULL err_str was provided)
- Return
- Allocated kernel structure or NULL if an error occurred. ret will be updated with the error code if not NULL.
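The count/strings/lengths convention can be illustrated with a small host-side helper that joins source fragments the way the description suggests. This is a sketch under the stated rules only: ex_join_sources() is hypothetical, and whether the backends physically concatenate the strings or hand them to the compiler piecewise is not stated here.

```c
#include <stdlib.h>
#include <string.h>

/* Join count fragments into one newly allocated NUL-terminated string.
 * If lengths is NULL every fragment must be NUL-terminated; otherwise
 * lengths[i] bytes are taken from strings[i] (final NUL not counted).
 * Returns NULL on allocation failure; the caller frees the result. */
static char *ex_join_sources(unsigned int count, const char **strings,
                             const size_t *lengths) {
  size_t total = 0;
  unsigned int i;
  char *out, *p;

  for (i = 0; i < count; i++)
    total += lengths ? lengths[i] : strlen(strings[i]);

  out = malloc(total + 1);
  if (out == NULL)
    return NULL;

  p = out;
  for (i = 0; i < count; i++) {
    size_t n = lengths ? lengths[i] : strlen(strings[i]);
    memcpy(p, strings[i], n);
    p += n;
  }
  *p = '\0';
  return out;
}
```

Supplying explicit lengths makes it possible to pass slices of a larger text without copying them into NUL-terminated strings first.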
-
void
gpukernel_retain
(gpukernel * k)¶ Retain a kernel.
Increase the reference count of the passed kernel by 1.
- Parameters
k
: a kernel
-
void
gpukernel_release
(gpukernel * k)¶ Release a kernel.
Decrease the reference count of a kernel. If it reaches 0, all resources associated with k will be released.
If the reference count of a kernel reaches 0 while it is running, this call will block until completion.
-
int
gpukernel_setarg
(gpukernel * k, unsigned int i, void * a)¶ Set kernel argument.
Buffer arguments will not be retained and it is the responsibility of the caller to ensure that the value is still valid whenever a call is made.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
k
: kernel
i
: argument index (starting at 0)
a
: pointer to argument
-
int
gpukernel_call
(gpukernel * k, unsigned int n, const size_t * gs, const size_t * ls, size_t shared, void ** args)¶ Call a kernel.
If args is NULL, it will be assumed that the arguments have previously been set with gpukernel_setarg().
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
k
: kernel
n
: number of dimensions of grid/block
gs
: grid sizes for this call (also known as global size)
ls
: block sizes for this call (also known as local size)
shared
: amount of dynamic shared memory to reserve
args
: table of pointers to each argument (optional).
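A common step before such a call is choosing a grid size that covers all work items for a given block size. The sketch below assumes that each gs entry counts blocks, CUDA-style, which is an assumption on my part and not something this page confirms. The helper ex_num_blocks() is hypothetical, and ls must be nonzero.

```c
#include <stddef.h>

/* Number of blocks of ls items needed to cover n_items, rounding up.
 * Avoids the overflow that (n_items + ls - 1) / ls could hit. */
static size_t ex_num_blocks(size_t n_items, size_t ls) {
  return n_items / ls + (n_items % ls != 0);
}
```

When the count is rounded up, the last block has spare threads, so the kernel itself would need a bounds check on its index.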
-
int
gpukernel_property
(gpukernel * k, int prop_id, void * res)¶ Fetch a property.
Can be used for kernel and context properties. The context properties will fetch the value for the context associated with the kernel. The currently defined properties and their type are defined in Properties.
- Return
- GA_NO_ERROR or an error code if an error occurred.
- Parameters
k
: kernel
prop_id
: property id (from Properties)
res
: pointer to the return space of the appropriate type
-
gpucontext*
gpukernel_context
(gpukernel * k)¶