3. Device Constructs supported¶
This product supports the the following OpenMP 4.0 device constructs and runtime library routines:
- #pragma omp target
- #pragma omp declare target
- #pragma omp target data
- #pragma omp target update
- omp_set_default_device()
- omp_get_default_device()
- omp_get_num_devices()
- omp_is_initial_device()
The following sections briefly introduce the supported constructs. Refer the OpenMP 4.0 specification for details.
3.1. #pragma omp target¶
The target construct is used to specify the region of code that should be offloaded for execution onto the target device. The construct also creates a device data environment by mapping host buffers to the target for the extent of the associated region. Map clauses on the target construct specify data movement from host to device before execution of the offloaded region, and device to host after execution of the offloaded region. Map types alloc, to, from and tofrom indicate data allocation and transfer direction.
float a[1024];
float b[1024];
float c[1024];
int size;
void vadd_openmp(float *a, float *b, float *c, int size)
{
#pragma omp target map(to:a[0:size],b[0:size],size) map(from: c[0:size])
{
int i;
#pragma omp parallel for
for (i = 0; i < size; i++)
c[i] = a[i] + b[i];
}
}
In the example above, variables a, b, c and size initially reside in host (ARM Linux) memory. Upon encountering a target construct:
- Space is allocated in device memory for variables a[0:size], b[0:size], c[0:size] and size.
- Any variables annotated ‘to’ are copied from host memory → device memory.
- The target region is executed on the device. Note that #pragma omp parallel for is used to distribute iterations of the for loop across the 8 DSP cores.
- Any variables annotated ‘from’ are copied from device memory → host memory.
The figure below illustrates mapping of variables from host to device memory.
To reduce the overhead of copying variables between host and target devices, refer section Reducing offload overhead.
Please note that variadic function declaration and definition is not supported within the target region. Support for variadic functions is not planned for future releases at this time.
3.2. #pragma omp declare target¶
The declare target construct specifies that variables and functions
are mapped to a device. Each function specified in a declare target
region must have definitions for both the host and target device. In the
code snippet below, function DSPF_sp_fftSPxSP
is called from within
a target region. It’s prototype must be placed in a declare target
region.
#pragma omp declare target
/* There must be a host and accelerator target definition for this function */
void DSPF_sp_fftSPxSP(int N,
float *x, float *w, float *y,
unsigned char *brev,
int n_min, int offset, int n_max);
#pragma omp end declare target
void dsplib_fft(int N, int bufsize,
float* x, float* w, float *y,
int n_min, int offset, int n_max)
{
#pragma omp target map(to: N, x[0:bufsize], w[0:bufsize], \
n_min, offset, n_max) \
map(from: y[0:bufsize])
{
DSPF_sp_fftSPxSP (N, x, w, y, 0, n_min, offset, n_max);
}
}
3.3. #pragma omp target data¶
The target data construct creates a device data environment by mapping host buffers to the target for the extent of the associated region. Target regions may be executed within a device data environment. Furthermore, mapped buffers may be re-used by multiple enclosed target regions and are valid for the entire scope of the ‘target data’ region. Target data regions may also be nested.
/* Create device buffers for a, b, c and transfer data from Host -> Device for a,b */
#pragma omp target data map(to:a[0:size], b[0:size]) map(from:c[0:size])
{
/* Existing device buffers are used and no data is transferred here */
#pragma omp target
{
int i;
#pragma omp parallel for
for (i = 0; i < size; i++)
c[i] += a[i] + b[i];
}
} /* Device -> Host data transfer of buffer c is done here*/
3.4. #pragma omp target update¶
The target update construct is used to synchronize host or device buffers within a target data region as required. In the following example, buffer c is synchronized to the host in the first target update call and to the device in the second target update call. The update direction is specified using to and from clauses.
void operate_on_host(int* buffer);
#pragma omp declare target
void operate_on_device_initial(int* srcA, int* srcB, int* result);
void operate_on_device_final(int* srcA, int* srcB, int* result);
#pragma omp end declare target
/* Create device buffers for a, b, c and transfer data from Host -> Device for a,b */
#pragma omp target data map(to:a[0:size], b[0:size]) map(from:c[0:size])
{
/* Existing device buffers are used and no data is transferred here */
#pragma omp target
{
/* Here buffer c is modified with results while operating on input buffers a,b */
operate_on_device_initial(a,b,c);
}
/* Copy modified target data to the host */
#pragma omp target update from(c[0:size])
/* Do some host side processing */
operate_on_host(c);
/* Synchronize c with target device data buffer */
#pragma omp target update to(c[0:size])
/* Offload more computation to target device */
#pragma omp target
{
operator_on_device_final(a,b,c);
}
} /* Device -> Host data transfer of buffer c is done here again*/