Setting Timeout Limit on OpenCL Kernels

This TI-extended OpenCL implementation supports setting a timeout limit on an OpenCL kernel. If the timeout limit is reached before the kernel finishes executing on the device, the kernel is terminated on the device and a negative status is returned to the kernel event. The device is then ready to run the next kernel.

Semantics of supported timeouts

Timeouts are currently supported only locally on an OpenCL compute unit. That is, each compute unit has its own timeout clock and compares it against the specified timeout limit independently. With regard to multiple workgroups and multiple compute units, the following rules apply:

  1. If a kernel has multiple workgroups dispatched to the same compute unit, the clock starts when the first such workgroup starts execution and keeps running across all of them. If the timeout limit is reached during the execution of any of these workgroups, the kernel is terminated.
  2. If a timeout occurs on any compute unit, the kernel event status is set to the negative value CL_ERROR_KERNEL_TIMEOUT_TI.
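The two rules above can be illustrated with a small host-side model. This is only a sketch of the described semantics, not the TI runtime: the names `WorkgroupTimes`, `compute_unit_times_out`, and `kernel_event_status` are hypothetical, the value of `kTimeoutStatus` is a placeholder (the real CL_ERROR_KERNEL_TIMEOUT_TI is defined by the TI headers), and the model assumes that workgroups on a compute unit run back to back and that a timeout fires once the accumulated time exceeds the limit.

```cpp
#include <cstdint>
#include <vector>

// Placeholder only: the real CL_ERROR_KERNEL_TIMEOUT_TI value comes from the
// TI OpenCL headers. All this model relies on is that it is negative.
constexpr int kTimeoutStatus = -9999;
constexpr int kCompleteStatus = 0;  // CL_COMPLETE in the OpenCL specification

// Durations in milliseconds of the workgroups dispatched to ONE compute unit,
// in execution order (assumed to run back to back).
using WorkgroupTimes = std::vector<std::uint32_t>;

// Rule 1: the compute unit's clock starts with its first workgroup and keeps
// running across all of its workgroups for this kernel.
bool compute_unit_times_out(const WorkgroupTimes& workgroups,
                            std::uint32_t limit_ms) {
    std::uint64_t elapsed_ms = 0;
    for (std::uint32_t duration_ms : workgroups) {
        elapsed_ms += duration_ms;
        if (elapsed_ms > limit_ms) {
            return true;  // limit passed while this workgroup was running
        }
    }
    return false;
}

// Rule 2: a timeout on ANY compute unit makes the kernel event status negative.
int kernel_event_status(const std::vector<WorkgroupTimes>& compute_units,
                        std::uint32_t limit_ms) {
    for (const WorkgroupTimes& cu : compute_units) {
        if (compute_unit_times_out(cu, limit_ms)) {
            return kTimeoutStatus;
        }
    }
    return kCompleteStatus;
}
```

In this model, with a 100 ms limit, a compute unit that runs three 40 ms workgroups times out even though no single workgroup exceeds the limit, because the clock is per compute unit rather than per workgroup.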

If a timeout limit is set on a kernel and the timeout occurs during execution, the user can query the corresponding kernel event to obtain the timeout error status. All subsequent kernel events that have the timed-out kernel event in their wait lists will have their execution status set to CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, per the OpenCL specification's rules for handling kernel errors.
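The effect on dependent events can be sketched as a simple propagation over wait lists. The `Event` struct and `resolve_statuses` function below are hypothetical and model only the behaviour described in this section. They assume that events are listed in enqueue order, so each wait list refers only to earlier events. `kTimeoutStatus` is again a placeholder value, while -14 is the value the OpenCL headers assign to CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST.

```cpp
#include <cstddef>
#include <vector>

constexpr int kTimeoutStatus = -9999;  // placeholder for CL_ERROR_KERNEL_TIMEOUT_TI
constexpr int kWaitListError = -14;    // CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST
constexpr int kComplete = 0;           // CL_COMPLETE

// Hypothetical model of a kernel event.
struct Event {
    int own_status;                      // status if all dependencies succeed
    std::vector<std::size_t> wait_list;  // indices of EARLIER events only
};

// Computes the final status of each event: an event whose wait list contains
// any event with a negative final status gets kWaitListError instead of its
// own status. Errors therefore also propagate through chains of dependencies.
std::vector<int> resolve_statuses(const std::vector<Event>& events) {
    std::vector<int> final_status(events.size(), kComplete);
    for (std::size_t i = 0; i < events.size(); ++i) {
        int status = events[i].own_status;
        for (std::size_t dep : events[i].wait_list) {
            if (final_status[dep] < 0) {
                status = kWaitListError;
                break;
            }
        }
        final_status[i] = status;
    }
    return final_status;
}
```

An event that does not list the timed-out event in its wait list, directly or through a chain of dependencies, keeps its own status in this model.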

Warning

If the user sets a timeout on a kernel but chooses not to create a kernel event when enqueuing the kernel, a timeout, if it occurs, will be silent to the user application.

OpenCL extensions and APIs

OpenCL platform extension: cl_ti_kernel_timeout

OpenCL DSP device extension: cl_ti_kernel_timeout_compute_unit

OpenCL DSP device queue property: CL_QUEUE_KERNEL_TIMEOUT_COMPUTE_UNIT_TI

OpenCL command queue property: CL_QUEUE_KERNEL_TIMEOUT_COMPUTE_UNIT_TI

OpenCL kernel event status: CL_ERROR_KERNEL_TIMEOUT_TI

OpenCL host API: cl_int __ti_set_kernel_timeout_ms(cl_kernel d_kernel, cl_uint timeout_in_ms)

Example of querying, setting and checking timeout

The timeout example illustrates how the timeout extension works. The steps involved are:

  1. Check device queue property to see if timeout extension is supported

    cl_command_queue_properties devq_prop;
    devices[0].getInfo(CL_DEVICE_QUEUE_PROPERTIES, &devq_prop);
    bool timeout_supported =
        (devq_prop & CL_QUEUE_KERNEL_TIMEOUT_COMPUTE_UNIT_TI) != 0;
    
  2. Create a CommandQueue with timeout property

    new CommandQueue(context, devices[0],
                     CL_QUEUE_KERNEL_TIMEOUT_COMPUTE_UNIT_TI);
    
  3. Set a timeout limit in milliseconds on the kernel

    __ti_set_kernel_timeout_ms(K(), 100);
    
  4. Create a kernel event when enqueuing the kernel

    ev = kernel_functor(...);
    //or: enqueueNDRangeKernel(kernel, Range, Range, Range, wait_evs, &ev);
    //or: enqueueTask(kernel, wait_evs, &ev);
    
  5. Check the kernel event status to see if timeout happened

    cl_int status;
    ev.getInfo(CL_EVENT_COMMAND_EXECUTION_STATUS, &status);
    bool timed_out = (status == CL_ERROR_KERNEL_TIMEOUT_TI);
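
When checking the status in step 5, it can help to distinguish the timeout from other outcomes. The helper below is a minimal sketch, not part of the TI API: `describe_event_status` is a hypothetical name and `kTimeoutStatus` is a placeholder that real code would replace with CL_ERROR_KERNEL_TIMEOUT_TI from the TI headers. The other values follow the OpenCL specification, in which negative statuses are errors, 0 is CL_COMPLETE, and positive values (CL_RUNNING, CL_SUBMITTED, CL_QUEUED) mean the command has not finished.

```cpp
#include <string>

constexpr int kTimeoutStatus = -9999;  // placeholder for CL_ERROR_KERNEL_TIMEOUT_TI
constexpr int kWaitListError = -14;    // CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST

// Maps a kernel event execution status to a short description.
std::string describe_event_status(int status) {
    if (status == kTimeoutStatus) {
        return "kernel timed out";
    }
    if (status == kWaitListError) {
        return "not run: an event in the wait list failed";
    }
    if (status < 0) {
        return "failed with another error";
    }
    if (status == 0) {
        return "complete";
    }
    return "still in flight";  // CL_RUNNING, CL_SUBMITTED, or CL_QUEUED
}
```

A positive status means the query came too early, so the timeout check is only meaningful once the event has reached a final (zero or negative) status.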