TI OpenCL v01.02.xx User’s Guide
Note
If you reached this page through https://software-dl.ti.com/mctools/esd/docs/opencl (without the trailing ‘/’), open https://software-dl.ti.com/mctools/esd/docs/opencl/index.html instead so that the relative links in the OpenCL HTML documentation resolve correctly.
TI OpenCL™ Runtime Documentation Contents:
- Introduction
- Reference Material
- Offloading using OpenCL
- Compilation
- Memory Usage
  - Device Memory
  - How DDR3 is Partitioned for Linux System and OpenCL
  - The OpenCL Memory Model
  - OpenCL Buffers
  - Alternate Host malloc/free Extension for Zero Copy OpenCL Kernels
  - Buffer Read/Write vs. Map/Unmap
  - Discovering OpenCL Memory Sizes and Limits
  - Cache Operations
  - Large OpenCL buffers and Memory Beyond the 32-bit DSP Address Space
  - User Defined DSP Heap Extension
- Execution Model
- Extensions
  - Calling Standard C Code From OpenCL C Code
  - Calling Standard C code with OpenMP from OpenCL C code
  - C66x standard C compiler intrinsic functions
  - OpenCL C code using printf
  - DMA Control Using EdmaMgr Functions
  - Using Extended Memory on the 66AK2x device
  - Fast Global buffers in on-chip MSMC memory
  - OpenCL C Builtin Function Extensions
  - Cache Operations
  - Calling TI BIOS APIs from OpenCL C kernels
  - Setting Timeout Limit on OpenCL Kernels
- Environment Variables
- Dispatch from multiple Linux processes
- Optimization Tips
  - Optimization Techniques for Host Code
  - Optimization Techniques for Device (DSP) Code
    - Prefer Kernels with 1 work-item per work-group
    - Use Local Buffers
    - Use async_work_group_copy and async_work_group_strided_copy
    - Avoid DSP writes directly to DDR
    - Use the reqd_work_group_size attribute on kernels
    - Use the TI OpenCL extension that allows Standard C code to be called from OpenCL C code
    - Avoid OpenCL C Barriers
    - Use the most efficient data type on the DSP
    - Do Not Use Large Vector Types
    - Consecutive memory accesses
    - Prefer the CPU style of writing OpenCL code over the GPU style
  - Typical Steps to Optimize Device Code
  - Example: Optimizing 1D convolution kernel
    - Overview
    - Summary of results
    - Driver code setup
    - k_baseline: Ensure correct measurements
    - k_baseline: Check software pipelining
    - k_loop: Improve software pipelining
    - k_loop_simd: Improve software pipelining with SIMDization
    - k_loop_db: EDMA and double buffer k_loop
    - k_loop_simd_db: EDMA and double buffer k_loop_simd
    - k_loop_simd_db_extc: Use external C function for k_loop_simd_db
  - Example: Optimizing 3x3 Gaussian smoothing filter
- Performance Data
- Debug
- Profiling
- OpenCL on TI-RTOS
- Examples
  - Building and Running
  - Example Descriptions
    - platforms example
    - simple example
    - mandelbrot, mandelbrot_native examples
    - ccode example
    - matmpy example
    - offline example
    - vecadd_openmp example
    - vecadd_openmp_t example
    - vecadd example
    - vecadd_mpax example
    - vecadd_mpax_openmp example
    - vecadd_subdevice example
    - vecadd_compile_link example
    - vecadd_compile_link_loadbinary example
    - dsplib_fft example
    - ooo, ooo_map examples
    - null example
    - sgemm example
    - dgemm example
    - conv1d example
    - edmamgr example
    - edmabw example
    - dspheap example
    - abort_exit example
    - timeout example
    - Float compute example
    - Monte Carlo example
- Frequently Asked Questions
  - How do I get support for TI OpenCL products?
  - Which TI OpenCL Version is Installed?
  - Using Python OpenCL with the TI OpenCL implementation
  - Guidelines for porting Stand-alone DSP applications to OpenCL
  - OpenCL Interoperability with Host OpenMP
  - Does TI’s OpenCL support images and samplers?
  - Why does the OpenCL ICD installed on my platform not find the TI OpenCL implementation?
  - Why do I get messages about /var/lock/opencl when running OpenCL applications?
  - Why do I get DLOAD error messages when running OpenCL applications?
  - How do I limit log file sizes on EVM’s temporary file storage (tmpfs)?
  - How do I allocate more DDR memory for OpenCL use?
- Release Notes
- Disclaimer
- Important Notice