The amount of Local, MSMC and DDR that is available for OpenCL use
may change from release to release. The amount available for OpenCL use
can be queried from the OpenCL runtime. This is illustrated in the
platform example shipped with the product:
The following device components are relevant to the memory discussion for OpenCL.
|ARM A15 CPU cores||4||2||4||1||2|
|C66 DSP cores||8||4||1||1||2|
|L1P per C66 core||32KB||32KB||32KB||32KB||32KB|
|L1D per C66 core||32KB||32KB||32KB||32KB||32KB|
|L2 cache shared across ARM cores||4MB||1MB||4MB||512KB||2MB|
|L2 memory per C66 core||1MB||1MB||512KB||1MB||288KB|
|DDR3 available||up to 8GB||up to 8GB||up to 8GB||2GB||2GB|
|On-chip shared memory||6MB||2MB||2MB||1MB||1MB|
The L1 and L2 memory areas in the C66x cores can be configured as all cache, all scratchpad or partitioned with both. For OpenCL applications, this partition defaults to the following values for each C66x core:
|L2 available for OpenCL local buffers||768KB||768KB||256KB||832KB||128KB|
The below image illustrates the memory regions in which OpenCL buffers may reside. Those regions are highlighted in blue. It also shows the paths from those regions through the caches to/from the C66x DSP cores. There are other busses for data movement on the device, for example to move data from ddr to L2 SRAM. This image, however, is focused on automatic data movement through cache memory.
As the image illustrates, OpenCL buffers residing in L2 SRAM or MSMC will bypass the L2 cache, but are still cached in the L1D cache. OpenCL buffers residing in off-chip DDR3 will be cached in both L2 and L1D.
OpenCL global buffers may reside in either DDR or MSMC. OpenCL local buffers reside in L2 SRAM