5. Dynamic memory management in target regions

Memory requests (malloc, calloc, etc.) within OpenMP target regions are satisfied by allocating portions from DSP heaps. The OpenMP implementation specifies fixed sizes and placements for private DSP heaps as well as the heap that is shared by all the DSPs. The DSP runtime provides additional APIs to initialize and manage heaps in order to afford the user more flexibility to control the size and location of heaps.

5.1. DSP Heaps in Shared Memory (DDR or MSMC)

The DSP runtime provides the following APIs to initialize and manage heaps in shared memory.

5.1.1. Heap Initialization API

The heap initialization functions __heap_init_[ddr|msmc] must be called by one of the DSP cores to initialize internal heap data structures before making any memory management calls such as __malloc_[ddr|msmc]. Once initialized, the heaps are accessible by all the DSP cores. These APIs are thread safe under the OpenMP and OpenCL programming models on the DSP (Each DSP is running a single thread of execution).

Note: If data allocated on the heap is shared across DSP cores, the programmer is responsible for cache consistency of allocated memory across cores . If OpenMP is used to parallelize the program, cache consistency is managed by the OpenMP runtime.

void  __heap_init_ddr(void *ptr, int size);
void  __heap_init_msmc(void *ptr, int size);

Note that ptr is a pointer to underlying memory to be configured as a user-controlled heap. Therefore, the underlying memory must be allocated before calling the heap initialization function. Initialized heaps are persistent across target regions and kernels until the underlying memory regions for them are deallocated.

5.1.1.1. Via OpenMP

The following code snippets illustrate how to allocate memory for heaps and call the initialization functions from OpenMP.

/*-----------------------------------------------------------------------------
* User-controlled DSP heaps are initialized within a target region. The call
* to __heap_init_xxx can be included within any target region. However the
* initialization function must be called before any __malloc_xxx calls are
* made.
*
* User-controlled DSP heaps can be persistent across target regions as long as
* the underlying memory (aka buffers pointed to by p are not deallocated.
*----------------------------------------------------------------------------*/
void heap_init_ddr(char* p, size_t bytes)
{
#pragma omp target map(to:bytes,p[0:bytes])
   {
      __heap_init_ddr(p,bytes);
   }
}

void heap_init_msmc(char *p, size_t bytes)
{
#pragma omp target map(to: bytes, p[0:bytes])
   {
      __heap_init_msmc(p,bytes);
   }
}
/*-----------------------------------------------------------------------------
* The DSP core executing the enclosed target region will allocate from the
* heaps and then free the memory.
*----------------------------------------------------------------------------*/
void alloc_and_free(size_t bytes)
{
#pragma omp target map(to: bytes)
   {
      char *p1 = (char *) __malloc_ddr(bytes);
      char *p2 = (char *) __malloc_msmc(bytes);

      if (!p1 || !p2)
         printf("Error\n");
      else
      {
         printf("DDR  heap pointer is 0x%08x\n", p1);
         printf("MSMC heap pointer is 0x%08x\n", p2);
      }

      __free_ddr(p1);
      __free_msmc(p2);
   }
}
/*------------------------------------------------------------------------
* From the host, create the underlying memory store for the heaps
*-----------------------------------------------------------------------*/
int ddr_heap_size  = 16 << 20;
int msmc_heap_size = 1 << 20;
char* HeapDDR = (char*) __malloc_ddr(ddr_heap_size);
char* HeapMSMC = (char*) __malloc_msmc(msmc_heap_size);

/*------------------------------------------------------------------------
* Initialize the pre-allocated buffers as new DDR and MSMC heaps
* accessible to DSP cores.
*-----------------------------------------------------------------------*/
heap_init_ddr (HeapDDR,  ddr_heap_size);
heap_init_msmc(HeapMSMC, msmc_heap_size);
/*------------------------------------------------------------------------
* On each DSP core, alloc memory from both ddr and msmc and then free it.
*-----------------------------------------------------------------------*/
alloc_and_free(1024);

5.1.1.2. Via a DATA_SECTION pragma in DSP C code

/* Array is already aligned on a 64b boundary. No need for DATA_ALIGN */
#define MSMC_HEAP_SIZE (16*1024)
#pragma DATA_SECTION(msmc_heap, ".mem_msm")
char msmc_heap[MSMC_HEAP_SIZE];

...
void foo()
{
    __heap_init_msmc ((void *)msmc_heap, MSMC_HEAP_SIZE);
    ...

    double *p = (double *)__malloc_msmc(sizeof(double)*256);
    ...
    __free_msmc(p);
}

5.1.2. Dynamic Memory Management APIs

After the DDR and/or MSMC heap is initialized by one of the DSP cores using the API specified in Section Heap Initialization API, the following APIs are available from all DSP cores for dynamic memory management:

5.1.2.1. Heap in DDR

void *__malloc_ddr   (size_t size);
void *__calloc_ddr   (size_t num, size_t size);
void *__realloc_ddr  (void *ptr,  size_t size);
void  __free_ddr     (void *ptr);
void *__memalign_ddr (size_t alignment, size_t size);

5.1.2.2. Heap in MSMC

void *__malloc_msmc   (size_t size);
void *__calloc_msmc   (size_t num, size_t size);
void *__realloc_msmc  (void *ptr, size_t size);
void  __free_msmc     (void *ptr);
void *__memalign_msmc (size_t alignment, size_t size);

5.2. DSP Heap in Local Memory (L2SRAM)

The DSP runtime provides a simplistic API to initialize a heap in L2 SRAM and allocate from it. This heap is local to the core which initialized it.

5.2.1. Heap Initialization API

A heap can be initialized in L2 SRAM via the following API:

void  __heap_init_l2(void *ptr, int size);

The storage associated with the heap must be start on a 64bit boundary. Unlike DDR and MSMC heaps, heaps initialized in L2 SRAM do not persist across target regions or kernels. Underlying storage for dsp heaps in local memory can be set up in one of the following ways:

5.2.1.1. Via the *local* map type in OpenMP

TI’s OpenMP implementation includes a TI-specific *local* map type that allows data to be allocated on a DSP’s L2 SRAM. This allocated buffer can be used to initialize the heap.

void l2_alloc_and_free(char *p, size_t bytes)
{
   //p is actually just a dummy buffer. It will not be copied to the DSPs.
#pragma omp target map(to:bytes) map(local:p[0:bytes])
   {
      //p gets allocated in DSP L2 SRAMS at the start of the target region
      char *p1;
      __heap_init_l2(p, bytes);
      p1 = (char *) __malloc_l2(bytes);
      if (!p1)
         printf("Error\n");
      else
         printf("L2SRAM  heap pointer is 0x%08x\n", p1);
   }
}

5.2.1.2. Via a DATA_SECTION pragma in DSP C code

/* Array is already aligned on a 64b boundary. No need for DATA_ALIGN */
#define L2_HEAP_SIZE (256)
#pragma DATA_SECTION(l2_heap, ".mem_l2")
char l2_heap[L2_HEAP_SIZE];

...
void foo()
{
    __heap_init_l2 ((void *)l2_heap, L2_HEAP_SIZE);
    ...

    ... __malloc_l2(sizeof(double));
}

5.2.2. Dynamic Memory Management APIs

After the L2 heap is initialized by the DSP cores using the __heap_init_l2 call, the only API available is a malloc:

void *__malloc_l2 (size_t size); /* Pointer returned is aligned to 64 bit boundary */