Using Intrinsics to Access Assembly Language Statements

The C28x compiler recognizes a number of intrinsic operators. Intrinsics allow you to express the meaning of certain assembly statements that would otherwise be cumbersome or inexpressible in C/C++. Intrinsics are used like functions; you can use C/C++ variables with these intrinsics, just as you would with any normal function.

The intrinsics are specified with a leading double underscore, and are accessed by calling them as you do a function. For example:

long lvar;
int ivar;
unsigned int uivar;
lvar = __mpyxu(ivar, uivar);
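For readers without the C28x toolchain at hand, the arithmetic behind this call can be approximated on a host machine. The sketch below is only a model: model_mpyxu and mpyxu_example are hypothetical names, not part of the TI compiler, and fixed-width types stand in for the 16-bit int and 32-bit long of the C28x.

```c
#include <stdint.h>

/* Hypothetical host-side model of __mpyxu (not the TI intrinsic):
   a signed 16-bit value times an unsigned 16-bit value, producing a
   signed 32-bit product. The product always fits in 32 bits. */
static int32_t model_mpyxu(int16_t src1, uint16_t src2)
{
    return (int32_t)src1 * (int32_t)src2;
}

/* Mirrors the declarations in the example above, using fixed-width
   types in place of the C28x long, int, and unsigned int. */
static int32_t mpyxu_example(void)
{
    int32_t  lvar;
    int16_t  ivar  = -3;
    uint16_t uivar = 40000u;

    lvar = model_mpyxu(ivar, uivar);
    return lvar;   /* -120000 */
}
```

Note that the second operand is treated as unsigned: 40000 would not fit in a signed 16-bit int, yet the model (like the MPYXU instruction in the table) multiplies it at its full unsigned value.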

The intrinsics listed in Table 7-6 are available. They correspond to the indicated TMS320C28x assembly language instruction(s). See the TMS320C28x CPU and Instruction Set Reference Guide for more information.

Table 7-6 TMS320C28x C/C++ Compiler Intrinsics

Intrinsic Assembly Instruction(s) Description
int __abs16_sat( int src); SETC OVM
MOV AH, src
ABS ACC
MOV dst, AH
CLRC OVM
Set the OVM status bit. Load src into AH. Take absolute value of ACC. Store AH into dst. Clear the OVM status bit.
void __add( int *m, int b); ADD *m, b Add the contents of memory location m to b and store the result in m, in an atomic way.
long __addcu( long src1, unsigned int src2); ADDCU ACC, {mem | reg} The contents of src2 and the value of the carry bit are added to ACC. The result is in ACC.
void __addl( long *m, long b); ADDL *m, b Add the contents of memory location m to b and store the result in m, in an atomic way.
void __and( int *m, int b); AND *m, b AND the contents of memory location m to b and store the result in m, in an atomic way.
int &__byte( int *array, unsigned int byte_index); MOVB array[byte_index].LSB, src
or
MOVB dst, array[byte_index].LSB
The lowest addressable unit on the C28x is 16 bits, so 8-bit quantities normally cannot be accessed directly in memory. This intrinsic reads or writes an 8-bit quantity at a memory location, and can be invoked as follows:
__byte(array,5) = 10;
b = __byte(array,20);
unsigned long &__byte_peripheral_32(unsigned long *x); Used to access a 32-bit byte peripheral data address without the access being broken in half. The intrinsic returns a reference to an unsigned long and can be used both to read and write data. See Section 6.16.6.
void __dec( int *m); DEC *m Decrement the contents of memory location m in an atomic way.
unsigned int __disable_interrupts( ); PUSH ST1
SETC INTM, DBGM
POP reg16
Disable interrupts and return the old value of the interrupt vector.
void __dmac( long *src1, long *src2, long &accum1, long &accum2, int shift); SPM n ; the PM value required for shift
MOVL ACC, accum1
MOVL P, accum2
MOVL XARx,src1
MOVL XAR7,src2
DMAC ACC:P, *XARx++, *XAR7++
Set the required PM value for shift.
Move accum1 and accum2 into ACC and P.
Move the addresses src1 and src2 into XARx and XAR7.
ACC = ACC + (src1[i+1] * src2[i+1]) << PM
P = P + (src1[i] * src2[i]) << PM
See Section 3.15.3 for more information.
void __eallow( void ); EALLOW Permits the CPU to write freely to protected registers.
void __edis( void ); EDIS Prevents the CPU from writing freely to protected registers after EALLOW is used.
unsigned int __enable_interrupts( ); PUSH ST1
CLRC INTM, DBGM
POP reg16
Enable interrupts and return the old value of the interrupt vector.
int __flip16(int src); Reverses order of bits in int src.
long __flip32(long src); Reverses order of bits in long src.
long long __flip64(long long src); Reverses order of bits in long long src.
void __inc( int *m); INC *m Increment the contents of memory location m in an atomic way.
long=__IQ( long double A, int N); Convert the long double A into the correct IQN value returned as a long type. If both arguments are constants the compiler converts the arguments to the IQ value during compile time. Otherwise a call to the RTS routine, __IQ, is made. This intrinsic cannot be used to initialize global variables to the .cinit section.
long dst =__IQmpy( long A, long B, int N); Perform optimized multiplication using the C28 IQmath library. The dst becomes ACC or P, A becomes XT:
If   N == 0: IMPYL {ACC|P}, XT, B The dst is ACC or P. If dst is ACC, the instruction takes 2 cycles. If dst is P, the instruction takes 1 cycle.
If    0 < N < 16: IMPYL P, XT, B
QMPYL ACC, XT, B
ASR64 ACC:P, #N
If    15 < N < 32: IMPYL P, XT, B
QMPYL ACC, XT, B
LSL64 ACC:P, #(32-N)
If    N == 32: QMPYL {ACC|P}, XT, B
If    N is a variable: IMPYL P, XT, B
QMPYL ACC, XT, B
MOV T,N
LSR64 ACC:P, T
long dst= __IQsat( long A, long max, long min); The dst becomes ACC. Different code is generated based on the value of max and/or min.
If    max and min are 22-bit unsigned constants: MOVL ACC, A
MOVL XARn, #22bits
MINL ACC, P
MOVL XARn, #22bits
MAXL ACC, P
If    max and min are other constants: MOVL ACC, A
MOV PL, #max lower 16 bits
MOV PH, #max upper 16 bits
MINL ACC, P
MOV PL, #min lower 16 bits
MOV PH, #min upper 16 bits
MAXL ACC, P
If    max and/or min are variables: MOVL ACC, A
MINL ACC, max
MAXL ACC, min
long dst= __IQxmpy(long A, long B, int N); Perform optimized multiplication by a power of 2 using the C28 IQmath library. The dst becomes ACC or P; A becomes XT. Code is generated based on the value of N.
If    N == 0: IMPYL ACC/P, XT, B The dst is in ACC or P.
If    0 < N < 17: IMPYL P, XT, B
QMPYL ACC, XT, B
LSL64 ACC:P, #N
The dst is in ACC.
If    0 > N > -17: QMPYL ACC, XT, B
SETC SXM
SFR ACC, #abs(N)
The dst is in ACC.
If    16 < N < 32: IMPYL P, XT, B
QMPYL ACC, XT, B
ASR64 ACC:P, #N
The dst is in P.
If    N == 32: IMPYL P, XT, B The dst is in P.
If    -16 > N > -33: QMPYL ACC, XT, B
SETC SXM
SFR ACC, #16
SFR ACC, #abs(N)−16
The dst is in ACC.
If    32 < N < 49: IMPYL ACC, XT, B
LSL ACC, #N -32
The dst is in ACC.
If    -32 > N > -49: QMPYL ACC, XT, B
SETC SXM
SFR ACC, #16
SFR ACC, #16
The dst is in ACC.
If    48 < N < 65: IMPYL ACC, XT, B
LSL64 ACC:P, #16
LSL64 ACC:P, #N−48
The dst is in ACC.
If    -48 > N > -65: QMPYL ACC, XT, B
SETC SXM
SFR ACC, #16
SFR ACC, #16
The dst is in ACC.
long long __llmax(long long dst, long long src); MAXL ACC,src.hi32
MAXCUL P,src.lo32
If src > dst, copy src to dst.
long long __llmin(long long dst, long long src); MINL ACC,src.hi32
MINCUL P,src.lo32
If src < dst, copy src to dst.
long __lmax(long dst, long src); MAXL ACC,src If src > dst, copy src to dst.
long __lmin(long dst, long src); MINL ACC,src If src < dst, copy src to dst.
int __max(int dst, int src); MAX dst, src If src > dst, copy src to dst.
int __min(int dst, int src); MIN dst, src If src < dst, copy src to dst.
int __mov_byte( int *src, unsigned int n); MOVB AX.LSB,*+XARx[ n ]
or
MOVZ AR0/AR1, @n
MOVB AX.LSB,*XARx[ {AR0|AR1} ]
Return the 8-bit nth element of a byte table pointed to by src.

This intrinsic is provided for backward compatibility. The intrinsic __byte is preferred as it returns a reference. Nothing can be done with __mov_byte() that cannot be done with __byte().

long __mpy( int src1, int src2); MPY ACC,src1, #src2 Move src1 to the T register. Multiply T by a 16-bit immediate (src2). The result is in ACC.
long __mpyb( int src1, uint src2); MPYB {ACC | P}, T, #src2 Multiply src1 (the T register) by an unsigned 8-bit immediate (src2). The result is in ACC or P.
long __mpy_mov_t( int src1, int src2, int *dst2); MPY ACC, T,src2
MOV @dst2, T
Multiply src1 (the T register) by src2. The result is in ACC. Move src1 to *dst2.
unsigned long __mpyu(uint src1, uint src2); MPYU {ACC | P}, T,src2 Multiply src1 (the T register) by src2. Both operands are treated as unsigned 16-bit numbers. The result is in ACC or P.
long __mpyxu( int src1, uint src2); MPYXU ACC, T, {mem|reg} The T register is loaded with src1. The src2 is referenced by memory or loaded into a register. The result is in ACC.
long dst= __norm32(long src, int *shift); CSB ACC
LSLL ACC, T
MOV @shift, T
Normalize src into dst and update *shift with the number of bits shifted.
long long dst= __norm64(long long src, int *shift);
CSB ACC
LSL64 ACC:P, T
MOV @shift, T
CSB ACC
LSL64 ACC:P, T
MOV TMP16, AH
MOV AH, T
ADD shift, AH
MOV AH, TMP16
Normalize 64-bit src into dst and update *shift with the number of bits shifted.
void __or( int *m, int b); OR *m, b OR the contents of memory location m to b and store the result in m, in an atomic way.
long __qmpy32( long src32a, long src32b, int q); CLRC OVM
SPM −1
MOV T, src32a + 1
MPYXU P, T, src32b + 0
MOVP T, src32b + 1
MPYXU P, T, src32a + 0
MPYA P, T, src32a + 1
Extended precision DSP Q math. Different code is generated based on the value of q.
If   q = 31,30: SPM q − 30
SFR ACC, #45 − q
ADDL ACC, P
If   q = 29: SFR ACC, #16
ADDL ACC, P
If   q = 28 through 24: SPM q - 30
SFR ACC, #16
SFR ACC, #29 - q
ADDL ACC, P
If   q = 23 through 13: SFR ACC, #16
ADDL ACC, P
SFR ACC, #29 − q
If   q = 12 through 0: SFR ACC, #16
ADDL ACC, P
SFR ACC, #16
SFR ACC, #13 − q
long __qmpy32by16(long src32, int src16, int q); CLRC OVM
MOV T, src16 + 0
MPYXU P, T, src32 + 0
MPY P, T, src32 + 1
Extended precision DSP Q math. Different code is generated based on the value of q.
If    q = 31, 30: SPM q − 30
SFR ACC, #46 − q
ADDL ACC, P
If    q = 29 through 14: SPM 0
SFR ACC, #16
ADDL ACC, P
SFR ACC, #30 − q
If    q = 13 through 0: SPM 0
SFR ACC, #16
ADDL ACC, P
SFR ACC, #16
SFR ACC, #14 − q
void __restore_interrupts(unsigned int val); PUSH val
POP ST1
Restore interrupts and set the interrupt vector to value val.
long __rol( long src); ROL ACC Rotate ACC left.
long __ror( long src); ROR ACC Rotate ACC right.
void *result= __rpt_mov_imm(void *dst, int src, int count);
MOV result, dst
MOV ARx,dst
RPT #count
|| MOV *XARx++, #src
Move the dst register to the result register. Move the dst register to a temp (ARx) register. Copy the immediate src to the temp register count + 1 times.

The src must be a 16-bit immediate. The count can be an immediate from 0 to 255 or a variable.

int __rpt_norm_inc( long src, int dst, int count); MOV ARx, dst
RPT #count
|| NORM ACC, ARx++
Repeat the normalize accumulator value count + 1 times.

The count can be an immediate from 0 to 255 or a variable.

int __rpt_norm_dec(long src, int dst, int count); MOV ARx, dst
RPT #count
|| NORM ACC, ARx--
Repeat the normalize accumulator value count + 1 times.

The count can be an immediate from 0 to 255 or a variable.

long __rpt_rol(long src, int count); RPT #count
|| ROL ACC
Repeat the rotate accumulator left count + 1 times. The result is in ACC.

The count can be an immediate from 0 to 255 or a variable.

long __rpt_ror(long src, int count); RPT #count
|| ROR ACC
Repeat the rotate accumulator right count + 1 times. The result is in ACC.

The count can be an immediate from 0 to 255 or a variable.

long __rpt_subcu(long dst, int src, int count); RPT count
|| SUBCU ACC, src
The src operand is referenced from memory or loaded into a register and used as an operand to the SUBCU instruction. The result is in ACC.

The count can be an immediate from 0 to 255 or a variable. The instruction repeats count + 1 times.

unsigned long __rpt_subcul(unsigned long num, unsigned long den, unsigned long &remainder, int count); RPT count
|| SUBCUL ACC, den
Performs repeated conditional long subtraction as typically used in unsigned modulus division. Returns the quotient.
long __sat( long src); SAT ACC Load ACC with 32-bit src. The result is in ACC.
long __sat32( long src, long limit); SETC OVM
ADDL ACC, {mem|P}
SUBL ACC, {mem|P}
SUBL ACC, {mem|P}
ADDL ACC, {mem|P}
CLRC OVM
Saturate a 32-bit value to a 32-bit mask. Load ACC with src. Limit value is either referenced from memory or loaded into the P register. The result is in ACC.
long __sathigh16(long src, int limit); SETC OVM
ADDL ACC, {mem|P}<<16
SUBL ACC, {mem|P}<<16
SUBL ACC, {mem|P}<<16
ADDL ACC, {mem|P}<<16
CLRC OVM
SFR ACC, rshift
Saturate a 32-bit value to 16-bits high. Load ACC with src. The limit value is either referenced from memory or loaded into register. The result is in ACC. The result can be right shifted and stored into an int. For example: ivar=__sathigh16(lvar, mask)>>6;
long __satlow16( long src); SETC OVM
MOV T, #0xFFFF
CLRC SXM ; if necessary
ADD ACC, T <<15
SUB ACC, T <<15
SUB ACC, T <<15
ADD ACC, T <<15
CLRC OVM
Saturate a 32-bit value to 16-bits low. Load ACC with src. Load T register with #0xFFFF. The result is in ACC.
long __sbbu( long src1, uint src2); SBBU ACC, src2 Subtract src2 + logical inverse of C from ACC (src1). The result is in ACC.
void __sub( int *m, int b); SUB *m, b Subtract b from the contents of memory location m and store the result in m, in an atomic way.
long __subcu( long src1, int src2); SUBCU ACC, src2 Subtract src2 shifted left 15 from ACC (src1). The result is in ACC.
unsigned long __subcul(unsigned long num, unsigned long den, unsigned long &remainder); SUBCUL ACC, den Performs a single conditional long subtraction as typically used in unsigned modulus division. Returns the quotient.
void __subl( long *m, long b); SUBL *m, b Subtract b from the contents of memory location m and store the result in m, in an atomic way.
void __subr( int *m, int b); SUBR *m, b Subtract the contents of memory location m from b and store the result in m, in an atomic way.
void __subrl( long *m, long b); SUBRL *m, b Subtract the contents of memory location m from b and store the result in m, in an atomic way.
if (__tbit( int src, int bit) ); TBIT src, #bit Set the TC status bit if the specified bit of src is 1.
void __xor( int *m, int b); XOR *m, b XOR the contents of memory location m to b and store the result in m, in an atomic way.
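
The code sequences listed for __IQmpy and __IQsat amount to a 64-bit multiply followed by a shift, and a clamp between two limits. As a rough aid for checking values off-target, the following host-side sketch models that arithmetic in portable C. The names model_IQmpy and model_IQsat are hypothetical and are not part of the TI toolchain. The multiply model assumes 0 <= N <= 32, does not model overflow of the 32-bit result, and relies on the host compiler performing an arithmetic right shift on negative values, which most do but the C standard does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of __IQmpy: form the full 64-bit product of a and b,
   shift it right by n, and keep the low 32 bits.
   Assumptions: 0 <= n <= 32, the shifted result fits in 32 bits, and the
   host uses an arithmetic right shift for negative values. */
static int32_t model_IQmpy(int32_t a, int32_t b, int n)
{
    assert(n >= 0 && n <= 32);
    int64_t product = (int64_t)a * (int64_t)b;
    return (int32_t)(product >> n);
}

/* Hypothetical model of __IQsat: clamp a to the range [min, max],
   applying the upper limit first and then the lower limit, in the same
   order as the MINL / MAXL sequence in the table. */
static int32_t model_IQsat(int32_t a, int32_t max, int32_t min)
{
    if (a > max) {
        a = max;
    }
    if (a < min) {
        a = min;
    }
    return a;
}
```

For instance, with N = 16 the value 1.5 is represented as 98304 and 2.0 as 131072, and model_IQmpy(98304, 131072, 16) gives 196608, which is 3.0 in the same format.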