ARM Assembly Language Tools v16.9.0.LTS User's Guide
SPNU118P - REVISED OCTOBER 2016

Assembler Description

The ARM assembler translates assembly language source files into machine language object files. These files are object modules, which are discussed in Section 2. Source files can contain the following assembly language elements:

  • Assembler directives described in Section 5
  • Macro directives described in Section 6
  • Assembly language instructions described in the TMS470R1x User's Guide

Assembler Overview

The 2-pass assembler does the following:

  • Processes the source statements in a text file to produce a relocatable object file
  • Produces a source listing (if requested) and provides you with control over this listing
  • Allows you to divide your code into sections and maintain a section program counter (SPC) for each section of object code
  • Defines and references global symbols and appends a cross-reference listing to the source listing (if requested)
  • Allows conditional assembly
  • Supports macros, allowing you to define macros inline or in a library

The Assembler's Role in the Software Development Flow

Figure 4-1 illustrates the assembler's role in the software development flow. The shaded portion highlights the most common assembler development path. The assembler accepts assembly language source files as input, both those you create and those created by the ARM C/C++ compiler.

Figure 4-1. The Assembler in the ARM Software Development Flow

Invoking the Assembler

To invoke the assembler, enter the following:

armcl input file [options]

armcl       is the command that invokes the assembler through the compiler. The compiler considers any file with an .asm extension to be an assembly file and invokes the assembler.
input file  names the assembly language source file.
options     identify the assembler options that you want to use. Options are case sensitive and can appear anywhere on the command line following the command. Precede each option with one or two hyphens as shown.

The valid assembler options are listed in Table 4-1.

Table 4-1 ARM Assembler Options

Option Alias Description
--absolute_listing -aa Creates an absolute listing. When you use --absolute_listing, the assembler does not produce an object file. The --absolute_listing option is used in conjunction with the absolute lister.
--asm_define=name[=def] -ad Sets the name symbol. This is equivalent to defining name with a .set directive in the case of a numeric value or with an .asg directive otherwise. If def is omitted, the symbol is set to 1. See Section 4.8.5.
--asm_dependency -apd Performs preprocessing for assembly files, but instead of writing preprocessed output, writes a list of dependency lines suitable for input to a standard make utility. The list is written to a file with the same name as the source file but with a .ppa extension.
--asm_includes -api Performs preprocessing for assembly files, but instead of writing preprocessed output, writes a list of files included with the .include directive. The list is written to a file with the same name as the source file but with a .ppa extension.
--asm_listing -al Produces a listing file with the same name as the input file with a .lst extension.
--asm_listing_cross_reference -ax Produces a cross-reference table and appends it to the end of the listing file; it also adds cross-reference information to the object file for use by the cross-reference utility. If you do not request a listing file but use the --asm_listing_cross_reference option, the assembler creates a listing file automatically, naming it with the same name as the input file with a .lst extension.
--asm_undefine=name -au Undefines the predefined constant name, which overrides any --asm_define options for the specified constant.
--cmd_file=filename -@ Appends the contents of a file to the command line. You can use this option to avoid limitations on command line length imposed by the host operating system. Use an asterisk or a semicolon (* or ;) at the beginning of a line in the command file to include comments. Comments that begin in any other column must begin with a semicolon. Within the command file, filenames or option parameters containing embedded spaces or hyphens must be surrounded with quotation marks. For example: "this-file.asm"
--code_state={16|32} -mt --code_state=16 (or -mt) instructs the assembler to begin assembling instructions as 16-bit instructions; UAL syntax (.thumb) for ARMv7 and non-UAL syntax (.state16) otherwise. By default, the assembler begins assembling 32-bit instructions. You can reset the default behavior by specifying --code_state=32. For information on indirect calls in 16-bit versus 32-bit code, see the ARM Optimizing C/C++ Compiler User's Guide.
--endian -me Produces object code in little-endian format. For more information, see the ARM Optimizing C/C++ Compiler User's Guide.
--include_file=filename -ahi Includes the specified file for the assembly module. The file is included before source file statements. The included file does not appear in the assembly listing files.
--include_path=pathname -I Specifies a directory where the assembler can find files named by the .copy, .include, or .mlib directives. There is no limit to the number of directories you can specify in this manner; each pathname must be preceded by the --include_path option. See Section 4.5.1.
--quiet -q Suppresses the banner and progress information (assembler runs in quiet mode).
--symdebug:dwarf or --symdebug:none -g (DWARF is on by default) Enables assembler source debugging in the C source debugger. Line information is output to the object module for every line of source in the assembly language source file. You cannot use this option on assembly code that contains .line directives. See Section 4.13.

Controlling Application Binary Interface

An Application Binary Interface (ABI) defines the low level interface between object files, and between an executable and its execution environment. The ABI exists to allow ABI-compliant object code to link together, regardless of its source, and allows the resulting executable to run on any system that supports that ABI. See the ARM Optimizing C/C++ Compiler User's Guide (SPNU151) for information on the EABI ABI. The complete ARM ABI specifications can be found in the ARM Information Center.

COFF object files and the legacy TIABI and TI ARM9 ABI modes are not supported in v15.6.0.STS and later versions of the TI Code Generation Tools. If you would like to produce COFF output files, please use v5.2 of the ARM Code Generation Tools and refer to SPNU151J for documentation.

All object files in an EABI application must be built for EABI. The linker detects situations where object modules conform to different ABIs and generates an error.

Note that converting an assembly file from the COFF ABI to EABI requires some changes to the assembly code.

Naming Alternate Directories for Assembler Input

The .copy, .include, and .mlib directives tell the assembler to use code from external files. The .copy and .include directives tell the assembler to read source statements from another file, and the .mlib directive names a library that contains macro functions. Section 5 contains examples of the .copy, .include, and .mlib directives. The syntax for these directives is:

.copy ["]filename["]
.include ["]filename["]
.mlib ["]filename["]

The filename names a copy/include file that the assembler reads statements from or a macro library that contains macro definitions. If filename begins with a number the double quotes are required. Quotes are recommended so that there is no issue in dealing with path information that is included in the filename specification or path names that include white space. The filename may be a complete pathname, a partial pathname, or a filename with no path information.

The assembler searches for the file in the following locations in the order given:

  1. The directory that contains the current source file. The current source file is the file being assembled when the .copy, .include, or .mlib directive is encountered.
  2. Any directories named with the --include_path option
  3. Any directories named with the TI_ARM_A_DIR environment variable
  4. Any directories named with the TI_ARM_C_DIR environment variable

Because of this search hierarchy, you can augment the assembler's directory search algorithm by using the --include_path option (described in Section 4.5.1) or the TI_ARM_A_DIR environment variable (described in Section 4.5.2). The TI_ARM_C_DIR environment variable is discussed in the ARM Optimizing C/C++ Compiler User's Guide.

NOTE

The TI_ARM_C_DIR environment variable takes precedence over the older TMS470_C_DIR environment variable if both are defined. If only TMS470_C_DIR is set, it will continue to be used. Likewise, the TI_ARM_A_DIR environment variable takes precedence over the older TMS470_A_DIR environment variable if both are defined. If only TMS470_A_DIR is set, it will continue to be used.

Using the --include_path Assembler Option

The --include_path assembler option names an alternate directory that contains copy/include files or macro libraries. The format of the --include_path option is as follows:

armcl --include_path=pathname source filename [other options]

There is no limit to the number of --include_path options per invocation; each --include_path option names one pathname. In assembly source, you can use the .copy, .include, or .mlib directive without specifying path information. If the assembler does not find the file in the directory that contains the current source file, it searches the paths designated by the --include_path options.

For example, assume that a file called source.asm is in the current directory; source.asm contains the following directive statement:

.copy "copy.asm"

Assume the following paths for the copy.asm file:

UNIX: /tools/files/copy.asm
Windows: c:\tools\files\copy.asm

You could set up the search path with the commands shown below:

Operating System Enter
UNIX (Bourne shell) armcl --include_path=/tools/files source.asm
Windows armcl --include_path=c:\tools\files source.asm

The assembler first searches for copy.asm in the current directory because source.asm is in the current directory. Then the assembler searches in the directory named with the --include_path option.

Using the TI_ARM_A_DIR Environment Variable

An environment variable is a system symbol that you define and assign a string to. The assembler uses the TI_ARM_A_DIR environment variable to name alternate directories that contain copy/include files or macro libraries.

The assembler looks for the TI_ARM_A_DIR environment variable and then reads and processes it. If the assembler does not find the TI_ARM_A_DIR variable, it then searches for TI_ARM_C_DIR. The processor-specific variables are useful when you are using Texas Instruments tools for different processors at the same time.

See the ARM Optimizing C/C++ Compiler User's Guide for details on TI_ARM_C_DIR.

NOTE

The TI_ARM_C_DIR environment variable takes precedence over the older TMS470_C_DIR environment variable if both are defined. If only TMS470_C_DIR is set, it will continue to be used. Likewise, the TI_ARM_A_DIR environment variable takes precedence over the older TMS470_A_DIR environment variable if both are defined. If only TMS470_A_DIR is set, it will continue to be used.

The command syntax for assigning the environment variable is as follows:

Operating System Enter
UNIX (Bourne shell) TI_ARM_A_DIR="pathname1;pathname2;..."; export TI_ARM_A_DIR
Windows set TI_ARM_A_DIR=pathname1;pathname2;...

The pathnames are directories that contain copy/include files or macro libraries. The pathnames must follow these constraints:

  • Pathnames must be separated with a semicolon.
  • Spaces or tabs at the beginning or end of a path are ignored. For example, the spaces before and after the semicolon in the following are ignored:

    set TI_ARM_A_DIR= c:\path\one\to\tools ; c:\path\two\to\tools

  • Spaces and tabs are allowed within paths to accommodate Windows directories that contain spaces.
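The pathname constraints above can be modeled with a short Python sketch. It is only an illustration of the stated rules (semicolon separators, leading and trailing spaces or tabs ignored, embedded spaces kept). The helper name split_path_list is hypothetical and is not part of the TI tools, and skipping empty entries is an assumption.

```python
def split_path_list(value):
    """Split a TI_ARM_A_DIR-style string into a list of directories.

    Hypothetical helper: it models only the rules stated in the text.
    """
    paths = []
    for entry in value.split(";"):      # pathnames are separated by semicolons
        entry = entry.strip(" \t")      # spaces/tabs at either end are ignored
        if entry:                       # assumption: empty entries are skipped
            paths.append(entry)         # spaces inside a path are preserved
    return paths


print(split_path_list(r" c:\path\one\to\tools ; c:\path\two\to\tools"))
```

Because only the ends of each entry are trimmed, a directory name that contains spaces passes through unchanged.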

In assembly source, you can use the .copy, .include, or .mlib directive without specifying path information. If the assembler does not find the file in the directory that contains the current source file or in directories named by the --include_path option, it searches the paths named by the environment variable.

For example, assume that a file called source.asm contains these statements:

.copy "copy1.asm"
.copy "copy2.asm"

Assume the following paths for the files:

UNIX: /tools/files/copy1.asm and /dsys/copy2.asm
Windows: c:\tools\files\copy1.asm and c:\dsys\copy2.asm

You could set up the search path with the commands shown below:

Operating System Enter
UNIX (Bourne shell) TI_ARM_A_DIR="/dsys"; export TI_ARM_A_DIR
armcl --include_path=/tools/files source.asm
Windows set TI_ARM_A_DIR=c:\dsys
armcl --include_path=c:\tools\files source.asm

The assembler first searches for copy1.asm and copy2.asm in the current directory because source.asm is in the current directory. Then the assembler searches in the directory named with the --include_path option and finds copy1.asm. Finally, the assembler searches the directory named with TI_ARM_A_DIR and finds copy2.asm.
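The search order described above can be sketched in Python. This is a simplified model, not the assembler's implementation: the function name find_include, the /work directory standing in for the current directory, and the simulated file set are all hypothetical. Searching --include_path directories in command-line order is an assumption, and the TI_ARM_C_DIR step is left out to keep the sketch short.

```python
import os


def find_include(filename, source_dir, include_paths, a_dir="", exists=os.path.isfile):
    """Return the first existing candidate path for filename, or None.

    Hypothetical model of the documented search order:
      1. the directory that contains the current source file
      2. directories named with --include_path
      3. directories named by TI_ARM_A_DIR (semicolon-separated)
    """
    env_dirs = [p.strip() for p in a_dir.split(";") if p.strip()]
    for directory in [source_dir, *include_paths, *env_dirs]:
        candidate = directory.rstrip("/") + "/" + filename
        if exists(candidate):
            return candidate
    return None


# Simulated file system matching the UNIX example in the text.
files = {"/work/source.asm", "/tools/files/copy1.asm", "/dsys/copy2.asm"}
for name in ("copy1.asm", "copy2.asm"):
    found = find_include(name, "/work", ["/tools/files"], "/dsys", exists=files.__contains__)
    print(name, "->", found)
```

The exists parameter is injected so the search order can be exercised without touching the real file system.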

The environment variable remains set until you reboot the system or reset the variable by entering one of these commands:

Operating System Enter
UNIX (Bourne shell) unset TI_ARM_A_DIR
Windows set TI_ARM_A_DIR=

Source Statement Format

Each line in an ARM assembly input file can be empty, a comment, an assembler directive, a macro invocation, or an assembly instruction.

Assembly language source statements can contain four ordered fields (label, mnemonic, operand list, and comment). The general syntax for source statements is as follows:

[label[:]] mnemonic [operand list] [;comment]

Following are examples of source statements:

SYM1   .set  2           ; Symbol SYM1 = 2
Begin: MOV   R0, #SYM1   ; Load R0 with 2
       .word 016h        ; Initialize word (016h)

The ARM assembler reads an unlimited number of characters per line. Source statements that extend beyond 400 characters in length (including comments) are truncated in the listing file.

Follow these guidelines:

  • All statements must begin with a label, a blank, an asterisk, or a semicolon.
  • Labels are optional for most statements; if used, they must begin in column 1.
  • One or more space or tab characters must separate each field.
  • Comments are optional. Comments that begin in column 1 can begin with an asterisk or a semicolon (* or ;), but comments that begin in any other column must begin with a semicolon.
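The field rules above can be illustrated with a small Python sketch that splits one source line into its four fields. It is a simplified model under stated assumptions: the function name split_statement is hypothetical, and semicolons inside quoted strings are not handled.

```python
def split_statement(line):
    """Split one source line into (label, mnemonic, operands, comment).

    Simplified, hypothetical model of the guidelines in the text.
    Fields that are absent are returned as None.
    """
    # A '*' or ';' in column 1 makes the whole line a comment.
    if line[:1] in ("*", ";"):
        return None, None, None, line[1:].strip()
    code, sep, comment = line.partition(";")
    comment = comment.strip() if sep else None
    label = None
    if code[:1] not in ("", " ", "\t"):
        # Anything else that starts in column 1 is a label;
        # an optional trailing colon is not part of the name.
        parts = code.split(None, 1)
        label = parts[0].rstrip(":")
        code = parts[1] if len(parts) > 1 else ""
    parts = code.split(None, 1)
    mnemonic = parts[0] if parts else None
    operands = parts[1].strip() if len(parts) > 1 else None
    return label, mnemonic, operands, comment


for text in ("SYM1   .set 2 ; Symbol SYM1 = 2",
             "Begin: MOV R0, #SYM1 ; Load R0 with 2",
             "       .word 016h"):
    print(split_statement(text))
```

Note how the column-1 test is what separates a label from a mnemonic: a mnemonic written in column 1 would be returned as the label.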

NOTE

A mnemonic cannot begin in column 1 or it will be interpreted as a label. Mnemonic opcodes and assembler directive names without the . prefix are valid label names. Remember to always use whitespace before the mnemonic, or the assembler will think the identifier is a new label definition.

The following sections describe each of the fields.

Label Field

A label must be a legal identifier (see Section 4.8.1) placed in column 1. Every instruction may optionally have a label. Many directives allow a label, and some require a label.

A label can be followed by a colon (:). The colon is not treated as part of the label name. If you do not use a label, the first character position must contain a blank, a semicolon, or an asterisk.

When you use a label on an assembly instruction or data directive, an assembler symbol (Section 4.8) with the same name is created. Its value is the current value of the section program counter (SPC, see Section 2.4.5). This symbol represents the address of that instruction. In the following example, the .word directive is used to create an array of 3 words. Because a label was used, the assembly symbol Start refers to the first word, and the symbol will have the value 40h.

        . . . .
 9                      * Assume some code was assembled
10 00000040 0000000A    Start: .word 0Ah,3,7
   00000044 00000003
   00000048 00000007

A label on a line by itself is a valid statement. When a label appears on a line by itself, it points to the instruction on the next line (the SPC is not incremented):

1 00000000             Here:
2 00000000 00000003           .word 3

A label on a line by itself is equivalent to writing:

Here: .equ $ ; $ provides the current value of the SPC

If you do not use a label, the character in column 1 must be a blank, an asterisk, or a semicolon.

Mnemonic Field

The mnemonic field follows the label field. The mnemonic field cannot start in column 1; if it does, it is interpreted as a label. There is one exception: the parallel bars (||) of the mnemonic field can start in column 1. The mnemonic field contains one of the following items:

  • Machine-instruction mnemonic (such as ADD, MUL, STR)
  • Assembler directive (such as .data, .list, .equ)
  • Macro directive (such as .macro, .var, .mexit)
  • Macro invocation

Operand Field

The operand field follows the mnemonic field and contains zero or more comma-separated operands. An operand can be one of the following:

  • an immediate operand (usually a constant or symbol) (see Section 4.7 and Section 4.8)
  • a register operand
  • a memory reference operand
  • an expression that evaluates to one of the above (see Section 4.9)

An immediate operand is encoded directly in the instruction. The value of an immediate operand must be a constant expression. Most instructions with an immediate operand require an absolute constant expression, such as 1234. Some instructions (such as a call instruction) allow a relocatable constant expression, such as a symbol defined in another file. (See Section 4.9 for details about types of expressions.)

A register operand is a special pre-defined symbol that represents a CPU register.

A memory reference operand uses one of several memory addressing modes to refer to a location in memory. Memory reference operands use a special target-specific syntax defined in the appropriate CPU and Instruction Set Reference Guide.

You must separate operands with commas. Not all operand types are supported for all operands. See the description of the specific instruction in the CPU and Instruction Set Reference Guide for your device family.

Operand Syntaxes for Instructions

The assembler allows you to specify that an operand should be used as an address, an immediate value, an indirect address, a register, a shifted register, or a register list. The following rules apply to the operands of instructions.

  • # prefix: the operand is an immediate value. Using the # sign as a prefix causes the assembler to treat the operand as an immediate value. This is true even if the operand is a register; the assembler treats the register as a value instead of using the contents of the register. For example:

    Label: ADD R1, R1, #123   ; Add 123 (decimal) to the value of R1 and place the result in R1.
  • Square brackets: the operand is an indirect address. If the operand is enclosed in square brackets, the assembler treats the operand as an indirect address; that is, it uses the contents of the operand as an address. Indirect addresses consist of a base and an offset. The base is specified by a register and is formed by taking the value in the register. The offset can be specified by a register, an immediate value, or a shifted register. Furthermore, the offset can be designated as one of the following:
    • Pre-index, where the base and offset are combined to form the address. To designate a pre-index offset, include the offset within the enclosing right bracket.
    • Postindex, where the address is formed from the base, and then the base and offset are combined. To designate a postindex offset, include the offset outside of the right bracket.

    The offset can be added to or subtracted from the base. The following are examples of instructions that use indirect addresses as operands:

    A: LDR R1, [R1]                ; Load from address in R1 into R1.
       LDR R7, [R1, #5]            ; Form address by adding the value in R1 to 5.
                                   ; Load from address into R7.
       STR R3, [R1, -R2]           ; Form address by subtracting the value in R2
                                   ; from the value in R1. Store from R3 to memory
                                   ; at address.
       STR R14, [R1, +R3, LSL #2]  ; Form address by adding the value in R3 shifted
                                   ; left by 2 to the value in R1. Store from R14
                                   ; to memory at address.
       LDR R1, [R1], #5            ; Load from address in R1 into R1, then add 5
                                   ; to the address.
       STR R2, [R1], R5            ; Store value in R2 in the address in R1, then
                                   ; add the value in R5 to the address.
  • ! suffix: write-back to register. If you use the ! sign as a suffix, the assembler writes the computed address back to the base register. Write-back to register is used only with the indirect addressing mode syntax. This is an example of an instruction using the write-back to register suffix:

    LDR R1, [R4, #4]!   ; Form address by adding the value in R4 to 4. Load from
                        ; this address into R1, then replace the value in R4
                        ; with the address.
  • ^ suffix: set S bit. If you use the ^ sign as a suffix, the assembler sets the S bit. The resulting action depends on the type of instruction being executed and whether R15 is in the transfer list. For more information, see the LDM and STM instructions in the TMS470R1x User's Guide. For example:

    LDMIA SP, {R4-R11, R15}^   ; Load registers R4 through R11 and R15 from memory at SP.
                               ; Load CPSR with SPSR.
  • Shifted registers. If a register symbol is followed by a shift type, the computed value is the value in the register shifted according to the type as defined below:

    LSL   Logical shift left
    LSR   Logical shift right
    ASL   Arithmetic shift left
    ASR   Arithmetic shift right
    ROR   Rotate right
    RRX   Rotate right extended

    The shift type can be followed by a register or an immediate whose value defines the shift amount. The following are examples of instructions that use shifted registers as operands:

    B: ADD R1, R4, R5, LSR R2     ; Logical shift right the value in R5 by the value
                                  ; in R2. Add the shifted value to the value in R4.
                                  ; Place result in R1.
       LDR R1, [R5, R4, LSL #4]   ; Form address by adding the value in R4 shifted
                                  ; left by 4 to the value in R5. Load from address
                                  ; into R1.
       CMP R3, R4, RRX            ; Compare the value in R3 with the value in R4
                                  ; rotated right with extend.
  • Curly braces: the operand is a register list. If you surround registers with curly braces, the assembler treats the operand as a list of registers. You can separate registers with commas or indicate a range of registers with a dash. The following are examples of instructions that use register lists:

    LDMEA R2, {R1, R3, R6}     ; Pre-decrement stack load. Load registers R1, R3 and R6
                               ; from memory at the address in R2.
    STMFD R12, {R1, R3-R5}     ; Pre-decrement stack store. Store from registers R1 and
                               ; R3 through R5 to memory at the address in R12.
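The pre-index, postindex, and write-back forms described in the rules above differ only in which address is used for the access and whether the base register changes. The following Python sketch models that difference. It is an illustration only: the function name effective_address and the mode strings are hypothetical, and details such as alignment or the case where the loaded register is also the base register are ignored.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, offset, mode):
    """Return (address_used, new_base_value) for one load or store.

    mode is one of:
      "pre"     [Rn, offset]    address = base + offset, base unchanged
      "pre_wb"  [Rn, offset]!   address = base + offset, base updated
      "post"    [Rn], offset    address = base, then base = base + offset
    A negative offset models subtraction (for example -R2).
    """
    combined = (base + offset) & MASK32   # 32-bit wraparound
    if mode == "pre":
        return combined, base
    if mode == "pre_wb":
        return combined, combined
    if mode == "post":
        return base, combined
    raise ValueError("unknown mode: " + mode)


# LDR R1, [R4, #4]! with R4 = 0x1000 (illustrative register value)
address, new_r4 = effective_address(0x1000, 4, "pre_wb")
print(hex(address), hex(new_r4))
```

Returning the new base value alongside the address makes the write-back behavior explicit for each form.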

Immediate Values as Operands for Directives

You use immediate values as operands primarily with instructions. In some cases, you can use immediate values with the operands of directives. For instance, you can use immediate values with the .byte directive to load values into the current section.

It is not usually necessary to use the # prefix for directives. Compare the following statements:

    ADD   R1, #10
    .byte 10

In the first statement, the # prefix is necessary to tell the assembler to add the value 10 to R1. In the second statement, however, the # prefix is not used; the assembler expects the operand to be a value and initializes a byte with the value 10.

See Section 5 for more information on the syntax and usage of directives.

Comment Field

A comment can begin in any column and extends to the end of the source line. A comment can contain any ASCII character, including blanks. Comments are printed in the assembly source listing, but they do not affect the assembly.

A source statement that contains only a comment is valid. If it begins in column 1, it can start with a semicolon (;) or an asterisk (*). Comments that begin anywhere else on the line must begin with a semicolon. The asterisk identifies a comment only if it appears in column 1.

Literal Constants

A literal constant (also known as a literal or in some other documents as an immediate value) is a value that represents itself, such as 12, 3.14, or "hello".

The assembler supports several types of literals:

  • Binary integer literals
  • Octal integer literals
  • Decimal integer literals
  • Hexadecimal integer literals
  • Character literals
  • Character string literals
  • Floating-point literals

Error checking for invalid or incomplete literals is performed.

Integer Literals

The assembler maintains each integer literal internally as a 32-bit signless quantity. Literals are considered unsigned values, and are not sign extended. For example, the literal 00FFh is equal to 00FF (base 16) or 255 (base 10); it does not equal -1, which is 0FFFFFFFFh (base 16). Note that if you store 0FFh in a .byte location, the bits will be exactly the same as if you had stored -1. It is up to the reader of that location to interpret the signedness of the bits.

Binary Integer Literals

A binary integer literal is a string of up to 32 binary digits (0s and 1s) followed by the suffix B (or b). Binary literals of the form "0[bB][10]+" are also supported. If fewer than 32 digits are specified, the assembler right justifies the value and fills the unspecified bits with zeros. These are examples of valid binary literals:

00000000B    Literal equal to 0 (base 10) or 0 (base 16)
0100000b     Literal equal to 32 (base 10) or 20 (base 16)
01b          Literal equal to 1 (base 10) or 1 (base 16)
11111000B    Literal equal to 248 (base 10) or 0F8 (base 16)
0b00101010   Literal equal to 42 (base 10) or 2A (base 16)
0B101010     Literal equal to 42 (base 10) or 2A (base 16)

Octal Integer Literals

An octal integer literal is a string of up to 11 octal digits (0 through 7) followed by the suffix Q (or q). Octal literals may also begin with a 0, contain no 8 or 9 digits, and end with no suffix. These are examples of valid octal literals:

10Q       Literal equal to 8 (base 10) or 8 (base 16)
054321    Literal equal to 22737 (base 10) or 58D1 (base 16)
100000Q   Literal equal to 32768 (base 10) or 8000 (base 16)
226q      Literal equal to 150 (base 10) or 96 (base 16)

Decimal Integer Literals

A decimal integer literal is a string of decimal digits ranging from -2 147 483 648 to 4 294 967 295. These are examples of valid decimal integer literals:

1000         Literal equal to 1000 (base 10) or 3E8 (base 16)
-32768       Literal equal to -32 768 (base 10) or -8000 (base 16)
25           Literal equal to 25 (base 10) or 19 (base 16)
4815162342   Literal equal to 4815162342 (base 10) or 11F018BE6 (base 16)

Hexadecimal Integer Literals

A hexadecimal integer literal is a string of up to eight hexadecimal digits followed by the suffix H (or h) or preceded by 0x. A hexadecimal literal must begin with a decimal value (0-9) if it is indicated by the H or h suffix.

Hexadecimal digits include the decimal values 0-9 and the letters A-F or a-f. If fewer than eight hexadecimal digits are specified, the assembler right-justifies the bits.

These are examples of valid hexadecimal literals:

78h     Literal equal to 120 (base 10) or 0078 (base 16)
0x78    Literal equal to 120 (base 10) or 0078 (base 16)
0Fh     Literal equal to 15 (base 10) or 000F (base 16)
37ACh   Literal equal to 14252 (base 10) or 37AC (base 16)
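The integer literal forms described in the preceding sections can be summarized with a Python sketch that converts a literal to its value. It is a minimal model, not the assembler's parser: the function name parse_int_literal is hypothetical, the order in which the forms are tried is an assumption, and reducing the result to 32 bits is an assumption based on the statement that literals are kept as 32-bit signless quantities.

```python
import re


def parse_int_literal(text):
    """Return the 32-bit value of an integer literal in one of the forms above.

    Hypothetical helper; the matching order is an assumption, not
    documented assembler behavior.
    """
    s = text.strip()
    negative = s.startswith("-")
    if negative:
        s = s[1:]
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", s):          # 0x prefix
        value = int(s[2:], 16)
    elif re.fullmatch(r"[0-9][0-9A-Fa-f]*[hH]", s):    # H suffix, leading digit
        value = int(s[:-1], 16)
    elif re.fullmatch(r"0[bB][01]+", s):               # 0b prefix
        value = int(s[2:], 2)
    elif re.fullmatch(r"[01]+[bB]", s):                # B suffix
        value = int(s[:-1], 2)
    elif re.fullmatch(r"[0-7]+[qQ]", s):               # Q suffix
        value = int(s[:-1], 8)
    elif re.fullmatch(r"0[0-7]+", s):                  # leading 0, no suffix
        value = int(s, 8)
    elif re.fullmatch(r"[0-9]+", s):                   # decimal
        value = int(s, 10)
    else:
        raise ValueError("not an integer literal: " + text)
    if negative:
        value = -value
    return value & 0xFFFFFFFF    # assumption: keep the low 32 bits


for literal in ("0100000b", "0b00101010", "054321", "226q", "37ACh", "0x78", "-1"):
    print(literal, hex(parse_int_literal(literal)))
```

Testing the H suffix before the B suffix matters in this sketch, because a hexadecimal literal such as 0Bh would otherwise be confused with a binary form.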

Character Literals

A character literal is a single character enclosed in single quotes. The characters are represented internally as 8-bit ASCII characters. Two consecutive single quotes are required to represent each single quote that is part of a character literal. A character literal consisting only of two single quotes is valid and is assigned the value 0. These are examples of valid character literals:

'a'    Defines the character literal a and is represented internally as 61 (base 16)
'C'    Defines the character literal C and is represented internally as 43 (base 16)
''''   Defines the character literal ' and is represented internally as 27 (base 16)
''     Defines a null character and is represented internally as 00 (base 16)

Notice the difference between character literals and character string literals (Section 4.7.2 discusses character strings). A character literal represents a single integer value; a string is a sequence of characters.
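The quoting rules for character literals can be illustrated with a brief Python sketch. It is only a model of the rules stated above; the function name char_literal_value is hypothetical.

```python
def char_literal_value(literal):
    """Return the integer value of a character literal such as 'a' or ''''.

    Hypothetical helper modeling the rules in the text.
    """
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a character literal: " + literal)
    body = literal[1:-1].replace("''", "'")   # two single quotes stand for one
    if body == "":
        return 0                              # '' is the null character
    if len(body) != 1:
        raise ValueError("more than one character: " + literal)
    return ord(body)


for literal in ("'a'", "'C'", "''''", "''"):
    print(literal, hex(char_literal_value(literal)))
```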

Character String Literals

A character string is a sequence of characters enclosed in double quotes. Double quotes that are part of character strings are represented by two consecutive double quotes. The maximum length of a string varies and is defined for each directive that requires a character string. Characters are represented internally as 8-bit ASCII characters.

These are examples of valid character strings:

"sample program" defines the 14-character string sample program.
"PLAN ""C""" defines the 8-character string PLAN "C".

Character strings are used for the following:

  • Filenames, as in .copy "filename"
  • Section names, as in .sect "section name"
  • Data initialization directives, as in .byte "charstring"
  • Operands of .string directives
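The rule that two consecutive double quotes represent one embedded double quote can be checked with a short Python sketch. The function name string_literal_text is hypothetical; the sketch only reproduces the character counts given for the example strings above.

```python
def string_literal_text(literal):
    """Return the characters represented by a double-quoted string literal.

    Hypothetical helper: two consecutive double quotes inside the
    literal stand for one double quote.
    """
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a character string: " + literal)
    return literal[1:-1].replace('""', '"')


for literal in ('"sample program"', '"PLAN ""C"""'):
    text = string_literal_text(literal)
    print(text, len(text))
```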

Floating-Point Literals

A floating-point literal is a string of decimal digits followed by a required decimal point, an optional fractional portion, and an optional exponent portion. The syntax for a floating-point number is:

[ +|- ] nnn . [ nnn] [ E|e [ +|- ] nnn ]

Replace nnn with a string of decimal digits. You can precede nnn with a + or a -. You must specify a decimal point. For example, 3.e5 is valid, but 3e5 is not valid. The exponent indicates a power of 10. These are examples of valid floating-point literals:

3.0
3.14
3.
-0.314e13
+314.59e-2
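The floating-point syntax above maps directly onto a regular expression. The Python sketch below is a minimal check of that syntax only; the names FLOAT_LITERAL and is_float_literal are hypothetical, and the sketch does not convert the literal or check its range.

```python
import re

# [+|-] nnn . [nnn] [E|e [+|-] nnn]
FLOAT_LITERAL = re.compile(r"[+-]?[0-9]+\.[0-9]*(?:[Ee][+-]?[0-9]+)?")


def is_float_literal(text):
    """Return True if text matches the floating-point literal syntax."""
    return FLOAT_LITERAL.fullmatch(text) is not None


for text in ("3.0", "3.", "3.e5", "3e5", ".3"):
    print(text, is_float_literal(text))
```

The decimal point is the only mandatory punctuation in the pattern, which is why 3.e5 is accepted and 3e5 is rejected.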

The assembler syntax does not support all C89-style float literals nor C99-style hexadecimal constants, but the $$strtod built-in mathematical function supports both. If you want to specify a floating-point literal using one of those formats, use $$strtod. For example:

$$strtod(".3")
$$strtod("0x1.234p-5")

You cannot directly use NaN, Inf, or -Inf as floating-point literals. Instead, use $$strtod to express these values. The "NaN" and "Inf" strings are handled case-insensitively. See Section 4.10.1 for built-in functions.

$$strtod("NaN")
$$strtod("Inf")

Assembler Symbols

An assembler symbol is a named 32-bit signless integer value, usually representing an address or absolute integer. A symbol can represent such things as the starting address of a function, variable, or section. The name of a symbol must be a legal identifier. The identifier becomes a symbolic representation of the symbol's value, and may be used in subsequent instructions to refer to the symbol's location or value.

Some assembler symbols become external symbols, and are placed in the object file's symbol table. A symbol is valid only within the module in which it is defined, unless you use the .global directive or the .def directive to declare it as an external symbol (see .global directive).

See Section 2.6 for more about symbols and the symbol tables in object files.

Identifiers

Identifiers are names used as labels, registers, symbols, and substitution symbols. An identifier is a string of alphanumeric characters, the dollar sign, and underscores (A-Z, a-z, 0-9, $, and _). The first character in an identifier cannot be a number, and identifiers cannot contain embedded blanks. The identifiers you define are case sensitive; for example, the assembler recognizes ABC, Abc, and abc as three distinct identifiers.
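The identifier rules can be expressed as a regular expression. The Python sketch below is an illustration of the character rules stated above; the names IDENTIFIER and is_identifier are hypothetical.

```python
import re

# Letters, digits, '$' and '_'; the first character cannot be a digit.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_identifier(name):
    """Return True if name is a legal identifier under the rules above."""
    return IDENTIFIER.fullmatch(name) is not None


for name in ("ABC", "Abc", "abc", "_f", "$4", "1abc", "my label"):
    print(repr(name), is_identifier(name))
```

Because matching is case sensitive, ABC, Abc, and abc are all accepted and remain three different names.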

Labels

An identifier used as a label becomes an assembler symbol, which represents an address in the program. Labels within a file must be unique.

NOTE

A mnemonic cannot begin in column 1 or it will be interpreted as a label. Mnemonic opcodes and assembler directive names without the . prefix are valid label names. Remember to always use whitespace before the mnemonic, or the assembler will think the identifier is a new label definition.

Symbols derived from labels can also be used as the operands of .bss, .global, .ref, or .def directives.

      .global _f
      LDR     A1, CON1
      STR     A1, [sp, #0]
      BL      _f
CON1: .field  -269488145,32

Local Labels

Local labels are special labels whose scope and effect are temporary. A local label can be defined in two ways:

  • $n, where n is a decimal digit in the range 0-9. For example, $4 and $1 are valid local labels. See Example 4-1.
  • name?, where name is any legal identifier as described above. The assembler replaces the question mark with a period followed by a unique number. When the source code is expanded, you will not see the unique number in the listing file. Your label appears with the question mark as it did in the source definition.

You cannot declare these types of labels as global.

Normal labels must be unique (they can be declared only once), and they can be used as constants in the operand field. Local labels, however, can be undefined and defined again. Local labels cannot be defined by directives.

A local label can be undefined or reset in one of these ways:

  • By using the .newblock directive
  • By changing sections (using a .sect, .text, or .data directive)
  • By changing the state of generated code (using the .state16 or .state32 directives)
  • By entering an include file (specified by the .include or .copy directive)
  • By leaving an include file (specified by the .include or .copy directive)

Example 4-1 Local Labels of the Form $n

This is an example of code that declares and uses a local label legally:

Label1: CMP     r1, #0          ; Compare r1 to zero.
        BCS     $1              ; If carry is set, branch to $1;
        ADDS    r0, r0, #1      ; else increment to r0
        MOVCS   pc, lr          ; and return.
$1:     LDR     r2, [r5], #4    ; Load indirect of r5 into r2
                                ; with write back.
        .newblock               ; Undefine $1 so it can be used
                                ; again.
        ADDS    r1, r1, r2      ; Add r2 to r1.
        BPL     $1              ; If the negative bit isn't set,
                                ; branch to $1;
        MVNS    r1, r1          ; else negate r1.
$1:     MOV     pc, lr          ; Return.

The following code uses a local label illegally:

        BCS     $1              ; If carry is set, branch to $1;
        ADDS    r0, r0, #1      ; else increment to r0
        MOVCS   pc, lr          ; and return.
$1:     LDR     r2, [r5], #4    ; Load indirect of r5 into r2
                                ; with write-back.
        ADDS    r1, r1, r2      ; Add r2 to r1.
        BPL     $1              ; If the negative bit isn't set,
                                ; branch to $1;
        MVNS    r1, r1          ; else negate r1.
$1:     MOV     pc, lr          ; Return.

The $1 label is not undefined before being reused by the second branch instruction. Therefore, $1 is redefined, which is illegal.

Local labels are especially useful in macros. If a macro contains a normal label and is called more than once, the assembler issues a multiple-definition error. If you use a local label and .newblock within a macro, however, the local label is used and reset each time the macro is expanded.

Up to ten local labels of the $n form can be in effect at one time. Local labels of the form name? are not limited. After you undefine a local label, you can define it and use it again. Local labels do not appear in the object code symbol table.
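The define/reset behavior described above can be illustrated with a toy model. The following Python sketch is only a bookkeeping illustration under the rules stated in this section; the class LocalLabelScope and the helper check are hypothetical names, and the model ignores everything except definitions and resets.

```python
class LocalLabelScope:
    """Toy model of local-label bookkeeping (not the assembler's real code)."""

    def __init__(self):
        self.defined = set()

    def define(self, label):
        # Defining a local label twice without a reset in between is illegal.
        if label in self.defined:
            raise ValueError(label + " redefined without being reset")
        self.defined.add(label)

    def reset(self):
        # Models .newblock, a section change, a code-state change,
        # or entering/leaving an include file.
        self.defined.clear()


def check(events):
    """Return True if a sequence of ('define', label) / ('reset',) events is legal."""
    scope = LocalLabelScope()
    try:
        for event in events:
            if event[0] == "define":
                scope.define(event[1])
            elif event[0] == "reset":
                scope.reset()
    except ValueError:
        return False
    return True


# First listing in Example 4-1: $1, then .newblock, then $1 again.
legal = [("define", "$1"), ("reset",), ("define", "$1")]
# Second listing in Example 4-1: $1 defined twice with no reset between.
illegal = [("define", "$1"), ("define", "$1")]

print(check(legal), check(illegal))  # True False
```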

For more information about using labels in macros see Section 6.6.

Symbolic Constants

A symbolic constant is a symbol with a value that is an absolute constant expression (see Section 4.9). By using symbolic constants, you can assign meaningful names to constant expressions. The .set and .struct/.tag/.endstruct directives enable you to set symbolic constants (see Define Assembly-Time Constant). Once defined, symbolic constants cannot be redefined.

If you use the .set directive to assign a value to a symbol, the symbol becomes a symbolic constant and may be used where a constant expression is expected. For example:

shift3  .set    3
        MOV     R0, #shift3

You can also use the .set directive to assign symbolic constants for other symbols, such as register names. In this case, the symbolic constant becomes a synonym for the register:

AuxR1   .set    R1
        LDR     AuxR1, [SP]

The following example shows how the .set directive can be used with the .struct, .tag, and .endstruct directives. It creates the symbolic constants K, maxbuf, item, value, delta, and i_len.

K       .set    1024            ;constant definitions
maxbuf  .set    2*K

item    .struct                 ;item structure definition
        .int    value           ;constant offsets value = 0
        .int    delta           ;constant offsets delta = 1
i_len   .endstruct

array   .tag    item            ;array declaration
        .bss    array, i_len*K

The assembler also has many predefined symbolic constants; these are discussed in Section 4.8.6.

Defining Symbolic Constants (--asm_define Option)

The --asm_define option equates a constant value or a string with a symbol. The symbol can then be used in place of a value in assembly source. The format of the --asm_define option is as follows:

armcl --asm_define=name[=value]

The name is the name of the symbol you want to define. The value is the constant or string value you want to assign to the symbol. If the value is omitted, the symbol is set to 1. If you want to define a quoted string and keep the quotation marks, do one of the following:

  • For Windows, use --asm_define=name="\"value\"". For example, --asm_define=car="\"sedan\""
  • For UNIX, use --asm_define=name='"value"'. For example, --asm_define=car='"sedan"'
  • For Code Composer, enter the definition in a file and include that file with the --cmd_file (or -@) option.
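When the command line is generated by a script rather than typed, the quoting can be delegated to a library. The sketch below uses Python's standard shlex module to build a POSIX-shell command line; the values car and "sedan" and the file name value.asm are reused from this section purely for illustration, and the script itself is an assumption, not part of the toolchain.

```python
import shlex

# armcl and --asm_define come from the text above; the argument values
# are illustrative only.
args = ["armcl", '--asm_define=car="sedan"', "value.asm"]

# shlex.join quotes each argument for a POSIX shell so that the inner
# double quotes reach the assembler intact.
command_line = shlex.join(args)
print(command_line)

# Round-trip check: splitting with POSIX shell rules recovers the
# original arguments, including the quotation marks around sedan.
assert shlex.split(command_line) == args
```

If the argument list is passed directly to a process-spawning API without a shell, no extra quoting layer is needed, because each list element is delivered as one argument.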

Once you have defined the name with the --asm_define option, the symbol can be used with assembly directives and instructions as if it had been defined with the .set directive. For example, on the command line you enter:

armcl --asm_define=SYM1=1 --asm_define=SYM2=2 --asm_define=SYM3=3 --asm_define=SYM4=4 value.asm

Since you have assigned values to SYM1, SYM2, SYM3, and SYM4, you can use them in source code. Example 4-2 shows how the value.asm file uses these symbols without defining them explicitly.

Within assembler source, you can test the symbol defined with the --asm_define option with these directives:

Type of Test          Directive Usage
Existence             .if $$isdefed("name")
Nonexistence          .if $$isdefed("name") = 0
Equal to value        .if name = value
Not equal to value    .if name != value

The argument to the $$isdefed built-in function must be enclosed in quotes. The quotes cause the argument to be interpreted literally rather than as a substitution symbol.

Example 4-2 Using Symbolic Constants Defined on Command Line

IF_4:   .if     SYM4 = SYM2 * SYM2
        .byte   SYM4                    ; Equal values
        .else
        .byte   SYM2 * SYM2             ; Unequal values
        .endif

IF_5:   .if     SYM1 <= 10
        .byte   10                      ; Less than / equal
        .else
        .byte   SYM1                    ; Greater than
        .endif

IF_6:   .if     SYM3 * SYM2 != SYM4 + SYM2
        .byte   SYM3 * SYM2             ; Unequal value
        .else
        .byte   SYM4 + SYM4             ; Equal values
        .endif

IF_7:   .if     SYM1 = SYM2
        .byte   SYM1
        .elseif SYM2 + SYM3 = 5
        .byte   SYM2 + SYM3
        .endif
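To see which .byte values Example 4-2 emits for the command line shown earlier (SYM1=1, SYM2=2, SYM3=3, SYM4=4), the conditions can be traced by hand. The Python sketch below mirrors the four conditional blocks; it only reproduces the arithmetic and is not how the assembler itself is invoked. The list name emitted is hypothetical.

```python
# Values assigned with --asm_define on the command line shown earlier.
SYM1, SYM2, SYM3, SYM4 = 1, 2, 3, 4

emitted = []

# IF_4: 4 = 2 * 2 is true, so SYM4 is emitted.
emitted.append(SYM4 if SYM4 == SYM2 * SYM2 else SYM2 * SYM2)

# IF_5: 1 <= 10 is true, so 10 is emitted.
emitted.append(10 if SYM1 <= 10 else SYM1)

# IF_6: 3 * 2 != 4 + 2 is false, so the .else branch emits SYM4 + SYM4.
emitted.append(SYM3 * SYM2 if SYM3 * SYM2 != SYM4 + SYM2 else SYM4 + SYM4)

# IF_7: .if/.elseif with no .else; nothing is emitted if both tests fail.
if SYM1 == SYM2:
    emitted.append(SYM1)
elif SYM2 + SYM3 == 5:
    emitted.append(SYM2 + SYM3)

print(emitted)  # [4, 10, 8, 5]
```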

Predefined Symbolic Constants

The assembler has several types of predefined symbols.

$, the dollar-sign character, represents the current value of the section program counter (SPC).

In addition, the following predefined processor symbolic constants are available:

Table 4-2 ARM Processor Symbolic Constants

Macro Name Description
.TI_ARM Always set to 1
.TI_ARM_16BIS Set to 1 if the default state is 16 bit Thumb mode (the --code_state=16 option is used for an ARMv6 or prior architecture); otherwise, set to 0.
.TI_ARM_32BIS Set to 1 if the default state is 32 bit (the --code_state=16 option is not used or the --code_state=32 option is used); otherwise, set to 0.
.TI_ARM_T2IS Set to 1 if the default state is Thumb-2 mode (the --code_state=16 option is used for an ARMv7 or higher architecture); otherwise set to 0.
.TI_ARM_LITTLE Set to 1 if little-endian mode is selected (the --endian assembler option is used); otherwise, set to 0.
.TI_ARM_BIG Set to 1 if big-endian mode is selected (the --endian assembler option is not used); otherwise, set to 0.
__TI_ARM7ABI_ASSEMBLER Set to 1 if the TI ARM7 ABI is enabled (the --abi=tiabi option is used); otherwise, it is set to 0. (This option is deprecated.)
__TI_ARM9ABI_ASSEMBLER Set to 1 if the TI ARM9 ABI is enabled (the --abi=ti_arm9_abi option is used); otherwise, it is set to 0. (This option is deprecated.)
__TI_EABI_ASSEMBLER Set to 1 if the EABI ABI is enabled. EABI is now the only supported ABI; see Section 4.4.
__TI_NEON_SUPPORT__ Set to 1 if NEON SIMD extension is targeted (the --neon option is used); otherwise, it is set to 0.
__TI_ARM_V4__ Set to 1 if the v4 architecture (ARM7) is targeted (the -mv4 option is used); otherwise, it is set to 0.
__TI_ARM_V5E__ Set to 1 if the v5E architecture (ARM9E) is targeted (the -mv5e option is used); otherwise, it is set to 0.
__TI_ARM_V6__ Set to 1 if the v6 architecture (ARM11) is targeted (the -mv6 option is used); otherwise, it is set to 0.
__TI_ARM_V6M0__ Set to 1 if the v6M0 architecture (Cortex-M0) is targeted (the -mv6M0 option is used); otherwise, it is set to 0.
__TI_ARM_V7__ Set to 1 if any v7 architecture (Cortex) is targeted; otherwise, it is set to 0.
__TI_ARM_V7A8__ Set to 1 if the v7A8 architecture (Cortex-A8) is targeted (the -mv7A8 option is used); otherwise, it is set to 0.
__TI_ARM_V7M3__ Set to 1 if the v7M3 architecture (Cortex-M3) is targeted (the -mv7M3 option is used); otherwise, it is set to 0.
__TI_ARM_V7M4__ Set to 1 if the v7M4 architecture (Cortex-M4) is targeted (the -mv7M4 option is used); otherwise, it is set to 0.
__TI_ARM_V7R4__ Set to 1 if the v7R4 architecture (Cortex-R4) is targeted (the -mv7R4 option is used); otherwise, it is set to 0.
__TI_VFP_SUPPORT__ Set to 1 if the VFP coprocessor is enabled (any --float_support option is used); otherwise, it is set to 0.
__TI_VFPV3_SUPPORT__ Set to 1 if the VFP coprocessor is enabled (the --float_support=vfpv3 option is used); otherwise, it is set to 0.
__TI_VFPV3D16_SUPPORT__ Set to 1 if the VFP coprocessor is enabled (the --float_support=vfpv3d16 option is used); otherwise, it is set to 0.
__TI_FPV4SPD16_SUPPORT__ Set to 1 if the FP coprocessor is enabled (the --float_support=fpv4spd16 option is used); otherwise, it is set to 0.

Registers

In addition, control register names are predefined symbols.

The names of ARM registers and their aliases are register symbols, including:

  • Coprocessor registers, including C0-C15.
  • Coprocessor IDs, including P0-P15.
  • VFP registers, including D0-D31, S0-S31.
  • NEON registers, including D0-D31, Q0-Q15.

Table 4-3 ARM Register Symbols with Aliases

Register Name Alias Register Name Alias
R0 A1 R8 V5
R1 A2 R9 V6
R2 A3 R10 V7
R3 A4 R11 V8
R4 V1 R12 V9, IP
R5 V2 R13 SP
R6 V3 R14 LR
R7 V4, AP R15 PC

Register symbols and aliases can be entered as all uppercase or all lowercase characters. For example, R13 could also be entered as r13, SP, or sp.

Control register symbols can be entered in all upper-case or all lower-case characters.

See the "Register Conventions" section of the ARM Optimizing C/C++ Compiler User's Guide for details about the registers and their uses.

Status registers can be entered as all uppercase or all lowercase characters; that is, CPSR could also be entered as cpsr, CPSR_ALL, or cpsr_all.

Table 4-4 ARM Status Registers and Aliases

Register    Alias       Description
CPSR        CPSR_ALL    Current processor status register
CPSR_FLG                Current processor status register flag bits only
SPSR        SPSR_ALL    Saved processor status register
SPSR_FLG                Saved processor status register flag bits only

Substitution Symbols

Symbols can be assigned a string value. This enables you to create aliases for character strings by equating them to symbolic names. Symbols that represent character strings are called substitution symbols. When the assembler encounters a substitution symbol, its string value is substituted for the symbol name. Unlike symbolic constants, substitution symbols can be redefined.

A string can be assigned to a substitution symbol anywhere within a program; for example:

        .asg    "SP", stack_pointer     ; Assigns the string SP to the substitution
                                        ; symbol stack_pointer.
        .asg    "#0x20", block2         ; Assigns the string #0x20 to the substitution
                                        ; symbol block2.
        ADD     stack_pointer, stack_pointer, block2
                                        ; Adds the value in SP to #0x20 and stores the
                                        ; result in SP.

When you are using macros, substitution symbols are important because macro parameters are actually substitution symbols that are assigned a macro argument. The following code shows how substitution symbols are used in macros:

addl    .macro  dest, src       ; addl macro definition
        ADDS    dest, dest, src ; Add the value in register dest to the value in
                                ; register src, and store the result in dest.
        BLCS    reset_ctr       ; Handle overflow.
        .endm

*addl invocation
        addl    R4, R5          ; Calls the macro addl and substitutes R4 for dest
                                ; and R5 for src. The macro adds the value of R4
                                ; and the value of R5, stores the result in R4,
                                ; and handles overflow.

See Section 6 for more information about macros.

Expressions

Nearly all values and operands in assembly language are expressions, which may be any of the following:

  • a literal constant
  • a register
  • a register pair
  • a memory reference
  • a symbol
  • a built-in function invocation
  • a mathematical or logical operation on one or more expressions

This section defines several types of expressions that are referred to throughout this document. Some instruction operands accept limited types of expressions. For example, the .if directive requires its operand be an absolute constant expression with an integer value. Absolute in the context of assembly code means that the value of the expression must be known at assembly time.

A constant expression is any expression that does not in any way refer to a register or memory reference. An immediate operand will usually not accept a register or memory reference. It must be given a constant expression. Constant expressions may be any of the following:

  • a literal constant
  • an address constant expression
  • a symbol whose value is a constant expression
  • a built-in function invocation on a constant expression
  • a mathematical or logical operation on one or more constant expressions

An address constant expression is a special case of a constant expression. Some immediate operands that require an address value can accept a symbol plus an addend; for example, some branch instructions. The symbol must have a value that is an address, and it may be an external symbol. The addend must be an absolute constant expression with an integer value. For example, a valid address constant expression is "array+4".

A constant expression may be absolute or relocatable. Absolute means known at assembly time. Relocatable means constant, but not known until link time. External symbols are relocatable, even if they refer to a symbol defined in the same module.

An absolute constant expression may not refer to any external symbols anywhere in the expression. In other words, an absolute constant expression may be any of the following:

  • a literal constant
  • an absolute address constant expression
  • a symbol whose value is an absolute constant expression
  • a built-in function invocation whose arguments are all absolute constant expressions
  • a mathematical or logical operation on one or more absolute constant expressions

A relocatable constant expression refers to at least one external symbol. For ELF, such expressions may contain at most one external symbol. A relocatable constant expression may be any of the following:

  • an external symbol
  • a relocatable address constant expression
  • a symbol whose value is a relocatable constant expression
  • a built-in function invocation with any arguments that are relocatable constant expressions
  • a mathematical or logical operation on one or more expressions, at least one of which is a relocatable constant expression

In some cases, the value of a relocatable address expression may be known at assembly time. For example, a relative displacement branch may branch to a label defined in the same section.

Mathematical and Logical Operators

The operands of a mathematical or logical operator must be well-defined expressions. That is, you must use the correct number of operands and the operation must make sense. For example, you cannot take the XOR of a floating-point value. In addition, well-defined expressions contain only symbols or assembly-time constants that have been defined before they occur in the directive's expression.

Three main factors influence the order of expression evaluation:

  • Parentheses: Expressions enclosed in parentheses are always evaluated first.

        8 / (4 / 2) = 4, but 8 / 4 / 2 = 1

    You cannot substitute braces ( { } ) or brackets ( [ ] ) for parentheses.

  • Precedence groups: Operators, listed in Table 4-5, are divided into nine precedence groups. When parentheses do not determine the order of expression evaluation, the highest precedence operation is evaluated first.

        8 + 4 / 2 = 10 (4 / 2 is evaluated first)

  • Left-to-right evaluation: When parentheses and precedence groups do not determine the order of expression evaluation, the expressions are evaluated from left to right, except for Group 1, which is evaluated from right to left.

        8 / 4*2 = 4, but 8 / (4*2) = 1

Table 4-5 lists the operators that can be used in expressions, according to precedence group.

Table 4-5 Operators Used in Expressions (Precedence)

Group(1)  Operator  Description(2)
1         +         Unary plus
          -         Unary minus
          ~         1s complement
          !         Logical NOT
2         *         Multiplication
          /         Division
          %         Modulo
3         +         Addition
          -         Subtraction
4         <<        Shift left
          >>        Shift right
5         <         Less than
          <=        Less than or equal to
          >         Greater than
          >=        Greater than or equal to
6         =[=]      Equal to
          !=        Not equal to
7         &         Bitwise AND
8         ^         Bitwise exclusive OR (XOR)
9         |         Bitwise OR

(1) Group 1 operators are evaluated right to left. All other operators are evaluated left to right.
(2) Unary + and - have higher precedence than the binary forms.

The assembler checks for overflow and underflow conditions when arithmetic operations are performed during assembly. It issues a warning (the "value truncated" message) whenever an overflow or underflow occurs. The assembler does not check for overflow or underflow in multiplication.
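The evaluation order defined by parentheses, the precedence groups of Table 4-5, and left-to-right association can be modeled with a small recursive evaluator. The Python sketch below is an illustration only: the names tokenize and evaluate are hypothetical, it accepts decimal integers only, it uses Python's unbounded integers (so it does not model overflow or the "value truncated" warning), and it assumes division truncates the way floor division does for non-negative operands.

```python
import operator
import re

# Binary operators keyed by precedence group (groups 2-9); lower binds tighter.
GROUP = {"*": 2, "/": 2, "%": 2, "+": 3, "-": 3, "<<": 4, ">>": 4,
         "<": 5, "<=": 5, ">": 5, ">=": 5, "=": 6, "==": 6, "!=": 6,
         "&": 7, "^": 8, "|": 9}
OPS = {"*": operator.mul, "/": operator.floordiv, "%": operator.mod,
       "+": operator.add, "-": operator.sub,
       "<<": operator.lshift, ">>": operator.rshift,
       "<": lambda a, b: int(a < b), "<=": lambda a, b: int(a <= b),
       ">": lambda a, b: int(a > b), ">=": lambda a, b: int(a >= b),
       "=": lambda a, b: int(a == b), "==": lambda a, b: int(a == b),
       "!=": lambda a, b: int(a != b),
       "&": operator.and_, "^": operator.xor, "|": operator.or_}
TOKEN = re.compile(r"\s*(\d+|<<|>>|<=|>=|==|!=|[-+*/%<>=&^|~!()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected input at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens, pos = tokenize(text), 0

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def unary():
        # Group 1: unary operators, applied right to left via recursion.
        tok = take()
        if tok == "(":
            value = binary(9)
            if take() != ")":
                raise SyntaxError("expected )")
            return value
        if tok == "+":
            return +unary()
        if tok == "-":
            return -unary()
        if tok == "~":
            return ~unary()
        if tok == "!":
            return int(not unary())
        return int(tok)

    def binary(group):
        # Groups 2-9: same-group operators associate left to right.
        if group < 2:
            return unary()
        left = binary(group - 1)
        while pos < len(tokens) and GROUP.get(tokens[pos]) == group:
            op = take()
            left = OPS[op](left, binary(group - 1))
        return left

    result = binary(9)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return result


print(evaluate("8 / (4 / 2)"), evaluate("8 / 4 / 2"))  # 4 1
print(evaluate("8 + 4 / 2"))                           # 10
print(evaluate("8 / 4*2"), evaluate("8 / (4*2)"))      # 4 1
```

The printed values match the worked examples given for the three ordering factors above.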

Relational Operators and Conditional Expressions

The assembler supports relational operators that can be used in any expression; they are especially useful for conditional assembly. Relational operators include the following:

=     Equal to
!=    Not equal to
<     Less than
<=    Less than or equal to
>     Greater than
>=    Greater than or equal to

Conditional expressions evaluate to 1 if true and 0 if false and can be used only on operands of equivalent types; for example, absolute value compared to absolute value, but not absolute value compared to relocatable value.

Well-Defined Expressions

Some assembler directives, such as .if, require well-defined absolute constant expressions as operands. Well-defined expressions contain only symbols or assembly-time constants that have been defined before they occur in the directive's expression. In addition, they must use the correct number of operands and the operation must make sense. The evaluation of a well-defined expression must be unambiguous.

This is an example of a well-defined expression:

1000h+X

where X was previously defined as an absolute symbol.
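The "defined before use" part of this rule can be sketched as a simple lookup. The Python below is an illustrative check, not the assembler's algorithm: the function undefined_symbols and the set defined_so_far are hypothetical, and the tokenizer only recognizes identifiers and numeric constants written with digits and an optional h suffix, as in 1000h.

```python
import re

# Tokens: numeric constants (optionally with an h suffix) or identifiers.
TOKEN = re.compile(r"[0-9][0-9A-Fa-f]*[hH]?|[A-Za-z_$][A-Za-z0-9_$]*")


def undefined_symbols(expression, defined):
    """Return the identifiers in expression that are not in the defined set."""
    names = [tok for tok in TOKEN.findall(expression) if not tok[0].isdigit()]
    return [name for name in names if name not in defined]


# Hypothetical state: X was assigned by an earlier .set directive.
defined_so_far = {"X"}

print(undefined_symbols("1000h+X", defined_so_far))  # []
print(undefined_symbols("1000h+Y", defined_so_far))  # ['Y']
```

An empty result corresponds to an expression whose symbols were all defined earlier; any name returned would make the expression not well defined at that point.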

Relocatable Symbols and Legal Expressions

All legal expressions can be reduced to one of two forms:

relocatable symbol ± absolute symbol

or

absolute value

Unary operators can be applied only to absolute values; they cannot be applied to relocatable symbols. Expressions that cannot be reduced to contain only one relocatable symbol are illegal.

Table 4-6 summarizes valid operations on absolute, relocatable, and external symbols. An expression cannot contain multiplication or division by a relocatable or external symbol. An expression cannot contain unresolved symbols that are relocatable to other sections.

Symbols that have been defined as global with the .global directive can also be used in expressions; in Table 4-6, these symbols are referred to as external.

Table 4-6 Expressions With Absolute and Relocatable Symbols

If A is...     and If B is...    then A + B is...    and A - B is...
absolute       absolute          absolute            absolute
absolute       relocatable       relocatable         illegal
absolute       external          external            illegal
relocatable    absolute          relocatable         relocatable
relocatable    relocatable       illegal             absolute(1)
relocatable    external          illegal             illegal
external       absolute          external            external
external       relocatable       illegal             illegal
external       external          illegal             illegal

(1) A and B must be in the same section; otherwise, subtracting one relocatable symbol from another is illegal.

Expression Examples

Following are examples of expressions that use relocatable and absolute symbols. These examples use four symbols that are defined in the same section:

        .global extern_1        ; Defined in an external module
intern_1: .word '"D'            ; Relocatable, defined in current
                                ; module
LAB1:   .set    2               ; LAB1 = 2
intern_2                        ; Relocatable, defined in current
                                ; module
intern_3                        ; Relocatable, defined in current
                                ; module

  • Example 1

    The statements in this example use an absolute symbol, LAB1, which is defined to have a value of 2. The first statement loads the value 51 into R0. The second statement loads the value 27 into R0.

        MOV     R0, #LAB1 + ((4+3) * 7)     ; R0 = 51
                                            ; 2 + ((7) * 7)
                                            ; 2 + (49) = 51
        MOV     R0, #LAB1 + 4 + (3*7)       ; R0 = 27
                                            ; 2 + 4 + (21) = 27

  • Example 2

    The first statement in the following example is valid; the statements that follow it are invalid.

        LDR     R1, intern_1 - 10           ; Legal
        LDR     R1, 10-intern_1             ; Can't negate reloc. symbol
        LDR     R1, -(intern_1)             ; Can't negate reloc. symbol
        LDR     R1, intern_1/10             ; / isn't additive operator
        LDR     R1, intern_1 + intern_2     ; Multiple relocatables

  • Example 3

    The first statement below is legal; although intern_1 and intern_2 are relocatable, their difference is absolute because they are in the same section. Subtracting one relocatable symbol from another reduces the expression to relocatable symbol + absolute value. The second statement is illegal because the sum of two relocatable symbols is not an absolute value.

        LDR     R1, intern_1 - intern_2 + intern_3   ; Legal
        LDR     R1, intern_1 + intern_2 + intern_3   ; Illegal

  • Example 4

    A relocatable symbol's placement in the expression is important to expression evaluation. Although the statement below is similar to the first statement in the previous example, it is illegal because of left-to-right evaluation; the assembler attempts to add intern_1 to intern_3.

        LDR     R1, intern_1 + intern_3 - intern_2   ; Illegal
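The outcomes in Examples 2 through 4 follow from applying Table 4-6 one operator at a time, left to right. The Python sketch below folds a chain of + and - operations over the absolute and relocatable rows of that table; external symbols are left out for brevity, all relocatable symbols are assumed to be in the same section, and the names combine and classify are hypothetical.

```python
ABS, REL = "absolute", "relocatable"


def combine(a, op, b):
    """Kind of `a op b` per Table 4-6 (same-section symbols), or None if illegal."""
    if a == ABS and b == ABS:
        return ABS
    if a == ABS and b == REL:
        return REL if op == "+" else None
    if a == REL and b == ABS:
        return REL
    # relocatable op relocatable: only subtraction is legal, and it is absolute.
    return ABS if op == "-" else None


def classify(first, *rest):
    """Fold a chain left to right; rest alternates operator, operand kind."""
    kind = first
    for op, operand in zip(rest[0::2], rest[1::2]):
        kind = combine(kind, op, operand)
        if kind is None:
            return "illegal"
    return kind


print(classify(REL, "-", ABS))            # intern_1 - 10: relocatable
print(classify(ABS, "-", REL))            # 10 - intern_1: illegal
print(classify(REL, "-", REL, "+", REL))  # Example 3, first statement: relocatable
print(classify(REL, "+", REL, "-", REL))  # Example 4: illegal
```

The last two calls use the same three operands; only the order differs, which is why operand placement decides whether the expression is legal.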

Built-in Functions and Operators

The assembler supports built-in mathematical functions and built-in addressing operators.

The built-in substitution symbol functions are discussed in Section 6.3.2.

Built-In Math and Trigonometric Functions

The assembler supports built-in functions for conversions and various math computations. Table 4-7 describes the built-in functions. The expr must be a constant value.

Table 4-7 Built-In Mathematical Functions

Function Description
$$acos(expr) Returns the arccosine of expr as a floating-point value
$$asin(expr) Returns the arcsine of expr as a floating-point value
$$atan(expr) Returns the arctangent of expr as a floating-point value
$$atan2(expr, y) Returns the arctangent of expr as a floating-point value in range [-π, π]
$$ceil(expr) Returns the smallest integer not less than expr
$$cos(expr) Returns the cosine of expr as a floating-point value
$$cosh(expr) Returns the hyperbolic cosine of expr as a floating-point value
$$cvf(expr) Converts expr to a floating-point value
$$cvi(expr) Converts expr to an integer value
$$exp(expr) Returns the exponential function e^expr
$$fabs(expr) Returns the absolute value of expr as a floating-point value
$$floor(expr) Returns the largest integer not greater than expr
$$fmod(expr1, expr2) Returns the remainder of expr1 ÷ expr2
$$int(expr) Returns 1 if expr has an integer value; else returns 0. Returns an integer.
$$ldexp(expr1, expr2) Multiplies expr1 by an integer power of 2. That is, expr1 × 2^expr2
$$log(expr) Returns the natural logarithm of expr, where expr>0
$$log10(expr) Returns the base 10 logarithm of expr, where expr>0
$$max(expr1, expr2) Returns the maximum of two values
$$min(expr1, expr2) Returns the minimum of two values
$$pow(expr1, expr2) Returns expr1 raised to the power of expr2
$$round(expr) Returns expr rounded to the nearest integer
$$sgn(expr) Returns the sign of expr.
$$sin(expr) Returns the sine of expr
$$sinh(expr) Returns the hyperbolic sine of expr as a floating-point value
$$sqrt(expr) Returns the square root of expr, expr≥0, as a floating-point value
$$strtod(str) Converts a character string to a double precision floating-point value. The string contains a properly-formatted C99-style floating-point literal.
$$tan(expr) Returns the tangent of expr as a floating-point value
$$tanh(expr) Returns the hyperbolic tangent of expr as a floating-point value
$$trunc(expr) Returns expr rounded toward 0
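Several entries in Table 4-7 read like the C library functions of the same name. Assuming that resemblance holds (the table does not say so explicitly), the expected results of a few built-ins can be previewed with Python's math module. The mapping below is an illustration under that assumption, not a statement about how the assembler computes these values.

```python
import math

# Assumed Python counterparts for a few Table 4-7 entries.
analogues = {
    "$$ceil": math.ceil,    # smallest integer not less than expr
    "$$floor": math.floor,  # largest integer not greater than expr
    "$$trunc": math.trunc,  # expr rounded toward 0
    "$$fmod": math.fmod,    # remainder of expr1 / expr2
    "$$ldexp": math.ldexp,  # expr1 * 2**expr2
    "$$sqrt": math.sqrt,    # square root, expr >= 0
}

print(analogues["$$ceil"](2.3))       # 3
print(analogues["$$floor"](-2.3))     # -3
print(analogues["$$trunc"](-2.7))     # -2
print(analogues["$$fmod"](7.0, 3.0))  # 1.0
print(analogues["$$ldexp"](3.0, 4))   # 48.0
```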

Unified Assembly Language Syntax Support

Unified assembly language (UAL) is the new assembly syntax introduced by ARM Ltd. to handle the ambiguities introduced by the original Thumb-2 assembly syntax and provide similar syntax for ARM, Thumb and Thumb-2. UAL is backwards compatible with old ARM assembly, but incompatible with the previous Thumb assembly syntax.

UAL syntax is the default assembly syntax beginning with ARMv7 architectures. When writing assembly code, the .arm and .thumb directives are used to specify ARM and Thumb UAL syntax, respectively. The .state32 and .state16 directives remain to specify non-UAL ARM and Thumb syntax. The .arm and .state32 directives are equivalent since UAL syntax is backwards compatible in ARM mode. Since non-UAL syntax is not supported for Thumb-2 instructions, Thumb-2 instructions cannot be used inside of a .state16 section. However, assembly code with .state16 sections that contain only non-UAL Thumb code can be assembled for ARMv7 architectures to allow easy porting of older code.

See Section 5.3 for more information about the .state16, .state32, .arm, and .thumb directives.

A full description of the UAL syntax can be found in the ARM Ltd. documentation, but there are a few key differences related to Thumb-2 syntax:

  • The .W extension is used to indicate that an instruction should be encoded in a 32-bit form. A .N extension is used to indicate that an instruction should be encoded in a 16-bit form; the assembler reports an error if this is not possible. If no extension is used then the assembler uses a 16-bit encoding whenever possible.
  • 16-bit Thumb ALU instructions that set status indicate this with a syntax that has an 'S' modifier. This is the same way that ARM ALU instructions that set status have always been handled.

Source Listings

A source listing shows source statements and the object code they produce. To obtain a listing file, invoke the assembler with the --asm_listing option (see Section 4.3).

Two banner lines, a blank line, and a title line are at the top of each source listing page. Any title supplied by the .title directive is printed on the title line. A page number is printed to the right of the title. If you do not use the .title directive, the name of the source file is printed. The assembler inserts a blank line below the title line.

Each line in the source file produces at least one line in the listing file. This line shows a source statement number, an SPC value, the object code assembled, and the source statement. Figure 4-2 shows these in an actual listing file.

Field 1: Source Statement Number

Line number

The source statement number is a decimal number. The assembler numbers source lines as it encounters them in the source file; some statements increment the line counter but are not listed. (For example, .title statements and statements following a .nolist are not listed.) The difference between two consecutive source line numbers indicates the number of intervening statements in the source file that are not listed.

Include file letter

A letter preceding the line number indicates the line is assembled from the include file designated by the letter.

Nesting level number

A number preceding the line number indicates the nesting level of macro expansions or loop blocks.

Field 2: Section Program Counter

This field contains the SPC value, which is hexadecimal. All sections (.text, .data, .bss, and named sections) maintain separate SPCs. Some directives do not affect the SPC and leave this field blank.

Field 3: Object Code

This field contains the hexadecimal representation of the object code. All machine instructions and directives use this field to list object code. This field also indicates the relocation type associated with an operand for this line of source code. If more than one operand is relocatable, this column indicates the relocation type for the first operand. The characters that can appear in this column and their associated relocation types are listed below:

! undefined external reference
' .text relocatable
+ .sect relocatable
" .data relocatable
- .bss, .usect relocatable
% relocation expression

Field 4: Source Statement Field

This field contains the characters of the source statement as they were scanned by the assembler. The assembler accepts a maximum line length of 200 characters. Spacing in this field is determined by the spacing in the source statement.

Figure 4-2 shows an assembler listing with each of the four fields identified.

Figure 4-2 Example Assembler Listing

Debugging Assembly Source

By default, when you compile an assembly file, the assembler provides symbolic debugging information that allows you to step through your assembly code in a debugger rather than using the Disassembly window in Code Composer Studio. This enables you to view source comments and other source-code annotations while debugging. The default has the same behavior as using the --symdebug:dwarf option. You can disable the generation of debugging information by using the --symdebug:none option.

The .asmfunc and .endasmfunc directives (see .asmfunc directive) enable you to use C characteristics in assembly code, which makes the process of debugging an assembly file more closely resemble debugging a C/C++ source file.

The .asmfunc and .endasmfunc directives allow you to name certain areas of your code, and make these areas appear in the debugger as C functions. Contiguous sections of assembly code that are not enclosed by the .asmfunc and .endasmfunc directives are automatically placed in assembler-defined functions named with this syntax:

$filename:starting source line:ending source line$

If you want to view your variables as a user-defined type in C code, the types must be declared and the variables must be defined in a C file. This C file can then be referenced in assembly code using the .ref directive (see .ref directive). Example 4-3 shows the cvars.c C program that defines a variable, svar, as the structure type X. The svar variable is then referenced in the addfive.asm assembly program in Example 4-4, and 5 is added to svar's second data member.

Compile both source files with the --symdebug:dwarf option (-g) and link them as follows:

armcl --symdebug:dwarf cvars.c addfive.asm --run_linker --library=lnk.cmd --library=rtsv4_A_be_eabi.lib --output_file=addfive.out

When you load this program into a symbolic debugger, addfive appears as a C function. You can monitor the values in svar while stepping through main just as you would any regular C variable.

Example 4-3 Viewing Assembly Variables as C Types C Program

typedef struct
{
    int m1;
    int m2;
} X;

X svar = { 1, 2 };

Example 4-4 Assembly Program for Example 4-3

;------------------------------------------------------------------------------
; Tell the assembler we're referencing variable "_svar", which is defined in
; another file (cvars.c).
;------------------------------------------------------------------------------
        .ref    _svar

;------------------------------------------------------------------------------
; addfive() - Add five to the second data member of _svar
;------------------------------------------------------------------------------
        .text
        .global addfive
addfive: .asmfunc
        LDW     .D2T2   *+B14(_svar+4),B4   ; load svar.m2 into B4
        RET     .S2     B3                  ; return from function
        NOP     3                           ; delay slots 1-3
        ADD     .D2     5,B4,B4             ; add 5 to B4 (delay slot 4)
        STW     .D2T2   B4,*+B14(_svar+4)   ; store B4 back into svar.m2
                                            ; (delay slot 5)
        .endasmfunc

Cross-Reference Listings

A cross-reference listing shows symbols and their definitions. To obtain a cross-reference listing, invoke the assembler with the --asm_listing_cross_reference option (see Section 4.3) or use the .option directive with the X operand (see Select Listing Options). The assembler appends the cross-reference to the end of the source listing. Example 4-5 shows the four fields contained in the cross-reference listing.

Example 4-5 An Assembler Cross-Reference Listing

LABEL            VALUE       DEFN    REF

.TI_ARM          00000001       0
.TI_ARM_16BIS    00000000       0
.TI_ARM_32BIS    00000001       0
.TI_ARM_BIG      00000001       0
.TI_ARM_LITTLE   00000000       0
.ti_arm          00000001       0
.ti_arm_16bis    00000000       0
.ti_arm_32bis    00000001       0
.ti_arm_big      00000001       0
.ti_arm_little   00000000       0
STACKSIZE        00000200       9     10    63
__stack          00000000-     10      5    62
dispatch         REF                  29    60
reset            00000000'     34     16    19    30
stack            00000024'     62     52
stacksz          00000028'     63     54

Label column contains each symbol that was defined or referenced during the assembly.
Value column contains an 8-digit hexadecimal number (which is the value assigned to the symbol) or a name that describes the symbol's attributes. A value may also be followed by a character that describes the symbol's attributes, as in the reset and __stack rows of Example 4-5. Table 4-8 lists these characters and names.
Definition (DEFN) column contains the statement number that defines the symbol. This column is blank for undefined symbols.
Reference (REF) column lists the line numbers of statements that reference the symbol. A blank in this column indicates that the symbol was never used.

Table 4-8 Symbol Attributes

Character or Name    Meaning
REF                  External reference (global symbol)
UNDF                 Undefined
'                    Symbol defined in a .text section
"                    Symbol defined in a .data section
+                    Symbol defined in a .sect section
-                    Symbol defined in a .bss or .usect section
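
As a minimal sketch of how Table 4-8 can be applied when reading a listing, the following C function maps the VALUE field of one cross-reference row to its meaning. The function name xref_value_meaning is hypothetical, and the sketch assumes the attribute character is the last character of the field, as in the rows of Example 4-5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps the VALUE field of a cross-reference row to
 * its Table 4-8 meaning. Returns NULL when the field carries none of the
 * listed names or characters. */
const char *xref_value_meaning(const char *value)
{
    size_t len;

    assert(value != NULL);

    /* Names that replace the hexadecimal value entirely. */
    if (strcmp(value, "REF") == 0)  return "External reference (global symbol)";
    if (strcmp(value, "UNDF") == 0) return "Undefined";

    len = strlen(value);
    if (len == 0) return NULL;

    /* Assumption: the attribute character trails the hex digits. */
    switch (value[len - 1]) {
    case '\'': return "Symbol defined in a .text section";
    case '"':  return "Symbol defined in a .data section";
    case '+':  return "Symbol defined in a .sect section";
    case '-':  return "Symbol defined in a .bss or .usect section";
    default:   return NULL;
    }
}
```

Applied to Example 4-5, the field 00000000' for reset maps to a .text definition, 00000000- for __stack maps to a .bss or .usect definition, and 00000200 for STACKSIZE returns NULL because it carries no attribute character.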

Copyright© 2016, Texas Instruments Incorporated. An IMPORTANT NOTICE for this document addresses availability, warranty, changes, use in safety-critical applications, intellectual property matters and other important disclaimers.