ARM Assembly
Language Tools
v15.9.0.STS User's Guide
SPNU118 - REVISED SEPTEMBER, 2015
2 Introduction to Object Modules
The assembler creates object modules from assembly code, and the linker creates executable object files from object modules. These executable object files can be executed by an ARM device.
Object modules make modular programming easier because they encourage you to think in terms of blocks of code and data when you write an assembly language program. These blocks are known as sections. Both the assembler and the linker provide directives that allow you to create and manipulate sections.
This chapter focuses on the concept and use of sections in assembly language programs.
2.1 Object File Format Specifications
The object files created by the assembler and linker conform to the ELF (Executable and Linking Format) binary format, which is used by the Embedded Application Binary Interface (EABI). See the ARM Optimizing C/C++ Compiler User's Guide (SPNU151) for information on the EABI. The complete ARM ABI specifications can be found in the ARM Information Center.
COFF object files and the legacy TIABI and TI ARM9 ABI modes are not supported in v15.6.0.STS and later versions of the TI Code Generation Tools. If you would like to produce COFF output files, please use v5.2 of the ARM Code Generation Tools and refer to SPNU151J for documentation.
The ELF object files generated by the assembler and linker conform to the December 17, 2003 snapshot of the System V generic ABI (or gABI). This specification is currently maintained by SCO.
2.2 Executable Object Files
The linker produces executable object modules. An executable object module has the same format as object files that are used as linker input. The sections in an executable object module, however, have been combined and placed in target memory, and the relocations are all resolved.
To run a program, the data in the executable object module must be transferred, or loaded, into target system memory. See Section 3 for details about loading and running programs.
2.3 Introduction to Sections
The smallest unit of an object file is a section. A section is a block of code or data that occupies contiguous space in the memory map. Each section of an object file is separate and distinct.
ELF format executable object files contain segments. An ELF segment is a meta-section. It represents a contiguous region of target memory. It is a collection of sections that have the same property, such as writeable or readable. An ELF loader needs the segment information, but does not need the section information. The ELF standard allows the linker to omit ELF section information entirely from the executable object file.
Object files usually contain three default sections:
.text section | contains executable code. Some targets allow content other than text, such as constants, in .text sections. |
.data section | usually contains initialized data |
.bss section | usually reserves space for uninitialized variables |
The assembler and linker allow you to create, name, and link other kinds of sections. The .text, .data, and .bss sections are archetypes for how sections are handled.
There are two basic types of sections:
Initialized sections | contain data or code. The .text and .data sections are initialized; user-named sections created with the .sect assembler directive are also initialized. |
Uninitialized sections | reserve space in the memory map for uninitialized data. The .bss section is uninitialized; user-named sections created with the .usect assembler directive are also uninitialized. |
Several assembler directives allow you to associate various portions of code and data with the appropriate sections. The assembler builds these sections during the assembly process, creating an object file organized as shown in Figure 2-1.
One of the linker's functions is to relocate sections into the target system's memory map; this function is called placement. Because most systems contain several types of memory, using sections can help you use target memory more efficiently. All sections are independently relocatable; you can place any section into any allocated block of target memory. For example, you can define a section that contains an initialization routine and then allocate the routine in a portion of the memory map that contains ROM. For information on section placement, see the "Specifying Where to Allocate Sections in Memory" section of the ARM Optimizing C/C++ Compiler User's Guide.
Figure 2-1 shows the relationship between sections in an object file and a hypothetical target memory.
2.3.1 Special Section Names
You can use the .sect and .usect directives to create any section name you like, but certain sections are treated in a special manner by the linker and the compiler's run-time support library. If you create a section with the same name as a special section, you should take care to follow the rules for that special section.
A few common special sections are:
- .text -- Used for program code.
- .bss -- Used for uninitialized objects (global variables).
- .data -- Used for initialized non-const objects (global variables).
- .const -- Used for initialized const objects (string constants, variables declared const).
- .cinit -- Used to initialize C global variables at startup.
- .stack -- Used for the function call stack.
- .sysmem -- Used for the dynamic memory allocation pool.
For more information on sections, see the "Specifying Where to Allocate Sections in Memory" section of the ARM Optimizing C/C++ Compiler User's Guide.
2.4 How the Assembler Handles Sections
The assembler identifies the portions of an assembly language program that belong in a given section. The assembler has the following directives that support this function:
- .bss
- .data
- .sect
- .text
- .usect
The .bss and .usect directives create uninitialized sections; the .text, .data, and .sect directives create initialized sections.
You can create subsections of any section to give you tighter control of the memory map. Subsections are created using the .sect and .usect directives. Subsections are identified with the base section name and a subsection name separated by a colon; see Section 2.4.6.
NOTE
Default Sections Directive: If you do not use any of the sections directives, the assembler assembles everything into the .text section.
2.4.1 Uninitialized Sections
Uninitialized sections reserve space in ARM memory; they are usually placed in RAM. These sections have no actual contents in the object file; they simply reserve memory. A program can use this space at run time for creating and storing variables.
Uninitialized data areas are built by using the following assembler directives.
- The .bss directive reserves space in the .bss section.
- The .usect directive reserves space in a specific uninitialized user-named section.
Each time you invoke the .bss or .usect directive, the assembler reserves additional space in the .bss or the user-named section. The syntax is:
.bss symbol, size in bytes[, alignment[, bank offset]] |
symbol | .usect "section name", size in bytes[, alignment[, bank offset]] |
symbol | points to the first byte reserved by this invocation of the .bss or .usect directive. The symbol corresponds to the name of the variable that you are reserving space for. It can be referenced by any other section and can also be declared as a global symbol (with the .global directive). |
size in bytes | is an absolute expression (see Section 4.9). The .bss directive reserves size in bytes bytes in the .bss section. The .usect directive reserves size in bytes bytes in section name. For both directives, you must specify a size; there is no default value. |
alignment | is an optional parameter. It specifies the minimum alignment in bytes required by the space allocated. The default value is byte aligned; this option is represented by the value 1. The value must be a power of 2. |
bank offset | is an optional parameter. It ensures that the space allocated to the symbol occurs on a specific memory bank boundary. The bank offset measures the number of bytes to offset from the alignment specified before assigning the symbol to that location. |
section name | specifies the user-named section in which to reserve space. See Section 2.4.3. |
Initialized section directives (.text, .data, and .sect) change which section is considered the current section (see Section 2.4.2). However, the .bss and .usect directives do not change the current section; they simply escape from the current section temporarily. Immediately after a .bss or .usect directive, the assembler resumes assembling into whatever the current section was before the directive. The .bss and .usect directives can appear anywhere in an initialized section without affecting its contents. For an example, see Section 2.4.7.
The .usect directive can also be used to create uninitialized subsections. See Section 2.4.6 for more information on creating subsections.
The .common directive is similar to directives that create uninitialized data sections, except that common symbols are created, instead.
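As a minimal sketch of how .bss and .usect leave the current section unchanged, the following fragment reserves uninitialized space from the middle of a block of code. The names counter, buffer, and scratch are hypothetical, the sizes and alignments are arbitrary, and the instructions assume 32-bit ARM state.

```asm
        .text                       ; current section is now .text
        ADD     R0, R0, #1          ; assembled into .text
        .bss    counter, 4, 4       ; reserve 4 bytes, 4-byte aligned, in .bss
buffer  .usect  "scratch", 64, 8    ; reserve 64 bytes, 8-byte aligned, in "scratch"
        ADD     R1, R1, #1          ; still assembled into .text
```

Under these assumptions, the two ADD instructions end up adjacent in .text, while counter and buffer only reserve space in their own sections.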
2.4.2 Initialized Sections
Initialized sections contain executable code or initialized data. The contents of these sections are stored in the object file and placed in ARM memory when the program is loaded. Each initialized section is independently relocatable and may reference symbols that are defined in other sections. The linker automatically resolves these references. The following directives tell the assembler to place code or data into a section. The syntaxes for these directives are:
.text |
.data |
.sect "section name" |
The .sect directive can also be used to create initialized subsections. See Section 2.4.6 for more information on creating subsections.
2.4.3 User-Named Sections
User-named sections are sections that you create. You can use them like the default .text, .data, and .bss sections, but each section with a distinct name is kept distinct during assembly.
For example, repeated use of the .text directive builds up a single .text section in the object file. This .text section is allocated in memory as a single unit. Suppose there is a portion of executable code (perhaps an initialization routine) that you want the linker to place in a different location than the rest of .text. If you assemble this segment of code into a user-named section, it is assembled separately from .text, and you can use the linker to allocate it into memory separately. You can also assemble initialized data that is separate from the .data section, and you can reserve space for uninitialized variables that is separate from the .bss section.
These directives let you create user-named sections:
- The .usect directive creates uninitialized sections that are used like the .bss section. These sections reserve space in RAM for variables.
- The .sect directive creates initialized sections, like the default .text and .data sections, that can contain code or data. The .sect directive creates user-named sections with relocatable addresses.
The syntaxes for these directives are:
symbol | .usect "section name", size in bytes[, alignment[, bank offset]] |
.sect "section name" |
The maximum number of sections is 2^32-1 (4294967295).
The section name parameter is the name of the section. For the .usect and .sect directives, a section name can refer to a subsection; see Section 2.4.6 for details.
Each time you invoke one of these directives with a new name, you create a new user-named section. Each time you invoke one of these directives with a name that was already used, the assembler resumes assembling code or data (or reserves space) into the section with that name. You cannot use the same names with different directives. That is, you cannot create a section with the .usect directive and then try to use the same section with .sect.
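The following sketch shows one way the two directives might be combined. The section names init_code and init_vars and all labels are hypothetical; the fragment assumes 32-bit ARM state and is not taken from the figures in this chapter.

```asm
        .text                       ; ordinary code goes in .text
main:   MOV     R0, #0

        .sect   "init_code"         ; new name: creates an initialized section
init:   MOV     R1, #1              ; kept apart from .text
        BX      LR

flags   .usect  "init_vars", 4, 4   ; new name: creates an uninitialized section

        .sect   "init_code"         ; name already used: resumes that section
init2:  MOV     R2, #2              ; appended after the code at init
        BX      LR
```

In this sketch, a later .sect "init_vars" would not be allowed, because that name was first created with .usect.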
2.4.4 Current Section
The assembler adds code or data to one section at a time. The section the assembler is currently filling is the current section. The .text, .data, and .sect directives change which section is considered the current section. When the assembler encounters one of these directives, it stops assembling into the current section (acting as an implied end of current section command). The assembler sets the designated section as the current section and assembles subsequent code into the designated section until it encounters another .text, .data, or .sect directive.
If one of these directives sets the current section to a section that already has code or data in it from earlier in the file, the assembler resumes adding to the end of that section. The assembler generates only one contiguous section for each given section name. This section is formed by concatenating all of the code or data which was placed in that section.
2.4.5 Section Program Counters
The assembler maintains a separate program counter for each section. These program counters are known as section program counters, or SPCs.
An SPC represents the current address within a section of code or data. Initially, the assembler sets each SPC to 0. As the assembler fills a section with code or data, it increments the appropriate SPC. If you resume assembling into a section, the assembler remembers the appropriate SPC's previous value and continues incrementing the SPC from that value.
The assembler treats each section as if it began at address 0; the linker relocates the symbols in each section according to the final address of the section in which that symbol is defined. See Section 2.7 for information on relocation.
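To make the SPC behavior concrete, the sketch below annotates each line with the SPC value that would be expected. It assumes 32-bit ARM state, so each instruction and each .word value occupies four bytes; the instructions themselves are arbitrary.

```asm
        .text                   ; .text SPC starts at 0
        MOV     R0, #0          ; placed at .text offset 0; SPC becomes 4
        MOV     R1, #1          ; placed at .text offset 4; SPC becomes 8
        .data                   ; .data SPC starts at 0
        .word   0x11223344      ; placed at .data offset 0; SPC becomes 4
        .text                   ; resume .text; its SPC is still 8
        ADD     R0, R0, R1      ; placed at .text offset 8; SPC becomes 12
```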
2.4.6 Subsections
A subsection is created by creating a section with a colon in its name. Subsections are logical subdivisions of larger sections. Subsections are themselves sections and can be manipulated by the assembler and linker.
The assembler has no concept of subsections; to the assembler, the colon in the name is not special. The subsection .text:rts would be considered completely unrelated to its parent section .text, and the assembler will not combine subsections with their parent sections.
Subsections are used to keep parts of a section as distinct sections so that they can be separately manipulated. For instance, by placing each function and object in a uniquely-named subsection, the linker gets a finer-grained view of the section for memory placement and unused-function elimination.
By default, when the linker sees a SECTIONS directive in the linker command file like ".text", it will gather .text and all subsections of .text into one large output section named ".text". You can instead use the SECTIONS directive to control the subsection independently. See Section 8.5.4.1 for an example.
You can create subsections in the same way you create other user-named sections: by using the .sect or .usect directive.
The syntaxes for a subsection name are:
symbol | .usect "section_name:subsection_name", size in bytes[, alignment[, bank offset]] |
.sect "section_name:subsection_name" |
A subsection is identified by the base section name followed by a colon and the name of the subsection. The subsection name may not contain any spaces.
A subsection can be allocated separately or grouped with other sections using the same base name. For example, you create a subsection called _func within the .text section:
.sect ".text:_func"
Using the linker's SECTIONS directive, you can allocate .text:_func separately, or with all the .text sections.
You can create two types of subsections:
- Initialized subsections are created using the .sect directive. See Section 2.4.2.
- Uninitialized subsections are created using the .usect directive. See Section 2.4.1.
Subsections are placed in the same manner as sections. See Section 8.5.4 for information on the SECTIONS directive.
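As an illustration of both kinds of subsection, the sketch below places two functions in separate .text subsections and reserves a buffer in a .bss subsection. All subsection names and labels are hypothetical, and the code assumes 32-bit ARM state.

```asm
        .sect   ".text:_func_a"     ; initialized subsection of .text
_func_a: MOV    R0, #1
        BX      LR

        .sect   ".text:_func_b"     ; a second, separate subsection of .text
_func_b: MOV    R0, #2
        BX      LR

buf     .usect  ".bss:_buf", 32, 4  ; uninitialized subsection of .bss
```

Because each function lives in its own subsection, the linker can place or discard _func_a and _func_b independently, or gather them with the rest of .text.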
2.4.7 Using Sections Directives
Figure 2-2 shows how you can build sections incrementally, using the sections directives to swap back and forth between the different sections. You can use sections directives to begin assembling into a section for the first time, or to continue assembling into a section that already contains code. In the latter case, the assembler simply appends the new code to the code that is already in the section.
The format in Figure 2-2 is a listing file. Figure 2-2 shows how the SPCs are modified during assembly. A line in a listing file has four fields:
Field 1 | contains the source code line counter. |
Field 2 | contains the section program counter. |
Field 3 | contains the object code. |
Field 4 | contains the original source statement. |
See Section 4.12 for more information on interpreting the fields in a source listing.
As Figure 2-3 shows, the file in Figure 2-2 creates five sections:
.text | contains six 32-bit words of object code. |
.data | contains seven words of initialized data. |
vectors | is a user-named section created with the .sect directive; it contains two words of initialized data. |
.bss | reserves ten bytes in memory. |
newvars | is a user-named section created with the .usect directive; it reserves eight bytes in memory. |
The second column shows the object code that is assembled into these sections; the first column shows the source statements that generated the object code.
2.5 How the Linker Handles Sections
The linker has two main functions related to sections. First, the linker uses the sections in object files as building blocks; it combines input sections to create output sections in an executable output module. Second, the linker chooses memory addresses for the output sections; this is called placement. Two linker directives support these functions:
- The MEMORY directive allows you to define the memory map of a target system. You can name portions of memory and specify their starting addresses and their lengths.
- The SECTIONS directive tells the linker how to combine input sections into output sections and where to place these output sections in memory.
Subsections let you manipulate the placement of sections with greater precision. You can specify the location of each subsection with the linker's SECTIONS directive. If you do not specify a subsection, the subsection is combined with the other sections with the same base section name. See Section 8.5.4.1.
It is not always necessary to use linker directives. If you do not use them, the linker uses the target processor's default placement algorithm described in Section 8.7. When you do use linker directives, you must specify them in a linker command file.
Refer to the following sections for more information about linker command files and linker directives:
- Section 8.5 , Linker Command Files
- Section 8.5.3 , The MEMORY Directive
- Section 8.5.4 , The SECTIONS Directive
- Section 8.7 , Default Placement Algorithm
2.5.1 Combining Input Sections
Figure 2-4 provides a simplified example of the process of linking two files together.
Note that this is a simplified example, so it does not show all the sections that will be created or the actual sequence of the sections. See Section 8.7 for the actual default memory placement map for ARM.
In Figure 2-4, file1.obj and file2.obj have been assembled to be used as linker input. Each contains the .text, .data, and .bss default sections; in addition, each contains a user-named section. The executable object module shows the combined sections. The linker combines the .text section from file1.obj and the .text section from file2.obj to form one .text section, then combines the two .data sections and the two .bss sections, and finally places the user-named sections at the end. The memory map shows the combined sections to be placed into memory.
2.5.2 Placing Sections
Figure 2-4 illustrates the linker's default method for combining sections. Sometimes you may not want to use the default setup. For example, you may not want all of the .text sections to be combined into a single .text section. Or you may want a user-named section placed where the .data section would normally be allocated. Most memory maps contain various types of memory (RAM, ROM, EPROM, FLASH, etc.) in varying amounts; you may want to place a section in a specific type of memory.
For further explanation of section placement within the memory map, see the discussions in Section 8.5.3 and Section 8.5.4. See Section 8.7 for the actual default memory allocation map for ARM.
2.6 Symbols
An object file contains a symbol table that stores information about external symbols in the object file. The linker uses this table when it performs relocation. See Section 2.7.
An object file symbol is a named 32-bit integer value, usually representing an address. A symbol can represent such things as the starting address of a function, variable, or section.
An object file symbol can also represent an absolute integer, such as the size of the stack. To the linker, this integer is an unsigned value, but the integer may be treated as signed or unsigned depending on how it is used. The range of legal values for an absolute integer is 0 to 2^32-1 for unsigned treatment and -2^31 to 2^31-1 for signed treatment.
Symbols can be bound as global symbols, local symbols, or weak symbols. The linker handles symbols differently based on their binding. For example, the linker does not allow multiple global definitions of a symbol, but local symbols can be defined in multiple object files (but only once per object file). The linker does not resolve references to local symbols in different object files, but it does resolve references to global symbols in any other object file.
A global symbol is defined in the same manner as any other symbol; that is, it appears as a label or is defined by a directive, such as .set, .equ, .bss, or .usect. If a global symbol is defined more than once, the linker issues a multiple-definition error. (The assembler can provide a similar multiple-definition error for local symbols.)
A weak symbol is a symbol that is used in the current module but is defined in another module. The linker resolves this symbol's definition at link time. Weak symbols are similar to global symbols, except that if one object file contains a weak symbol, and another object file contains a global symbol with the same name, the global symbol is used to resolve references. A weak reference may be unresolved at link time, in which case the address is treated as 0. Therefore, for weak references, application code must test to make sure &var is not zero before attempting to read the contents. See Section 2.6.2 for more about weak symbols.
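The zero test for a weak reference might be sketched in assembly as follows. The names opt_var, read_opt, no_var, and opt_addr are hypothetical, and the fragment assumes 32-bit ARM state and that .weak is applied to a symbol not defined in this module.

```asm
        .weak   opt_var             ; weak reference; may be unresolved at link time
        .text
read_opt: LDR   R0, opt_addr        ; R0 = address of opt_var (0 if unresolved)
        CMP     R0, #0              ; is the address zero?
        BEQ     no_var              ; yes: skip the read
        LDR     R1, [R0]            ; no: safe to read the contents of opt_var
no_var: BX      LR
opt_addr: .word opt_var             ; literal holding the address of opt_var
```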
In general, common symbols (see .common directive) are preferred over weak symbols.
See Section 4.8 for information about assembler symbols.
2.6.1 External Symbols
External symbols are symbols that are visible to other object modules. Because they are visible across object modules, they may be defined in one file and referenced in another file. You can use the .def, .ref, or .global directive to identify a symbol as external:
.def | The symbol is defined in the current file and may be used in another file. |
.ref | The symbol is referenced in the current file, but defined in another file. |
.global | The symbol can be either of the above. The assembler chooses either .def or .ref as appropriate for each symbol. |
The following code fragments illustrate the use of the .global directive.
x: ADD R0, #56h ; Define x
.global x ; acts as .def x
Because x is defined in this module, the assembler treats ".global x" as ".def x". Now other modules can refer to x.
B y ; Reference y
.global y ; .ref of y
Because y is not defined in this module, the assembler treats ".global y" as ".ref y". The symbol y must be defined in another module.
Both the symbols x and y are external symbols and are placed in the object file's symbol table; x as a defined symbol, and y as an undefined symbol. When the object file is linked with other object files, the entry for x will be used to resolve references to x in other files. The entry for y causes the linker to look through the symbol tables of other files for y’s definition.
The linker attempts to match all references with corresponding definitions. If the linker cannot find a symbol's definition, it prints an error message about the unresolved reference. This type of error prevents the linker from creating an executable object module.
An error also occurs if the same symbol is defined more than once.
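The same relationship can be written with explicit .def and .ref directives. The sketch below shows two hypothetical source files, file1.asm and file2.asm, in one listing; each file defines one symbol and references the other.

```asm
; ---- file1.asm (hypothetical) ----
        .def    x                   ; x is defined here and visible to other files
        .ref    y                   ; y is used here but defined elsewhere
        .text
x:      ADD     R0, R0, #1
        B       y

; ---- file2.asm (hypothetical) ----
        .def    y                   ; y is defined here
        .ref    x                   ; x is defined in file1.asm
        .text
y:      B       x
```

If file2.asm were left out of the link, the reference to y would be unresolved; if both files defined x, the linker would report a multiple definition.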
2.6.2 Weak Symbols
The linker processes absolute symbols that are defined with "weak" binding differently from absolute symbols that are defined with global binding (the default). Instead of including a weak absolute symbol in the output file's symbol table by default (as it would for a global absolute symbol), the linker only includes a weak absolute symbol in the output of a "final" link if the symbol is required to resolve an otherwise unresolved reference.
This weak symbol handling allows you to associate addresses with symbols known to have been pre-loaded (such as function addresses in system memory) and then link the current application against a pre-loaded memory image. If such symbols are defined as weak absolute symbols, the linker can minimize the number of symbols it includes in the output file's symbol table by omitting those that are not needed to resolve references. Reducing the size of the output file's symbol table reduces the time required to link, especially if there are a large number of pre-loaded symbols to link against. This feature is particularly helpful for OpenCL applications.
You can define a weak absolute symbol using either assembly or the linker command file.
Using Assembly: To define a weak absolute symbol in an input object file, the source file can be written in assembly. Use the .weak and .set directives in combination as shown in the following example, which defines a weak absolute symbol "ext_addr_sym":
.weak ext_addr_sym
ext_addr_sym .set 0x12345678
Assemble the source file that defines weak symbols, and include the resulting object file in the link. The "ext_addr_sym" in this example is available as a weak absolute symbol in a final link. It is a candidate for removal if the symbol is not referenced elsewhere in the application. See .weak directive.
Using the Linker Command File: To define a weak symbol in a linker command file, use the "weak" operator in an assignment expression to designate the symbol as eligible for removal from the output file's symbol table if it is not referenced. In a linker command file, an assignment expression outside a MEMORY or SECTIONS directive can be used to define a weak linker-defined absolute symbol. For example, you can define "ext_addr_sym" as follows:
weak(ext_addr_sym) = 0x12345678;
If the linker command file is used to perform the final link, then "ext_addr_sym" is presented to the linker as a weak absolute symbol; it will not be included in the resulting output file if the symbol is not referenced. See Section 8.6.2.
If there are multiple definitions of the same absolute symbol, the linker uses certain rules to determine which definition takes precedence. Some definitions may have weak binding and others may have strong binding. "Strong" in this context means that the symbol has not been given a weak binding by either of the two methods described above. Some definitions may come from an input object file (that is, using assembly directives) and others may come from an assignment statement in a linker command file. The linker uses the following guidelines to determine which definition is used when resolving references to a symbol:
- A strongly bound symbol always takes precedence over a weakly bound symbol.
- If two symbols are both strongly bound or both weakly bound, a symbol defined in a linker command file takes precedence over a symbol defined in an input object file.
- If two symbols are both strongly bound and both are defined in an input object file, the linker provides a symbol redefinition error and halts the link process.
2.6.3 The Symbol Table
The assembler generates an entry in the symbol table for each .ref, .def, or .global directive (see Section 2.6.1). These are external symbols, which are visible to other object modules.
The assembler also creates special symbols that point to the beginning of each section.
The assembler does not usually create symbol table entries for any symbols other than those described above, because the linker does not use them. For example, labels (Section 4.8.2) are not included in the symbol table unless they are declared with the .global directive. For informational purposes, it is sometimes useful to have entries in the symbol table for each symbol in a program. To accomplish this, invoke the assembler with the --output_all_syms option (see Section 4.3).
2.7 Symbolic Relocations
The assembler treats each section as if it began at address 0. Of course, all sections cannot actually begin at address 0 in memory, so the linker must relocate sections. Relocations are symbol-relative rather than section-relative.
The linker can relocate sections by:
- Allocating them into the memory map so that they begin at the appropriate address as defined with the linker's MEMORY directive
- Adjusting symbol values to correspond to the new section addresses
- Adjusting references to relocated symbols to reflect the adjusted symbol values
The linker uses relocation entries to adjust references to symbol values. The assembler creates a relocation entry each time a relocatable symbol is referenced. The linker then uses these entries to patch the references after the symbols are relocated. Example 2-1 contains a code fragment for an ARM device for which the assembler generates relocation entries.
Example 2-1 Code That Generates Relocation Entries
1 *********************************************
2 ** Generating Relocation Entries **
3 *********************************************
4 .ref X
5 .def Y
6 00000000 .text
7 00000000 E0921003 ADDS R1, R2, R3
8 00000004 0A000001 BEQ Y
9 00000008 E1C410BE STRH R1, [R4, #14]
10 0000000c EAFFFFFB! B X ; generates a relocation entry
11 00000010 E0821003 Y: ADD R1, R2, R3
In Example 2-1, both symbols X and Y are relocatable. Y is defined in the .text section of this module; X is defined in another module. When the code is assembled, X has a value of 0 (the assembler assumes all undefined external symbols have values of 0), and Y has a value of 16 (relative to address 0 in the .text section). The assembler generates two relocation entries: one for X and one for Y. The reference to X is an external reference (indicated by the ! character in the listing). The reference to Y is to an internally defined relocatable symbol (indicated by the ' character in the listing).
After the code is linked, suppose that X is relocated to address 0x10014. Suppose also that the .text section is relocated to begin at address 0x10000; Y now has a relocated value of 0x10010. The linker uses the relocation entry for the reference to X to patch the branch instruction in the object code:
EAFFFFFB! B X | becomes | EA000000 |
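The encodings in Example 2-1 can be checked by hand. The sketch below repeats the source of the example and annotates the two branches, assuming the standard ARM-state branch encoding, in which the low 24 bits hold a signed word offset measured from the branch address plus 8.

```asm
        .ref    X
        .def    Y
        .text
        ADDS    R1, R2, R3      ; offset 0x00
        BEQ     Y               ; offset 0x04: (0x10 - (0x04 + 8)) / 4 = 1  -> 0A000001
        STRH    R1, [R4, #14]   ; offset 0x08
        B       X               ; offset 0x0C, X assumed 0 at assembly time:
                                ;   (0x00 - (0x0C + 8)) / 4 = -5            -> EAFFFFFB
                                ; after linking, X = 0x10014, branch at 0x1000C:
                                ;   (0x10014 - (0x1000C + 8)) / 4 = 0       -> EA000000
Y:      ADD     R1, R2, R3      ; offset 0x10
```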
2.8 Loading a Program
The linker creates an executable object file which can be loaded in several ways, depending on your execution environment. These methods include using Code Composer Studio or the hex conversion utility. For details, see Section 3.1.
Copyright© 2015, Texas Instruments Incorporated. An IMPORTANT NOTICE for this document addresses availability, warranty, changes, use in safety-critical applications, intellectual property matters and other important disclaimers.