ARM Assembly
Language Tools
v18.1.0.LTS User's Guide
SPNU118U - REVISED JANUARY 2018
3 Program Loading and Running
Even after a program is written, compiled, and linked into an executable object file, there are still many tasks that need to be performed before the program does its job. The program must be loaded onto the target, memory and registers must be initialized, and the program must be set to running.
Some of these tasks need to be built into the program itself. Bootstrapping is the process of a program performing some of its own initialization. Many of the necessary tasks are handled for you by the compiler and linker, but if you need more control over these tasks, it helps to understand how the pieces are expected to fit together.
This chapter will introduce you to the concepts involved in program loading, initialization, and startup.
This chapter does not cover dynamic loading.
This chapter currently provides examples for the C6000 device family. Refer to your device documentation for various device-specific aspects of bootstrapping.
3.1 Loading
A program needs to be placed into the target device's memory before it may be executed. Loading is the process of preparing a program for execution by initializing device memory with the program's code and data. A loader might be another program on the device, an external agent (for example, a debugger), or the device might initialize itself after power-on, which is known as bootstrap loading, or bootloading.
The loader is responsible for constructing the load image in memory before the program starts. The load image is the program's code and data in memory before execution. What exactly constitutes loading depends on the environment, such as whether an operating system is present. This section describes several loading schemes for bare-metal devices. This section is not exhaustive.
A program may be loaded in the following ways:
- A debugger running on a connected host workstation. In a typical embedded development setup, the device is subordinate to a host running a debugger such as Code Composer Studio (CCS). The device is connected with a communication channel such as a JTAG interface. CCS reads the program and writes the load image directly to target memory through the communications interface.
- Another program running on the device. The running program can create the load image and transfer control to the loaded program. If an operating system is present, it may have the ability to load and run programs.
- "Burning" the load image onto an EPROM module. The hex converter (armhex) can assist with this by converting the executable object file into a format suitable for input to an EPROM programmer. The EPROM is placed onto the device itself and becomes a part of the device's memory. See Section 12 for details.
- Bootstrap loading from a dedicated peripheral, such as an I2C peripheral. The device may require a small program called a bootloader to perform the loading from the peripheral. The hex converter can assist in creating a bootloader.
3.1.1 Load and Run Addresses
Consider an embedded device for which the program's load image is burned onto EPROM/ROM. Variable data in the program must be writable, and so must be located in writable memory, typically RAM. However, RAM is volatile, meaning it will lose its contents when the power goes out. If this data must have an initial value, that initial value must be stored somewhere else in the load image, or it would be lost when power is cycled. The initial value must be copied from the non-volatile ROM to its run-time location in RAM before it is used. See Section 8.8 for ways this is done.
The load address is the location of an object in the load image.
The run address is the location of the object as it exists during program execution.
An object is a chunk of memory. It represents a section, segment, function, or data.
The load and run addresses for an object may be the same. This is commonly the case for program code and read-only data, such as the .const section. In this case, the program can read the data directly from the load address. Sections that have no initial value, such as the .bss section, do not have load data and are considered to have load and run addresses that are the same. If you specify different load and run addresses for an uninitialized section, the linker provides a warning and ignores the load address.
The load and run addresses for an object may be different. This is commonly the case for writable data, such as the .data section. The .data section's starting contents are placed in ROM and copied to RAM. This often occurs during program startup, but depending on the needs of the object, it may be deferred to sometime later in the program as described in Section 3.5.
Symbols in assembly code and object files almost always refer to the run address. When you look at an address in the program, you are almost always looking at the run address. The load address is rarely used for anything but initialization.
The load and run addresses for a section are controlled by the linker command file and are recorded in the object file metadata.
The load address determines where a loader places the raw data for the section. Any references to the section (such as references to labels in it) refer to its run address. The application must copy the section from its load address to its run address before the first reference of the symbol is encountered at run time; this does not happen automatically simply because you specify a separate run address. For examples that specify load and run addresses, see Section 8.5.6.1.
For an example that illustrates how to move a block of code at run time, see Example 8-10. To create a symbol that lets you refer to the load-time address, rather than the run-time address, see the .label directive. To use copy tables to copy objects from load-space to run-space at boot time, see Section 8.8.
ELF format executable object files contain segments. See Section 2.3 for information about sections and segments.
3.1.2 Bootstrap Loading
The details of bootstrap loading (bootloading) vary a great deal between devices. Not every device supports every bootloading mode, and using the bootloader is optional. This section discusses various bootloading schemes to help you understand how they work. Refer to your device's data sheet to see which bootloading schemes are available and how to use them.
A typical embedded system uses bootloading to initialize the device. The program code and data may be stored in ROM or FLASH memory. At power-on, an on-chip bootloader (the primary bootloader) built into the device hardware starts automatically.
The primary bootloader is typically very small and copies a limited amount of memory from a dedicated location in ROM to a dedicated location in RAM. (Some bootloaders support copying the program from an I/O peripheral.) After the copy is completed, it transfers control to the program.
For many programs, the primary bootloader is not capable of loading the entire program, so these programs supply a more capable secondary bootloader. The primary bootloader loads the secondary bootloader and transfers control to it. Then, the secondary bootloader loads the rest of the program and transfers control to it. There can be any number of layers of bootloaders, each loading a more capable bootloader to which it transfers control.
3.1.2.1 Boot, Load, and Run Addresses
The boot address of a bootloaded object is where its raw data exists in ROM before power-on.
The boot, load, and run addresses for an object may all be the same; this is commonly the case for .const data. If they are different, the object's contents must be copied to the correct location before the object may be used.
The boot address may be different than the load address. The bootloader is responsible for copying the raw data to the load address.
The boot address is not controlled by the linker command file or recorded in the object file; it is strictly a convention shared by the bootloader and the program.
3.1.2.2 Primary Bootloader
The detailed operation of the primary bootloader is device-specific. Some devices have complex capabilities such as booting from an I/O peripheral or configuring memory controller parameters.
3.1.2.3 Secondary Bootloader
The hex converter assumes the secondary bootloader is of a particular format. The hex converter's model bootloader uses a boot table. You can use whatever format you want, but if you follow this model, the hex converter can create the boot table automatically.
3.1.2.4 Boot Table
The input for the model secondary bootloader is the boot table. The boot table contains records that instruct the secondary bootloader to copy blocks of data contained in the table to specified destination addresses. The hex conversion utility automatically builds the boot table for the secondary bootloader. Using the utility, you specify the sections you want to initialize, the boot table location, and the name of the section containing the secondary bootloader routine and where it should be located. The hex conversion utility builds a complete image of the table and adds it to the program.
The boot table is target-specific. For C6000, the format of the boot table is simple. A header record contains a 4-byte field that indicates where the boot loader should branch after it has completed copying data. After the header, each section that is to be included in the boot table has the following contents:
- 4-byte field containing the size of the section
- 4-byte field containing the destination address for the copy
- the raw data
- 0 to 3 bytes of trailing padding to make the next field aligned to 4 bytes
More than one section can be entered; a termination block containing an all-zero 4-byte field follows the last section.
See Section 12.11.2 for details about the boot table format.
3.1.2.5 Bootloader Routine
The bootloader routine is a normal function, except that it executes before the C environment is set up. For this reason, it can't use the C stack, and it can't call any functions that have yet to be loaded!
The following sample code is for C6000 and is from Creating a Second-Level Bootloader for FLASH Bootloading on TMS320C6000 Platform With Code Composer Studio (SPRA999).
Example 3-1 Sample Secondary Bootloader Routine
; ======== boot_c671x.s62 ========
; global EMIF symbols defined for the c671x family
.include boot_c671x.h62
.sect ".boot_load" .global _boot
_boot:
;************************************************************************
;* DEBUG LOOP − COMMENT OUT B FOR NORMAL OPERATION
;************************************************************************
zero B1
_myloop: ; [!B1] B _myloop
nop 5
_myloopend: nop
;************************************************************************
;* CONFIGURE EMIF
;************************************************************************
;****************************************************************
; *EMIF_GCTL = EMIF_GCTL_V;
;****************************************************************
mvkl EMIF_GCTL,A4
|| mvkl EMIF_GCTL_V,B4
mvkh EMIF_GCTL,A4
|| mvkh EMIF_GCTL_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_CE0 = EMIF_CE0_V
;****************************************************************
mvkl EMIF_CE0,A4
|| mvkl EMIF_CE0_V,B4
mvkh EMIF_CE0,A4
|| mvkh EMIF_CE0_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_CE1 = EMIF_CE1_V (setup for 8−bit async)
;****************************************************************
mvkl EMIF_CE1,A4
|| mvkl EMIF_CE1_V,B4
mvkh EMIF_CE1,A4
|| mvkh EMIF_CE1_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_CE2 = EMIF_CE2_V (setup for 32−bit async)
;****************************************************************
mvkl EMIF_CE2,A4
|| mvkl EMIF_CE2_V,B4
mvkh EMIF_CE2,A4
|| mvkh EMIF_CE2_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_CE3 = EMIF_CE3_V (setup for 32−bit async)
;****************************************************************
|| mvkl EMIF_CE3,A4
|| mvkl EMIF_CE3_V,B4 ;
mvkh EMIF_CE3,A4
|| mvkh EMIF_CE3_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_SDRAMCTL = EMIF_SDRAMCTL_V
;****************************************************************
|| mvkl EMIF_SDRAMCTL,A4
|| mvkl EMIF_SDRAMCTL_V,B4 ;
mvkh EMIF_SDRAMCTL,A4
|| mvkh EMIF_SDRAMCTL_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_SDRAMTIM = EMIF_SDRAMTIM_V
;****************************************************************
|| mvkl EMIF_SDRAMTIM,A4
|| mvkl EMIF_SDRAMTIM_V,B4 ;
mvkh EMIF_SDRAMTIM,A4
|| mvkh EMIF_SDRAMTIM_V,B4
stw B4,*A4
;****************************************************************
; *EMIF_SDRAMEXT = EMIF_SDRAMEXT_V
;****************************************************************
|| mvkl EMIF_SDRAMEXT,A4
|| mvkl EMIF_SDRAMEXT_V,B4 ;
mvkh EMIF_SDRAMEXT,A4
|| mvkh EMIF_SDRAMEXT_V,B4
stw B4,*A4
;****************************************************************************
; copy sections
;****************************************************************************
mvkl COPY_TABLE, a3 ; load table pointer
mvkh COPY_TABLE, a3
ldw *a3++, b1 ; Load entry point
copy_section_top:
ldw *a3++, b0 ; byte count
ldw *a3++, a4 ; ram start address
nop 3
[!b0] b copy_done ; have we copied all sections?
nop 5
copy_loop:
ldb *a3++,b5
sub b0,1,b0 ; decrement counter
[ b0] b copy_loop ; setup branch if not done
[!b0] b copy_section_top
zero a1
[!b0] and 3,a3,a1
stb b5,*a4++
[!b0] and −4,a3,a5 ; round address up to next multiple of 4
[ a1] add 4,a5,a3 ; round address up to next multiple of 4
;****************************************************************************
; jump to entry point
;****************************************************************************
copy_done:
b .S2 b1
nop 5
3.2 Entry Point
The entry point is the address at which the execution of the program begins. This is the address of the startup routine. The startup routine is responsible for initializing and calling the rest of the program. For a C/C++ program, the startup routine is usually named _c_int00 (see Section 3.3.1). After the program is loaded, the value of the entry point is placed in the PC register and the CPU is allowed to run.
The object file has an entry point field. For a C/C++ program, the linker will fill in _c_int00 by default. You can select a custom entry point; see Section 8.4.13. The device itself cannot read the entry point field from the object file, so it has to be encoded in the program somewhere.
- If you are using a bootloader, the boot table includes an entry point field. When it finishes running, the bootloader branches to the entry point.
- If you are using an interrupt vector, the entry point is installed as the RESET interrupt handler. When RESET is applied, the startup routine will be invoked.
- If you are using a hosted debugger, such as CCS, the debugger may explicitly set the program counter (PC) to the value of the entry point.
3.3 Run-Time Initialization
After the load image is in place, the program can run. The subsections that follow describe bootstrap initialization of a C/C++ program. An assembly-only program may not need to perform all of these steps.
3.3.1 _c_int00
The function _c_int00 is the startup routine (also called the boot routine) for C/C++ programs. It performs all the steps necessary for a C/C++ program to initialize itself.
The name _c_int00 means that it is the interrupt handler for interrupt number 0, RESET, and that it sets up the C environment. Its name need not be exactly _c_int00, but the linker sets _c_int00 as the entry point for C programs by default. The compiler's run-time-support library provides a default implementation of _c_int00.
The startup routine is responsible for performing the following actions:
- Switch to user mode and sets up the user mode stack
- Set up status and configuration registers
- Set up the stack and secondary system stack
- Process special binit copy table, if present.
- Process the run-time initialization table to autoinitialize global variables (when using the --rom_model option)
- Call all global constructors
- Call the function main
- Call exit when main returns
3.3.2 RAM Model vs. ROM Model
In the EABI ROM model, the .cinit section is loaded into memory along with other initialized sections. The linker defines a "cinit" symbol that points to the beginning of the initialization tables in memory. When the program begins running, the C boot routine copies data from these tables into the .bss section.
In the EABI RAM model, no .cinit records are generated at startup.
3.3.2.1 Autoinitializing Variables at Run Time (--rom_model)
Autoinitializing variables at run time is the default method of autoinitialization. To use this method, invoke the linker with the --rom_model option.
Using this method, the .cinit section is loaded into memory along with all the other initialized sections. The linker defines a special symbol called cinit that points to the beginning of the initialization tables in memory. When the program begins running, the C boot routine copies data from the tables (pointed to by .cinit) into the specified variables in the .bss section. This allows initialization data to be stored in slow non-volatile memory and copied to fast memory each time the program is reset.
Figure 3-3 illustrates autoinitialization at run time. Use this method in any system where your application runs from code burned into slow memory or needs to survive a reset.
3.3.2.2 Initializing Variables at Load Time (--ram_model)
Initialization of variables at load time enhances performance by reducing boot time and by saving the memory used by the initialization tables. To use this method, invoke the linker with the --ram_model option.
When you use the --ram_model linker option, the linker sets the STYP_COPY bit in the .cinit section's header. This tells the loader not to load the .cinit section into memory. (The .cinit section occupies no space in the memory map.) The linker also sets the cinit symbol to -1 (normally, cinit points to the beginning of the initialization tables). This indicates to the boot routine that the initialization tables are not present in memory; accordingly, no run-time initialization is performed at boot time.
A loader must be able to perform the following tasks to use initialization at load time:
- Detect the presence of the .cinit section in the object file.
- Determine that STYP_COPY is set in the .cinit section header, so that it knows not to copy the .cinit section into memory.
- Understand the format of the initialization tables.
Figure 3-4 illustrates the initialization of variables at load time.
3.3.2.3 The --rom_model and --ram_model Linker Options
The following list outlines what happens when you invoke the linker with the --ram_model or --rom_model option.
- The symbol _c_int00 is defined as the program entry point. The _c_int00 symbol is the start of the C boot routine in boot.obj; referencing _c_int00 ensures that boot.obj is automatically linked in from the appropriate run-time-support library.
- The .cinit output section is padded with a termination record to tell the boot routine (autoinitialize at run time) or the loader (initialize at load time) when to stop reading initialization tables.
- When you initialize at load time (--ram_model option):
- The linker sets cinit to -1. This indicates that the initialization tables are not in memory, so no initialization is performed at run time.
- The STYP_COPY flag (0010h) is set in the .cinit section header. STYP_COPY is the special attribute that tells the loader to perform initialization directly and not to load the .cinit section into memory. The linker does not allocate space in memory for the .cinit section.
- When you autoinitialize at run time (--rom_model option), the linker defines cinit as the starting address of the .cinit section. The C boot routine uses this symbol as the starting point for autoinitialization.
3.3.3 Copy Tables
The RTS function copy_in can be used at run-time to move code and data around, usually from its load address to its run address. This function reads size and location information from copy tables. The linker automatically generates several kinds of copy tables. Refer to Section 8.8.
You can create and control code overlays with copy tables. See Section 8.8.4 for details and examples.
Using copy tables is similar to performing run-time relocations as described in Section 3.5, however copy tables require a specific table format.
3.3.3.1 BINIT
The BINIT (boot-time initialization) copy table is special in that the target will automatically perform the copying at auto-initialization time. Refer to Section 8.8.4.2 for more about the BINIT copy table name. The BINIT copy table is copied before .cinit processing.
3.3.3.2 CINIT
EABI .cinit tables are special kinds of copy tables. Refer to Section 3.3.2.1 for more about using the .cinit section with the ROM model and Section 3.3.2.2 for more using it with the RAM model.
3.4 Arguments to main
Some programs expect arguments to main (argc, argv) to be valid. Normally this isn't possible for an embedded program, but the TI runtime does provide a way to do it. The user must allocate an .args section of an appropriate size using the --args linker option. It is the responsibility of the loader to populate the .args section. It is not specified how the loader determines which arguments to pass to the target. The format of the arguments is the same as an array of pointers to char on the target.
See Section 8.4.4 for information about allocating memory for argument passing.
3.5 Run-Time Relocation
At times you may want to load code into one area of memory and move it to another area before running it. For example, you may have performance-critical code in an external-memory-based system. The code must be loaded into external memory, but it would run faster in internal memory. Because internal memory is limited, you might swap in different speed-critical functions at different times.
The linker provides a way to handle this. Using the SECTIONS directive, you can optionally direct the linker to allocate a section twice: first to set its load address and again to set its run address. Use the load keyword for the load address and the run keyword for the run address. See Section 3.1.1 for more about load and run addresses. If a section is assigned two addresses at link time, all labels defined in the section are relocated to refer to the run-time address so that references to the section (such as branches) are correct when the code runs.
If you provide only one allocation (either load or run) for a section, the section is allocated only once and loads and runs at the same address. If you provide both allocations, the section is actually allocated as if it were two separate sections of the same size.
Uninitialized sections (such as .bss) are not loaded, so the only significant address is the run address. The linker allocates uninitialized sections only once; if you specify both run and load addresses, the linker warns you and ignores the load address.
For a complete description of run-time relocation, see Section 8.5.6.
3.6 Additional Information
See the following sections and documents for additional information:
Section 8.4.4 , "Allocate Memory for Use by the Loader to Pass Arguments (--arg_size Option)"
Section 8.4.13 , "Define an Entry Point (--entry_point Option)"
Section 8.5.6.1 ,"Specifying Load and Run Addresses"
Section 8.8 , "Linker-Generated Copy Tables"
Section 8.11.1 , "Run-Time Initialization"
Section 12 , "Hex Conversion Utility Description"
"Run-Time Initialization," "Initialization by the Interrupt Vector," and "System Initialization" sections in the ARM Optimizing C/C++ Compiler User's Guide
Creating a Second-Level Bootloader for FLASH Bootloading on TMS320C6000 Platform With Code Composer Studio (SPRA999).
Copyright© 2018, Texas Instruments Incorporated. An IMPORTANT NOTICE for this document addresses availability, warranty, changes, use in safety-critical applications, intellectual property matters and other important disclaimers.