Booting an Intel Architecture System, Part I: Early Initialization

The boot sequence today is far more complex than it was even a decade ago. Here's a detailed, low-level, step-by-step walkthrough of the boot process.

Memory Configuration and Initialization

The initialization of the memory controller varies slightly depending on the DRAM technology and the capabilities of the memory controller itself. For SOC devices, the details of the DRAM controller are proprietary, and in such cases the initialization code, known as the Memory Reference Code (MRC), is typically supplied by the vendor. Developers must contact Intel to request access to the low-level information required. Developers who lack an MRC can use a standard JEDEC initialization sequence for a given DRAM technology. JEDEC refers to the JEDEC Solid State Technology Association, formerly known as the Joint Electron Device Engineering Council; it is a trade organization and standards body for the electronics industry.

It is likely that memory configuration will be performed by single-point-of-entry and single-point-of-exit code that has multiple boot paths contained within it. It will be 32-bit protected-mode code. Settings for buffer strengths and loading for a given number of banks of memory are chipset specific.

There is a very wide range of DRAM configuration parameters, including the number of ranks, eight-bit or 16-bit addressing, overall memory size and organization, soldered-down or add-in-module configurations, page-closing policy, and power management. Given that most embedded systems populate soldered-down DRAM on the board, the firmware may not need to discover the configuration at boot time. These configurations are known as memory-down: the firmware is built specifically for the target configuration. This is the case for the Intel reference platform from the Embedded Computing Group. At current DRAM speeds, the wires between the memory controller and the DRAM devices behave like transmission lines; the SOC may provide automatic calibration and run-time control of resistive compensation and delay-locked loop capabilities. These capabilities allow the memory controller to adjust elements such as drive strength to ensure error-free operation over time and temperature variations.

If the platform supports add-in modules for memory, it may use any of a number of standard form factors. The small-outline Dual In-Line Memory Module (SO-DIMM) is often found in embedded systems. DIMMs provide a serial EEPROM that contains DRAM configuration information known as Serial Presence Detect (SPD) data. The firmware reads the SPD data and configures the device accordingly. The serial EEPROM is connected via the System Management Bus (SMBus), which means the bus must be available in the early initialization phase so the software can determine which memory devices are on board. It is also possible for memory-down motherboards to incorporate SPD EEPROMs, allowing multiple and updatable memory configurations to be handled efficiently by a single BIOS algorithm. A hard-coded table in one of the MRC files could be used to implement an EEPROM-less design.

Once the memory controller has been initialized, a number of subsequent cleanup events take place, including tests to ensure that memory is operational. Memory testing is now part of the MRC, but it is possible to add more tests should the design require it. BIOS vendors typically provide some kind of memory test on a cold boot. Writing custom firmware requires the authors to choose a balance between thoroughness and speed, as many embedded and mobile devices require extremely fast boot times. Memory testing can take considerable time.

If testing is warranted, right after initialization is the time to do it. The system is idle, the subsystems are not actively accessing memory, and the OS has not taken over the host side of the platform. Several hardware features can assist in this testing both during boot and at run-time. These features have traditionally been thought of as high-end or server features, but over time they have moved into the client and embedded markets.

One of the most common technologies is error-correcting code (ECC) memory, which some embedded devices use and which may require extra initialization. After power-up, the ECC check bits may not match the memory contents, so all of memory must be written. Writing to memory ensures that the ECC bits are valid and consistent with the data. For security purposes, the memory may need to be zeroed out manually by the BIOS, or, in some cases, a memory controller may incorporate the feature into hardware to save time.

Depending on the source of the reset and security requirements, the system may or may not execute a memory wipe or ECC initialization. On a warm reset sequence, memory context can be maintained.

If there are memory timing changes or other configuration alterations that require a reset to take effect, this is normally the time to execute a warm reset. That warm reset would start the early initialization phase over again. Affected registers would need to be restored.

From the reset vector, execution starts directly from nonvolatile flash storage. This operating mode is known as execute-in-place. The read performance of nonvolatile storage is much slower than the read performance of DRAM. The performance of code running from flash is therefore much lower than code executed in RAM. Most firmware is therefore copied from slower nonvolatile storage into RAM. The firmware is then executed in RAM in a process known as shadowing.

In embedded systems, the chip-select ranges are typically managed to allow the change from flash to RAM execution. Most computing systems execute-in-place as little as possible. However, some RAM-constrained embedded platforms execute all applications in place. This is a viable option on very small embedded devices.

Intel Architecture systems generally do not execute-in-place for anything but the initial boot steps before memory has been configured. The firmware is often compressed, reducing nonvolatile storage requirements. Clearly, the processor cannot execute a compressed image in place, so there is a trade-off between the size of the data to be shadowed and the cost of decompression: loading and running the decompression algorithm may take longer than simply copying an uncompressed image. Prefetchers in the processor, if enabled, may speed up execute-in-place. Some SOCs have internal NVRAM cache buffers to assist in pipelining the data from the flash to the processor. Figure 3 shows the memory map at initialization in real mode.

Figure 3: The Intel Architecture memory map at power-on.

Before memory is initialized, the data and code stacks are held in the processor cache. Once memory is initialized, the system must exit that special caching mode and flush the cache. The stack will be transferred to a new location in main memory and the cache reconfigured as part of AP initialization.

The stack must be set up before jumping into the shadowed portion of the BIOS that is now in memory. A memory location must be chosen for stack space. The stack counts down, so the top-of-stack address must be loaded into the stack pointer, and enough memory must be allocated to accommodate the maximum stack depth.

If the system is in real mode, then SS:SP must be set with the appropriate values. If protected mode is used, which is likely the case following MRC execution, then SS:ESP must be set to the correct memory location.

This is where the code makes the jump into memory. If a memory test has not been performed before this point, the jump could very well land in garbage. A system failure indicated by a Power-On Self Test (POST) code between "end of memory initialization" and the first following POST code almost always points to a catastrophic memory initialization problem. If this is a new design, then chances are the problem is in the hardware and requires step-by-step debug.

For legacy option ROMs and BIOS memory ranges, Intel chipsets usually come with memory aliasing capabilities that allow access to memory below 1 MB to be routed to or from DRAM or nonvolatile storage located just under 4 GB. The registers that control this aliasing are typically referred to as Programmable Attribute Maps (PAMs). Manipulation of these registers may be required before, during, and after firmware shadowing. The control over the redirection of memory access varies from chipset to chipset. For example, some chipsets allow control over reads and writes, while others allow control over reads only.

For shadowing, if PAM registers remain at default values (all 0s), all Firmware Hub (FWH) accesses to the E and F segments (E_0000–F_FFFFh) will be directed downstream toward the flash component. This will function to boot the system, but is very slow. Shadowing can be used to improve boot speed. One method of shadowing the E and F segments of BIOS is to utilize the PAM registers. This can be done by changing the enables (HIENABLE[] and LOENABLE[]) to 10 (write only). This will direct reads to the flash device and writes to memory. Data can then be shadowed into memory by reading and writing the same address. Once BIOS code has been shadowed into memory, the enables can be changed to read-only mode so memory reads are directed to memory. This also prevents accidental overwriting of the image in memory.

