30 Atom Processor Fundamentals

Dr Selvi Ravindran

 

About the module:

 

Continuing our exploration of advanced embedded processors, the next in line is the Atom processor from Intel. The advent of many low-power computing devices created a strong need for an embedded processor that thrives on minimal power. Given the market share of such personal mobile devices, many vendors set out to develop embedded processors that use minimal power. Intel then stepped into the embedded world with the introduction of the Atom processor. In this module, the basics of Atom processors, their power-saving criteria and their legacy are discussed.

 

Learning Outcomes:

  • Ability to understand the conceptual issues of Atom design.
  • Should be able to describe the different modes and the register set in Atom.
  • Should be able to identify the correct series of Atom for a given application.

 

1.1 Atom Overview

 

Atom was introduced with the aim of being Intel’s smallest and lowest-power processor, catering to the present generation of handheld gadgets (Figure 1), which require long battery life. It generally works in a frequency range of 1.6 GHz to 2.5 GHz, except for the Z series models, which work in the MHz range. Thermal Design Power (TDP) is the maximum amount of power the cooling system in a computer is required to dissipate. Given its different advanced power-saving configurations, the Atom TDP is around 2.5 W. Table 1.1 shows a comparison of different processors in the Intel family. The Intel® Atom™ processor is dedicated to:

1.    Pocket-sized and low-power Mobile Internet Devices (MIDs).

2.    Internet-focused notebooks (netbooks).

3.    Desktops (nettops).

 

1.2 Breakaways from the Intel x86 architecture

 

The Atom processor had to keep its architecture simpler than that of its predecessors. It had to strip off many performance-enhancing features to save energy and keep the device running for a longer time. These breakaways are responsible for bringing the Atom TDP to around 2 W, compared with 130 W for an i7 processor. The mantra is simple design with added capabilities: any power used should be matched by an equivalent performance gain. Hence the idea is to say no to any power-hungry technology. The following are a few features that are not inherited from the x86 micro-architecture:

 

1.    Out of order execution

 

2.    Aggressive speculation

 

 

1.2.1 Out of order execution

 

Traditionally a program is executed in order; that is, instruction ‘i’ executes first and then instruction ‘i+1’. Even in a pipelined processor the instructions are executed in order. This approach suffers stalls due to hazards that occur during program execution. Hence, the Intel architecture incorporates hardware that detects hazards and performs run-time scheduling, which may execute later instructions first to minimize stalls and so improve (lower) the Cycles Per Instruction (CPI). This out-of-order execution uses power-hungry hardware, which is eliminated in Atom.

 

1.2.2 Aggressive Speculation

 

Branch (control) instructions can change the Program Counter (PC) value (address) and hence cause stalls and performance degradation. Branch prediction is used to reduce pipeline branch penalties. With dynamic scheduling, aggressive speculation is performed with the help of a reorder buffer to minimize control hazards. The speculation may use a static or dynamic branch prediction circuit, and it needs logic to shuffle instructions. In Atom, both aggressive speculation and out-of-order execution are discarded to reduce power consumption. The loss in performance is compensated for by using dual-issue and Hyper-Threading technology.

 

1.3 Atom (Bonnell) core

 

The Atom processor is an in-order, two-instruction-wide superscalar processor. It implements the IA-32 (Intel386 CISC) Instruction Set Architecture (ISA), extended with Multi Media Extension (MMX), Streaming SIMD Extension (SSE), SSE2 and SSE3. Table 1.2 shows the SIMD instruction set details for Intel. The microarchitecture is referred to as the Bonnell core. Figure 1.2 shows the Atom processor die, which consists of the Front Side Bus (FSB), Bus Interface Unit (BIU), Front End Cluster (FEC), Memory Execution Cluster (MEC), Floating Point Execution Cluster (FPEC) and Integer Execution Cluster (IEC). The processor employs two levels of cache, L1 and L2. The first-level L1 cache has a 32 KB instruction cache and a 24 KB data cache. The second-level L2 cache is 512 KB and holds both instructions and data. Both levels use a data prefetcher, hardware that analyses memory access patterns and prefetches future references well in advance.

Figure 1.2 Atom processor Die

 

 

1.3.1 Register Set

 

Atom predominantly uses IA-32, a CISC instruction set with variable-length instructions. It employs eight 32-bit integer registers, eight 80-bit floating-point registers and eight 128-bit registers for Single Instruction Multiple Data (SIMD) operations. The FPEC enables SIMD computation on multiple streams of data with the help of the XMM registers. Figure 1.3 shows the IA-32 register set architecture. A few versions of Atom implement Intel 64 (the 64-bit extension of x86), which has sixteen 64-bit integer registers and sixteen 128-bit registers for SIMD.

 

 

1.4 Processor Modes

 

The processor mode defines the working environment of the system. In general the Atom processor has three modes: 1. Real mode. 2. Virtual mode. 3. Protected mode. Real mode is the entry point for the system software. It uses an unprotected 20-bit address space along with a 16-bit register set. All Intel processors enter real mode on reset. Virtual x86 mode is a legacy mode that is seldom used. Protected mode provides protected memory access for every process, so an application cannot interfere with memory used by another application. It also marks the boundary between kernel-level and user-level activities. Advanced multitasking operating systems run in this mode.
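
To make the 20-bit real-mode address space concrete, the following minimal sketch shows how a 16-bit segment and a 16-bit offset are combined into a 20-bit physical address (segment shifted left by four bits, plus offset). The function name is hypothetical, and the wrap-around at 1 MB is an assumption that matches the original 8086 behaviour (A20 line masked).

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the low 20 bits (assumed 8086-style wrap-around).
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Example values chosen purely for illustration.
    print(hex(real_mode_address(0x1234, 0x0010)))
```

Two 16-bit registers are therefore enough to reach any location in the 1 MB real-mode space, but nothing stops one program from forming an address that belongs to another, which is why this mode is described as unprotected.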

 

1.5 Advanced Configuration and Power Interface (ACPI)

 

The processor power states are managed by the operating system. ACPI is the industry standard for power, thermal and battery management. By docking and undocking components, different processor states can be defined to reduce power consumption. Table 1.3 shows the different Atom power states. The G (global) and D (device) states refer to the software and hardware options for reducing power consumption. The C-states define how much of the processor is in use. For example, C0 is the state in which instructions are processed and Cn is the deepest idle or sleep state. Atom has 6 sleep states, C1 through C6. Sleep states mainly shut down the cache and sometimes even the core clock. However, there is a time penalty to wake up from the sleep states. Computation can only take place in the C0 state. C6 uses 1.6% of TDP. The processor can switch from C6 to C0 in less than 100 microseconds. While entering C6, the processor saves all state information, stops the clocks, shuts down the FSB, and goes to sleep. Coming out of the sleep state, it restarts the clocks and restores the state information and the pipeline. Atom gradually refills the caches on demand to conserve power. Figure 1.4 shows a pictorial representation of idle power, cache usage, core voltage, frequency of the active period and the working status of the PLL and core clock. As seen in the figure, the idle power wasted is reduced as we move from C0 to C6. The sleep duration grows from the C0 state to the C6 state.
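
The figures quoted above can be turned into a quick back-of-the-envelope calculation. The sketch below uses the 1.6% figure for C6 and the approximate 2.5 W TDP from Section 1.1. The average-power model is an illustrative assumption only: it pretends that C0 always draws the full TDP, that the processor is either in C0 or in C6, and that state transitions cost nothing. All function names are hypothetical.

```python
TDP_WATTS = 2.5             # approximate Atom TDP quoted in Section 1.1
C6_FRACTION_OF_TDP = 0.016  # C6 uses 1.6% of TDP


def c6_power(tdp_watts: float = TDP_WATTS) -> float:
    """Power drawn in the C6 sleep state, in watts."""
    return tdp_watts * C6_FRACTION_OF_TDP


def average_power(active_share: float, tdp_watts: float = TDP_WATTS) -> float:
    """Average power when `active_share` of the time is spent in C0
    and the rest in C6.

    Assumption (illustration only): C0 draws the full TDP and
    transitions between the two states are free.
    """
    if not 0.0 <= active_share <= 1.0:
        raise ValueError("active_share must be between 0 and 1")
    return active_share * tdp_watts + (1.0 - active_share) * c6_power(tdp_watts)


if __name__ == "__main__":
    print(f"C6 power: {c6_power() * 1000:.0f} mW")
    for share in (1.0, 0.5, 0.1):
        print(f"{share:.0%} active -> {average_power(share):.3f} W on average")
```

Under these assumptions, a mostly idle device spends most of its time at a few tens of milliwatts, which is why deep sleep states matter so much for battery life.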

 

Frequency scaling can also be used to regulate power consumption. Table 1.4 depicts the different frequencies for the Atom N450. The OS takes care of switching between the required P-states based on demand and utilization. In general the processor can operate at 8 different frequencies. Reducing the frequency reduces power consumption, although it may increase execution time and hence energy consumption. It will, however, limit the heat produced in the system.
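
The trade-off between power and energy can be illustrated with a common first-order model, in which dynamic power is C × V² × f and a constant static (leakage) term is added. This is a sketch under stated assumptions, not a model of any particular Atom part: the capacitance, voltage, static power, cycle count and the two frequencies are all hypothetical values, and the voltage is held constant while only the frequency changes.

```python
def task_energy(cycles, freq_hz, volts=1.0, cap_farads=1e-9, static_watts=0.5):
    """Return (power_watts, time_s, energy_joules) for a task that needs
    a fixed number of clock cycles.

    All default parameter values are hypothetical, chosen for illustration.
    """
    dynamic_watts = cap_farads * volts ** 2 * freq_hz  # P_dyn = C * V^2 * f
    power = dynamic_watts + static_watts               # total power drawn
    time_s = cycles / freq_hz                          # lower f -> longer run
    return power, time_s, power * time_s               # E = P * t


if __name__ == "__main__":
    work = 1.6e9  # clock cycles needed by the (hypothetical) task
    for freq in (1.6e9, 0.8e9):
        power, time_s, energy = task_energy(work, freq)
        print(f"{freq / 1e9:.1f} GHz: {power:.2f} W, "
              f"{time_s:.2f} s, {energy:.2f} J")
```

In this model, halving the frequency lowers the power drawn, but the task runs twice as long, so the static power is paid for a longer time and the total energy goes up. The lower power still reduces the heat that has to be dissipated at any instant.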

 

1.6 Processor Series

 

The Atom processors are grouped into different series based on their features and target applications. The four series of Atom processors released are N, Z, D and E. The N series and Z series are both single core. The N series targets netbook applications, whereas the Z series is used in mobile phones and tablets. The D series processors are dual core and are used in nettop devices and low-power desktops; they operate at a higher TDP. Across the series, the TDP varies from as low as 0.65 W to 13 W. The N and D series do not support the 64-bit ISA. The E series is a system-on-chip design intended for embedded applications. It aims at providing high performance at low power with rich user interfaces.

 

1.7 Atom Design Issues

 

The architectural design specifications chosen for Atom, using the insight gained from the x86 architecture, are: in-order execution, dual-issue processing and a 16-stage instruction pipeline with a second integer pipe. The Atom design is based on a modular micro-architecture with separate blocks for each micro-operation. Given that it is a dual-issue processor, it has two instruction decoders. Like its predecessors, Atom follows the CISC architecture and has variable-length opcodes. Finding the boundaries of variable-length instructions takes up to 3 cycles. The instruction cache therefore marks the ends of instructions, so that hits in the I-cache can skip these extra cycles. On a hit, the instruction is fetched to the processor. The parallelization of computation to perform more than one task is supported by Hyper-Threading. Lastly, support for the 64-bit x86 ISA and virtualization extensions are among the features incorporated in different series of Atom.

 

1.8 Atom Architectural blocks

 

The important features implemented in the architecture of Atom processors are:

 

1.    In-order execution: Execute instructions sequentially as in the program.

 

2.    Superscalar: Execute more than one instruction per clock cycle. Atom has a two-wide instruction decoder, which gives it the ability to execute and retire two instructions in the same clock cycle.

 

3.    Pipelined execution: Overlap and execute multiple instructions in different stages at the same time.

 

 

1.8.1     In-order Execution

 

Atom uses sequential instruction fetch and execution. The fetch phase is the straightforward task of retrieving the opcode. It faces stalls only if there is a control hazard. The execution phase requires the data for computation. If the input operands are available (in registers, for instance), the instruction is dispatched to the appropriate functional unit. If one or more operands are unavailable during the current clock cycle (generally because they are being fetched from memory or because of a data hazard), the processor stalls until they are available. After the instruction executes, the appropriate functional unit writes the results back to the register file. Consider the in-order execution of the code given in Figure 1.5. Here stalls are introduced between the move instruction and the multiply instruction due to a data hazard. The operand needed for the multiplication is read from memory by the move instruction. Hence, the multiply instruction cannot proceed until the data transfer is completed by the move instruction.
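
The stall behaviour described above can be sketched with a small single-issue, in-order scheduling model. This is an illustrative simplification, not a model of the real Atom pipeline: the instruction sequence, register names and latencies are hypothetical (they are not the code of Figure 1.5), and each instruction is assumed to issue as soon as the previous one has issued and its source operands are ready.

```python
def in_order_schedule(program):
    """Schedule instructions strictly in program order.

    `program` is a list of (text, dest, sources, latency) tuples.
    Returns a list of (text, issue_cycle, stall_cycles).
    """
    ready = {}       # register -> cycle at which its value becomes available
    next_cycle = 0   # earliest cycle at which the next instruction may issue
    schedule = []
    for text, dest, sources, latency in program:
        # An instruction must wait until all of its source operands are ready.
        operands_ready = max((ready.get(reg, 0) for reg in sources), default=0)
        issue = max(next_cycle, operands_ready)
        schedule.append((text, issue, issue - next_cycle))
        if dest is not None:
            ready[dest] = issue + latency
        # In-order: nothing behind this instruction may issue before it.
        next_cycle = issue + 1
    return schedule


if __name__ == "__main__":
    # Hypothetical sequence: a load, a dependent multiply, an independent add.
    program = [
        ("mov  eax, [mem]", "eax", [], 3),              # assumed 3-cycle load
        ("imul ebx, eax",   "ebx", ["ebx", "eax"], 4),  # needs eax from the load
        ("add  ecx, 1",     "ecx", ["ecx"], 1),         # independent of the above
    ]
    for text, issue, stalls in in_order_schedule(program):
        print(f"cycle {issue}: {text}   (stalled {stalls} cycles)")
```

In this sketch the multiply waits for the load to deliver `eax`, and the independent add is held up behind it as well. An out-of-order processor could have executed the add during the stall, which is exactly the hardware that Atom gives up to save power.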

 

Advantages of in order execution are:

 

1.    It eliminates Instruction Reordering Logic.

2.    Reduces Power Consumption.

3.    Reduces Die Space.

 

The disadvantages are:

 

1.    Lower performance: when stalls are introduced due to hazards, those clock cycles are not used to execute another instruction.

2.    Data dependencies are more critical.

3.    Memory accesses and slow floating-point operations stall the pipeline for a longer time.

4.    Inefficient use of the CPU hardware.

 

1.8.2     Dual Issue

 

Atom is a superscalar, dual-issue processor. Although both techniques process multiple instructions at the same time, superscalar execution is different from instruction pipelining. It uses multiple redundant hardware components in the processor at the same time to dispatch multiple instructions. Thus the number of instructions executed per unit time is increased. Figure 1.6(a) shows the pipelined execution of a sequence of instructions across different time periods. Figure 1.6(b) shows execution on the two-issue superscalar processor.
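
The effect of issue width can be sketched by counting how many cycles are needed to issue a short instruction sequence in order. The model below is a deliberate simplification with assumed rules: every result is available one cycle after issue, and an instruction cannot issue in the same cycle as an earlier instruction that writes one of its source registers or the same destination. The instruction sequence and register names are hypothetical.

```python
def cycles_to_issue(program, width):
    """Count the cycles needed to issue `program` in order.

    `program` is a list of (dest, sources) tuples; up to `width`
    consecutive instructions may issue together in one cycle.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles = 0
    index = 0
    while index < len(program):
        written_this_cycle = set()
        issued = 0
        while index < len(program) and issued < width:
            dest, sources = program[index]
            # A dependence on an instruction issued in this same cycle
            # forces the instruction to wait for the next cycle.
            if dest in written_this_cycle or any(
                src in written_this_cycle for src in sources
            ):
                break
            written_this_cycle.add(dest)
            issued += 1
            index += 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical sequence: c needs a and b, d needs c, e and f are independent.
    program = [
        ("a", []), ("b", []), ("c", ["a", "b"]),
        ("d", ["c"]), ("e", []), ("f", []),
    ]
    print("single issue:", cycles_to_issue(program, 1), "cycles")
    print("dual issue:  ", cycles_to_issue(program, 2), "cycles")
```

Dual issue shortens the sequence only where neighbouring instructions are independent. Since an in-order processor cannot look past a dependent instruction, the gain depends heavily on how the compiler orders the code.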

1.8.3    Hyper-threading technology

 

Multithreading is the ability of a processor to execute multiple threads. When multithreading is implemented in a superscalar processor so that instructions from different threads can be issued simultaneously in every CPU cycle, it is called Simultaneous MultiThreading (SMT). SMT exploits parallelism across different threads, because instruction-level parallelism within a single thread is limited. Hyper-Threading (HT) technology is a form of simultaneous multithreading implemented by Intel on x86 microprocessors. When HT is enabled, the OS views one physical core as two logical cores and schedules threads on both of them. HT is best utilized with an OS that is specifically optimized for it. From the point of view of an HT-enabled platform, two logical cores are available. Figure 1.7 shows the view of such a CPU. Instructions from threads 1 and 2, labelled A and B, are scheduled and executed on the single core, and the processor stages are shared. HT is enabled or disabled as a BIOS option. If HT is enabled, the task manager shows twice the number of CPUs, and two threads can execute simultaneously. HT is reported to reduce power consumption by 20% and to improve performance by about 40%.

Figure 1.7 Pipeline & Instruction Scheduling with HT
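
The idea of filling one thread's stall cycles with another thread's instructions can be sketched as follows. This is a toy model with stated assumptions, not Intel's scheduling policy: the core has a single issue slot per cycle, each thread is a list in which a string is a ready instruction and `None` is a cycle spent waiting (for example on memory), waiting time elapses even when the other thread is using the core, and priority simply alternates between the two threads every cycle. The thread contents are hypothetical.

```python
def run_back_to_back(thread_a, thread_b):
    """Cycles needed when the threads run one after the other."""
    return len(thread_a) + len(thread_b)


def run_smt(thread_a, thread_b):
    """Share one issue slot between two threads.

    Returns (cycles, trace), where trace lists what issued in each cycle
    (None marks a cycle in which nothing could issue).
    """
    threads = [list(thread_a), list(thread_b)]
    pos = [0, 0]    # next entry to process in each thread
    priority = 0    # thread that gets first pick in the current cycle
    trace = []
    while pos[0] < len(threads[0]) or pos[1] < len(threads[1]):
        slot = None
        for k in (priority, 1 - priority):
            if pos[k] >= len(threads[k]):
                continue                      # this thread has finished
            entry = threads[k][pos[k]]
            if entry is None:
                pos[k] += 1                   # wait cycle elapses in background
            elif slot is None:
                slot = entry                  # instruction takes the issue slot
                pos[k] += 1
            # Otherwise the slot is taken and the instruction waits a cycle.
        trace.append(slot)
        priority = 1 - priority               # alternate priority each cycle
    return len(trace), trace


if __name__ == "__main__":
    thread_a = ["A1", None, None, "A2", "A3"]  # hypothetical thread with a stall
    thread_b = ["B1", "B2", None, "B3"]
    print("back to back:", run_back_to_back(thread_a, thread_b), "cycles")
    cycles, trace = run_smt(thread_a, thread_b)
    print("shared core: ", cycles, "cycles", trace)
```

In this example the shared core keeps its issue slot busy in every cycle, because each thread's waiting time is hidden behind the other thread's instructions. Real gains depend on how often the threads stall and on how much they compete for the shared pipeline stages and caches.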

 

1.8.4    Atom Pipeline

 

The Atom pipeline has six phases of instruction processing: Instruction Fetch, Instruction Decode, Instruction Issue, Data Access, Execute and Write Back. The phases are further divided into stages. The integer pipeline has 16 stages and the floating-point pipeline has 19 stages. Table 1.4 shows the number of stages in each pipeline. The instruction fetch phase has 3 stages in both the integer and floating-point pipelines. Figure 1.8 shows the stages of the integer pipeline, where the instruction issue phase is further divided into dispatch and operand read phases.

 

1.9 Processor Comparison

 

Lastly, we conclude this module by presenting a comparison of Atom with the PowerPC and ARM processors. Table 1.5 shows a brief comparison of their features based on instructions, registers and operations.

1.10 Summary

 

In this module, we explored the design issues involved in developing a low-power Atom processor. We also discussed the different Atom series and their fields of application. We introduced power modes and ACPI for optimal performance and efficient power usage. We then explained pipelining and multithreading. Lastly, we ended the discussion by comparing Atom with the ARM and PowerPC processors.

 

 
