27 ARM Instruction Set

Dr Selvi Ravindran

About the module:

This module explores the ARM instruction set as a tool for designing embedded applications. The core of any program consists of arithmetic and logical operations. In this module, we start with the classification of the instruction set and then look in detail at the data processing instructions and the representation of data.

 

Learning Outcomes:

 

  • Able to understand the nuances of ARM programming.
  • Able to write an ARM program that uses the arithmetic and logical instructions.

 

1.1 ARM ISA Features

 

 

ARM is a RISC processor with a load-store architecture. Load and store instructions are the only instructions that access memory: a load reads data from the memory address space into a register, and a store writes a register value into it. ALU operations work on values held in the register file, so every category of instruction other than load and store takes its operands from registers. A few features of the ARM ISA are listed below.

 

  • Fixed-length 32-bit instructions with a 3-address instruction format.
  • Load and store of multiple registers in a single instruction.
  • Shift and ALU operations performed in a single instruction.
  • Open instruction set extension through coprocessor instructions.
  • A very dense 16-bit compressed instruction set (Thumb).

 

 

1.1.1     Coprocessor

 

One of the additional features of ARM is the coprocessor extension.

ARM can be interfaced with a coprocessor that extends its functionality (Figure 1.1). Up to sixteen coprocessors can be defined and interfaced with ARM, and the coprocessor interface expands the ARM instruction set. When the processor decodes a coprocessor instruction, it offers it to the relevant coprocessor; if no coprocessor accepts it, the processor raises an undefined instruction exception. ARM has instructions to access registers in the coprocessor. Each coprocessor can have up to sixteen private registers of any reasonable size.

 

 

1.2 ARM INSTRUCTION SET

 

 

The instruction set is the programmer's view of a processor: it is what enables an embedded programmer to understand and use the processor. ARM processors divide their instructions into two sets: 1. the ARM instruction set and 2. the Thumb instruction set. ARM instructions are 32 bits (one word) wide and word-aligned. Thumb instructions are 16 bits (a half-word) or 32 bits (a word) wide and are aligned on 2-byte (half-word) boundaries. Thumb is a subset of ARM, and most Thumb operations map directly onto ARM instructions; the Thumb set was introduced to reduce code size.

The processor executes either ARM or Thumb instructions depending on the T bit in the status register CPSR. If the T bit is set, the processor is in Thumb state and executes Thumb instructions; if the T bit is clear, the processor is in ARM state. Most of the privileged resources are accessed by ARM instructions, and on classic ARM cores such as the ARM7TDMI the processor always executes exception handlers in ARM state.

We start by exploring the ARM instruction set. Figure 1.2 shows the format of an ARM instruction, which is a three-address format. For example, the mnemonic ADD d1, s1, s2 adds the source operands s1 and s2 and stores the result in the destination d1. The 32 bits are divided between the function (operation) to be performed and the addresses of the three operands. The number of bits allocated for addressing an operand depends on the addressing mode. For example, with register addressing, 4 bits are needed to address each operand, because ARM has 16 registers (r0 to r15) visible at any time.

1.3 ARM DATA TYPES

 

 

To understand the ARM instruction set, we first list the data sizes the ARM processor supports. Data is processed as words (32 bits), half-words (16 bits) and bytes (8 bits), and each size can have a signed or an unsigned representation, which gives six data types. Data processing instructions always operate on 32-bit words; the smaller sizes appear in load and store instructions, whose mnemonics take a suffix to indicate the size: B for an unsigned byte, H for an unsigned half-word, SB for a signed byte and SH for a signed half-word (for example LDRB, LDRH, LDRSB and LDRSH). The data types used in the ARM ISA are:

 

1.    8-bit signed and unsigned bytes

2.    16-bit signed and unsigned half-words, aligned on 2-byte boundaries

3.    32-bit signed and unsigned words, aligned on 4-byte boundaries
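Signed and unsigned bytes and half-words differ only in how they are widened to 32 bits when loaded into a register: zero extension for unsigned values and sign extension for signed ones, as ARM's LDRB/LDRH and LDRSB/LDRSH loads do. The Python sketch below models that widening and the alignment rules listed above; the helper names are hypothetical and not part of any ARM toolchain.

```python
WORD_MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def zero_extend(value: int, bits: int) -> int:
    """Widen an unsigned `bits`-wide value to 32 bits (upper bits become 0)."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Widen a signed `bits`-wide value to 32 bits by copying its top bit upward."""
    field = (1 << bits) - 1
    value &= field
    if value & (1 << (bits - 1)):          # negative: fill bits [31:bits] with 1s
        value |= WORD_MASK & ~field
    return value


def is_aligned(address: int, size: int) -> bool:
    """Half-words (size 2) and words (size 4) must sit on multiples of their size."""
    return address % size == 0


# The byte 0x80 is 128 when treated as unsigned but -128 when treated as signed.
print(hex(zero_extend(0x80, 8)))                       # 0x80
print(hex(sign_extend(0x80, 8)))                       # 0xffffff80
print(is_aligned(0x1002, 2), is_aligned(0x1002, 4))    # True False
```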

 

 

 

1.4 ARM INSTRUCTION ENCODING

 

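An ARM instruction has a fixed-length encoding of 32 bits: both the operation and the operands, with their addressing mode, are encoded within those 32 bits. Figure 1.3 shows the encoding format for the different categories of ARM instruction. The 4 most significant bits (bits [31:28]) form the condition field, which specifies the condition under which the instruction executes. By default an instruction executes always; this is the AL condition, encoded as 1110 (the value 0000 encodes EQ). In the data processing format, bits [27:20] encode the operation (the instruction class, the immediate flag I, a 4-bit opcode and the S bit), the registers Rn and Rd take 4 bits each, and the remaining 12 bits hold the second operand, which is either a register Rm (4 bits) with an optional shift or a rotated immediate value.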
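To make the field layout concrete, the Python sketch below splits a data processing instruction word into the fields described above. It assumes the classic ARM data processing layout (cond, 00, I, opcode, S, Rn, Rd, operand 2) and does not separate out multiply and other instructions that share the same top bits; the function name is hypothetical. The example word 0xE0810002 is the encoding of ADD r0, r1, r2.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]  # NV: reserved
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]


def decode_data_processing(word: int) -> dict:
    """Split a 32-bit ARM data processing instruction into its fields."""
    if ((word >> 26) & 0b11) != 0b00:
        raise ValueError("bits [27:26] must be 00 for a data processing instruction")
    return {
        "cond": CONDITIONS[(word >> 28) & 0xF],   # bits [31:28]
        "I": (word >> 25) & 1,                    # 1 = operand 2 is a rotated immediate
        "opcode": OPCODES[(word >> 21) & 0xF],    # bits [24:21]
        "S": (word >> 20) & 1,                    # 1 = update the CPSR flags
        "Rn": (word >> 16) & 0xF,                 # first source register
        "Rd": (word >> 12) & 0xF,                 # destination register
        "operand2": word & 0xFFF,                 # bits [11:0]: Rm + shift, or imm
    }


print(decode_data_processing(0xE0810002))
# {'cond': 'AL', 'I': 0, 'opcode': 'ADD', 'S': 0, 'Rn': 1, 'Rd': 0, 'operand2': 2}
```

Setting bit 20 (0xE0910002) gives ADDS, and replacing the top nibble with 0 (0x00810002) gives ADDEQ.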

   

 

Conditional Execution

 

Most processors execute only branch instructions conditionally, testing the status flags to decide the flow of execution. ARM instead applies the condition evaluation hardware to every instruction, which increases performance and code density. Every instruction contains a condition field that determines whether the CPU will execute it. An instruction whose condition fails still takes up one cycle, allowing the other stages in the pipeline to complete. Conditional execution reduces the number of branches, each of which would stall the pipeline, and the dense in-line code reduces the number of pipeline flushes. The time penalty of skipping a few conditional instructions is frequently less than the overhead of the branch or subroutine call that would otherwise be needed.

 

Condition mnemonics

 

Conditional execution depends on two components: the condition field and the condition flags. As explained earlier, the condition field is part of the instruction; the condition flags are located in the current program status register (CPSR). Table 1.1 shows the different conditions and the status flags tested for each.

 

Table 1.1 Instruction conditions and flags

Suffix  Description                Flags tested
EQ      Equal                      Z=1
NE      Not equal                  Z=0
CS/HS   Unsigned higher or same    C=1
CC/LO   Unsigned lower             C=0
MI      Minus (negative)           N=1
PL      Positive or zero           N=0
VS      Overflow                   V=1
VC      No overflow                V=0
HI      Unsigned higher            C=1 and Z=0
LS      Unsigned lower or same     C=0 or Z=1
GE      Signed greater or equal    N=V
LT      Signed less than           N!=V
GT      Signed greater than        Z=0 and N=V
LE      Signed less than or equal  Z=1 or N!=V
AL      Always                     (flags ignored)
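Table 1.1 can be read as a set of Boolean tests on the N, Z, C and V flags. The Python sketch below encodes the table directly; `condition_passed` is a hypothetical helper name, not part of any ARM tool.

```python
def condition_passed(cond: str, n: int, z: int, c: int, v: int) -> bool:
    """Return True if an instruction with condition suffix `cond` would execute,
    given the CPSR flags N, Z, C and V (each 0 or 1). Mirrors Table 1.1."""
    checks = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "HS": c == 1,
        "CC": c == 0, "LO": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,                      # signed comparisons
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return checks[cond.upper()]


# After a comparison that leaves N=1 and V=0, an ADDLT would execute:
print(condition_passed("LT", 1, 0, 0, 0))   # True
```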

 

Using and Updating the Condition

 

To execute an instruction conditionally, we simply postfix it with the appropriate condition. For example, an unconditional add instruction takes the form

 

ADD r0, r1, r2     ; r0 = r1 + r2 (ADDAL)

 

To execute this only if the zero flag is set, we use

 

ADDEQ r0, r1, r2               ; r0 = r1 + r2 if zero flag set

 

The flags reflect the result of an earlier flag-setting instruction. By default, data processing instructions do not affect the condition flags; the only instructions that always update them are the comparison instructions (CMP, CMN, TST and TEQ). To make any other data processing instruction update the condition flags, its S bit must be set by postfixing the instruction (and any condition code) with an “S”.

 

For example, to add two numbers and set the condition flags:

 

ADDS r0, r1, r2 ; r0 = r1 + r2 and set flags

 

Conditional execution helps reduce the number of instructions needed to implement a given algorithm.
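To see how conditional execution removes branches, the Python sketch below steps through a greatest-common-divisor loop written with conditionally executed subtractions. The four-instruction ARM sequence in the docstring is a common textbook illustration, not code from this module, and the Python function names are hypothetical. Each `if` plays the role of a condition check against the flags left by CMP; the only branch is the loop-closing BNE.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def cmp_flags(a: int, b: int) -> tuple:
    """N, Z, C, V as set by CMP a, b (the flags of a - b; no result is kept)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                               # C = 1 means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1      # signed overflow
    return n, z, c, v


def gcd_conditional(r0: int, r1: int) -> int:
    """Model of the conditionally executed loop
        gcd  CMP   r0, r1
             SUBGT r0, r0, r1
             SUBLT r1, r1, r0
             BNE   gcd
    Assumes r0 and r1 are positive (a zero input would loop forever)."""
    while True:
        n, z, _c, v = cmp_flags(r0, r1)   # CMP   r0, r1
        if z == 0 and n == v:             # SUBGT r0, r0, r1   (GT: Z=0 and N=V)
            r0 = (r0 - r1) & MASK
        if n != v:                        # SUBLT r1, r1, r0   (LT: N!=V)
            r1 = (r1 - r0) & MASK
        if z == 1:                        # BNE gcd falls through once r0 == r1
            return r0


print(gcd_conditional(12, 18))   # 6
```

Because SUBGT and SUBLT do not set the flags, both test the same CMP result, and at most one of them executes per pass.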

 

1.5 ARM INSTRUCTION TYPE

 

An ARM data processing instruction uses a three-operand format with two source registers (Rn and Rm) and one destination register (Rd). The instruction syntax is “<operation> Rd, Rn, Rm”. The instruction set is classified into the five categories listed below:

1.       Data processing instructions

2.       Control flow (branch) instructions

3.       Data transfer (load-store) instructions

4.       Software interrupt instruction

5.       Program status register instructions

 

 

1.5.1     Data Processing Instructions

 

Data processing instructions process, manipulate and compute on the data held in registers. They have three operands, except for the move and compare instructions, which have two. The first source operand is always a register; the second may be a register, a shifted register or an immediate value. All operands are 32 bits wide and come either from registers or from a literal specified in the instruction itself. The second operand can be processed by the barrel shifter before it is sent to the ALU, and the 32-bit result is placed in a register.

 

A long multiply instruction produces a 64-bit result. Any register from the user register set can be used as an operand. The program counter (PC) may be used as a source operand, in which case it reads as the address of the current instruction plus 8; this does not apply when the shift amount is specified by a register. The PC may also be specified as the destination register, in which case the instruction acts as a branch.

 

The status flags in the CPSR are updated when an instruction is suffixed with ‘S’. The suffix sets the S bit, bit 20 of the 32-bit instruction encoding, as shown in Figure 1.4. If S=0, the condition codes are unaffected; if S=1, the condition codes are set according to the result of the instruction executed. The flags take the following values:

 

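–   N = 1 if the result is negative; 0 otherwise (i.e. N = bit 31 of the result)

–   Z = 1 if the result is zero; 0 otherwise

–   C = the carry out of the ALU for ADD, ADC, SUB, SBC, RSB, RSC, CMP and CMN (for a subtraction, C = 1 means that no borrow occurred); for the logical and move operations, C = the carry out of the barrel shifter

–   V = 1 if the operation caused signed (two's complement) overflow, i.e. the carry into bit 31 differs from the carry out of bit 31; 0 if there is no overflow (V is preserved by non-arithmetic operations)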
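The flag rules above can be checked with a small model. The Python sketch below computes the 32-bit result and N, Z, C, V for an ADDS and a SUBS; the helper names are hypothetical. Subtraction is modelled as a + NOT(b) + 1, which is why C = 1 signals "no borrow".

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def nzcv_add(a: int, b: int, carry_in: int = 0) -> tuple:
    """(result, N, Z, C, V) of a 32-bit ADDS (carry_in=0) or ADCS."""
    a &= MASK
    b &= MASK
    full = a + b + carry_in
    result = full & MASK
    n = result >> 31                                # bit 31 of the result
    z = int(result == 0)
    c = full >> 32                                  # unsigned carry out of bit 31
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1       # same-sign inputs, sign flipped
    return result, n, z, c, v


def nzcv_sub(a: int, b: int) -> tuple:
    """(result, N, Z, C, V) of a 32-bit SUBS or CMP, computed as a + NOT(b) + 1."""
    return nzcv_add(a, ~b & MASK, 1)


result, n, z, c, v = nzcv_add(0x7FFFFFFF, 1)
print(hex(result), n, z, c, v)   # 0x80000000 1 0 0 1  (signed overflow)
result, n, z, c, v = nzcv_sub(3, 5)
print(hex(result), n, z, c, v)   # 0xfffffffe 1 0 0 0  (borrow, so C = 0)
```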

 

The data processing instructions are grouped by the operation they perform into the following five sets:

 

1.       Arithmetic instructions (ADD, ADC, SUB, SBC, RSB, RSC)

2.       Logical instructions (AND, ORR, EOR, BIC)

3.       Compare instructions (CMP, CMN, TST, TEQ)

4.       Register movement instructions (MOV, MVN)

5.       Multiply instructions (MUL, MLA)

 

 

 

1.5.2    Arithmetic Instructions

 

Addition and subtraction operations on 32-bit signed and unsigned values are grouped under the arithmetic instructions. The syntax of the instruction is:

–   <Operation>{<cond>}{S} Rd, Rn, Operand2

 

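Table 1.2 shows the six arithmetic operations with their mnemonics and explanations.

Table 1.2 Arithmetic Instructions

Mnemonic       Operation       Comment
ADD r0,r1,r2   r0:=r1+r2       Add
ADC r0,r1,r2   r0:=r1+r2+C     Add with carry
SUB r0,r1,r2   r0:=r1-r2       Subtract
SBC r0,r1,r2   r0:=r1-r2+C-1   Subtract with carry
RSB r0,r1,r2   r0:=r2-r1       Reverse subtract
RSC r0,r1,r2   r0:=r2-r1+C-1   Reverse subtract with carry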
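The carry-using variants ADC and SBC exist mainly to chain 32-bit operations into wider ones. The Python sketch below models a 64-bit add and subtract built from two 32-bit halves; the register assignment in the comments (first operand in r3:r2, second in r5:r4, result in r1:r0) is an illustrative assumption, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple:
    """64-bit addition from 32-bit halves, as in:
         ADDS r0, r2, r4   ; low words, C = carry out
         ADC  r1, r3, r5   ; high words plus C
    """
    total = a_lo + b_lo
    lo = total & MASK
    carry = total >> 32                        # C flag left by ADDS
    hi = (a_hi + b_hi + carry) & MASK          # ADC: Rn + Rm + C
    return hi, lo


def sub64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple:
    """64-bit subtraction from 32-bit halves, as in:
         SUBS r0, r2, r4   ; low words, C = 1 if no borrow
         SBC  r1, r3, r5   ; high words: Rn - Rm + C - 1
    """
    lo = (a_lo - b_lo) & MASK
    carry = int(a_lo >= b_lo)                  # C flag left by SUBS
    hi = (a_hi - b_hi + carry - 1) & MASK      # SBC
    return hi, lo


hi, lo = add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(hex(hi), hex(lo))   # 0x1 0x0
hi, lo = sub64(1, 0, 0, 1)
print(hex(hi), hex(lo))   # 0x0 0xffffffff
```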

 

1.5.3     Logical Instructions

The logical instructions perform bitwise logical operations on the two source operands. The syntax of a logical instruction is: <Operation>{<cond>}{S} Rd, Rn, Operand2

 

Table 1.3 shows the four logical operations with their mnemonics and explanations.

 

Table 1.3 Logical Instructions

Mnemonic       Operation    Comment
AND r0,r1,r2   r0:=r1&r2    Bitwise AND of two 32-bit values
ORR r0,r1,r2   r0:=r1|r2    Bitwise OR of two 32-bit values
EOR r0,r1,r2   r0:=r1^r2    Bitwise exclusive OR of two 32-bit values
BIC r0,r1,r2   r0:=r1&~r2   Bit clear (AND NOT)

 

 

AND, ORR and EOR are the familiar logical operations. Let us consider the execution of the BIC instruction, assuming 4-bit (nibble) values for simplicity. Let r1 = 0b1111 and r2 = 0b1010 before execution. After executing BIC r0, r1, r2:

r0 = 0b0101. In other words, every binary one in r2 clears the corresponding bit position of r1 in the result r0.
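The BIC example above, and its typical use for clearing a field in a 32-bit word, can be reproduced with the Python sketch below; `bic` is a hypothetical helper modelling the instruction's 32-bit semantics.

```python
MASK = 0xFFFFFFFF  # ARM registers are 32 bits wide


def bic(rn: int, operand2: int) -> int:
    """BIC Rd, Rn, Operand2: Rd := Rn AND NOT Operand2 (32-bit)."""
    return rn & ~operand2 & MASK


# The nibble example from the text: every 1 in r2 clears that bit of r1.
print(bin(bic(0b1111, 0b1010)))    # 0b101

# A typical use: clear the low byte of a 32-bit word.
print(hex(bic(0x12345678, 0xFF)))  # 0x12345600
```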

 

1.5.4     Register Movement Operations

 

A move instruction copies its source operand into the destination register. It has no first source operand (Rn) in its format. The instruction is used to load constants into registers and to copy values between registers. The syntax of the instruction is <Operation>{<cond>}{S} Rd, Operand2

 

Table 1.4 shows the two move operations with their mnemonics and explanations.

 

Table 1.4 Move Instructions

Mnemonic    Operation   Comment
MOV r0,r2   r0:=r2      Move a 32-bit value into a register
MVN r0,r2   r0:=~r2     Move the bitwise complement of a 32-bit value into a register

 

 

Examples:

–   MOVS r2, r3       ; r2 := r3, and update the N and Z flags from the result

–   MVNEQ r1, r2      ; if Z = 1, r1 := NOT r2 (bitwise complement, not arithmetic negation)
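A common slip is to read MVN as negation. The Python sketch below contrasts MVN (bitwise NOT) with two's-complement negation, which on ARM is usually written RSB Rd, Rn, #0; the helper names are hypothetical. The two always differ by one, since NOT x = -x - 1.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def mov(operand2: int) -> int:
    """MOV Rd, Operand2: copy the 32-bit value."""
    return operand2 & MASK


def mvn(operand2: int) -> int:
    """MVN Rd, Operand2: Rd := NOT Operand2 (bitwise complement)."""
    return ~operand2 & MASK


def negate(rn: int) -> int:
    """Two's-complement negation, as RSB Rd, Rn, #0 computes (0 - Rn)."""
    return (0 - rn) & MASK


x = 5
print(hex(mvn(x)))      # 0xfffffffa  (-6 in two's complement)
print(hex(negate(x)))   # 0xfffffffb  (-5 in two's complement)
```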

 

1.5.5     Comparison Operations

 

Compare instructions compare or test 32-bit values. Executing them affects the status flags without modifying the operands: they just set the condition code (CC) bits N, Z, C and V in the CPSR. The outcome is then used to change the flow of program execution. These instructions always update the flags, so no S suffix is needed. Since they do not produce a result, the destination field is omitted from the format. The syntax of the instruction is

<Operation>{<cond>} Rn, Operand2

 

Table 1.5 shows the four comparison operations with their mnemonics and explanations.

Table 1.5 Comparison Instructions

Mnemonic    Operation           Comment
CMP r1,r2   flags of r1-r2      Compare two 32-bit values
CMN r1,r2   flags of r1+r2      Compare negated: r1-(-r2)
TST r1,r2   flags of r1&r2      Test bits of a 32-bit value
TEQ r1,r2   flags of r1^r2      Test two 32-bit values for equality
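TST and TEQ behave like ANDS and EORS with the result discarded, which makes them convenient for testing a single bit or checking equality. The Python sketch below models only their N and Z outputs (the C flag, which comes from the barrel shifter, is not modelled); the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def tst(rn: int, operand2: int) -> tuple:
    """TST Rn, Operand2: (N, Z) from Rn AND Operand2; no register is written."""
    result = rn & operand2 & MASK
    return result >> 31, int(result == 0)


def teq(rn: int, operand2: int) -> tuple:
    """TEQ Rn, Operand2: (N, Z) from Rn EOR Operand2; no register is written."""
    result = (rn ^ operand2) & MASK
    return result >> 31, int(result == 0)


r1 = 0b1000_0100
n, z = tst(r1, 1 << 2)                           # is bit 2 of r1 set?
print("bit 2 set" if z == 0 else "bit 2 clear")  # bit 2 set  (an ...NE would run)
print(teq(0x1234, 0x1234))                       # (0, 1): equal, so Z = 1
```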

 

1.5.6    Immediate operands

 

Immediate addressing encodes a constant directly in the instruction: the second source operand is replaced by an immediate value, a literal constant preceded by “#”. Consider the two examples below containing immediate operands.

ADD r3,r3,#1 ;r3:=r3+1

AND r8,r7,#&FF ;r8:=r7[7:0], &:hexadecimal

 

Since the immediate value is encoded within the 32 bits of the instruction, not every 32-bit value can be used as an immediate. The bit layout in Figure 1.4 shows that only 12 bits are available for the immediate value. Used directly, 12 bits would only give values from 0 to 4095. To accommodate larger values, the 12 bits are split into an 8-bit base value and a 4-bit rotate field: the barrel shifter rotates the 8-bit value right by twice the number in the rotate field, so the immediate is the 8-bit value rotated right through an even number of positions. For values that do not wrap around, this is the same as shifting the 8-bit value left by an even number of positions 2n, as given by the formula below. The rows below show the values obtained for different n. The first row shows the range of the 8-bit number without rotation (n=0). The second row shows the new values for n=1, in steps of four; likewise, the next two rows, for n=2 and n=3, are in steps of 16 and 64.

$$\text{Immediate} = (0 \rightarrow 255) \times 2^{2n}, \qquad 0 \le n \le 12$$

n=0: 0 – 255 [0 – 0xff]
n=1: 256, 260, 264, .., 1020 [0x100-0x3fc, step 4, 0x40-0xff ror 30]
n=2: 1024, 1040, 1056, .., 4080 [0x400-0xff0, step 16, 0x40-0xff ror 28]
n=3: 4096, 4160, 4224, .., 16320 [0x1000-0x3fc0, step 64, 0x40-0xff ror 26]
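The rows above can be regenerated by decoding the 8-bit value and 4-bit rotate field. The Python sketch below computes imm8 ROR (2 × rot); the function names are hypothetical. Rotations of 30, 28 and 26 correspond to rot = 15, 14 and 13.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` places."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def decode_immediate(imm8: int, rot: int) -> int:
    """Value of a data processing immediate: imm8 rotated right by 2*rot."""
    if not (0 <= imm8 <= 0xFF and 0 <= rot <= 0xF):
        raise ValueError("imm8 must fit in 8 bits and rot in 4 bits")
    return ror32(imm8, 2 * rot)


for rot in (15, 14, 13):
    lo, hi = decode_immediate(0x40, rot), decode_immediate(0xFF, rot)
    print(f"ror {2 * rot}: {lo:#x} .. {hi:#x}")
# ror 30: 0x100 .. 0x3fc
# ror 28: 0x400 .. 0xff0
# ror 26: 0x1000 .. 0x3fc0
```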

 

Following are few examples for loading immediate value. For example consider the immediate value of 4096(decimal) or 1000(Hex):

 

–     MOV r0, #0x40,26                            ; => MOV r0, #0x1000    (ie 4096)

 

The assembler will convert the required constant of hex or decimal into the rotate format:

 

–     MOV r0, #4096                  ; => MOV r0, #0x1000    (ie 0x40 ror 26)

 

Similarly, the assembler can form a constant as the bitwise complement of an encodable value by using MVN, as shown in the example below:

 

–     MOV r0, #0xFFFFFFFF    ; assembles to MVN r0, #0

 

If the required constant cannot be generated in either form, the assembler reports an error.
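The assembler's job described above, finding a rotate encoding, falling back to MVN with the complement, and otherwise reporting an error, can be sketched in Python as follows. The function names are hypothetical, and this is not how any particular assembler is implemented. Several (imm8, rot) pairs can encode the same value; this sketch returns the one with the smallest rotate field, so for 4096 it picks 1 ROR 20 rather than the 0x40 ROR 26 form shown earlier, and real assemblers may choose differently.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` places."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def find_rotation(value: int):
    """Return (imm8, rot) with imm8 ROR (2*rot) == value, or None if impossible."""
    value &= MASK
    for rot in range(16):
        imm8 = ror32(value, 32 - 2 * rot)   # rotating left by 2*rot undoes ROR 2*rot
        if imm8 <= 0xFF:
            return imm8, rot
    return None


def assemble_mov_constant(value: int) -> tuple:
    """Choose MOV or MVN for `MOV Rd, #value`, or raise if neither fits."""
    value &= MASK
    enc = find_rotation(value)
    if enc is not None:
        return ("MOV",) + enc
    enc = find_rotation(~value & MASK)      # try the complement with MVN
    if enc is not None:
        return ("MVN",) + enc
    raise ValueError(f"constant {value:#x} cannot be encoded as an immediate")


print(assemble_mov_constant(4096))         # ('MOV', 1, 10): 1 ROR 20 == 0x1000
print(assemble_mov_constant(0xFFFFFFFF))   # ('MVN', 0, 0)
```

A value such as 0x101 has set bits too far apart to fit an 8-bit window, and its complement is no better, so `assemble_mov_constant(0x101)` raises `ValueError`.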

 

To summarize, an ARM instruction cannot load an arbitrary 32-bit immediate constant directly into a register. ARM instructions are encoded in 32 bits, and the data processing instruction format has only 12 bits available for operand 2. Used directly, these would only give the range 0 to 4095. Instead, they hold an 8-bit constant, giving a range of 0–255, together with a 4-bit rotate field, so that the 8-bit constant can be rotated right through an even number of positions. This gives a much larger range of constants that can be loaded directly, though some constants still need to be loaded from memory.

 

1.5.6.1 Logical and Arithmetic shift operands

 

Before an operand reaches the ALU, ARM can preprocess it with the barrel shifter as part of the same instruction, so no separate shift instruction is needed and, in general, the instruction still executes in a single cycle. Consider the ADD example given below, where the r1 register is shifted 3 bits left prior to the addition.

 

ADD r3,r2,r1,LSL#3          ; r3 := r2 + 8 * r1

 

The barrel shifter implements the five types of shift operation shown in Figure 1.5. LSL (logical shift left) shifts the bits left by 0 to 31 places, filling the LSB end with zeros. On similar lines, LSR (logical shift right) fills the MSB end with zeros; ASR (arithmetic shift right) fills the MSB end with copies of the sign bit; ROR (rotate right) moves the bits shifted out of the LSB end back in at the MSB end; and RRX (rotate right extended) rotates right by one place through the carry flag. ASL (arithmetic shift left) is accepted as a synonym for LSL. Consider the two examples below of LSL and ROR, where a register defines the amount of shift. Each left shift multiplies the operand by 2, so r3, LSL r2 (r3 shifted left r2 times) is equivalent to $r3 \times 2^{r2}$.

ADD r5,r5,r3,LSL r2 ; r5:=r5+r3*2^r2

MOV r12,r4,ROR r3         ;r12:=r4 rotated right by value of r3
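The five shift types can be summarised as functions that return both the shifted value and the bit shifted out, which becomes the C flag for logical instructions with S set. The Python sketch below is a simplified model under stated assumptions: it handles shift amounts 1 to 31 (RRX always shifts by one place) and does not model LSL #0 or register-specified amounts of 32 or more; the function name is hypothetical.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def barrel_shift(value: int, kind: str, amount: int = 1, carry_in: int = 0) -> tuple:
    """Return (result, carry_out) for a 32-bit barrel-shifter operation."""
    value &= MASK
    if kind == "RRX":                               # 33-bit rotate through C
        return (carry_in << 31) | (value >> 1), value & 1
    if not 1 <= amount <= 31:
        raise ValueError("this sketch only handles shift amounts 1..31")
    if kind in ("LSL", "ASL"):                      # ASL is a synonym for LSL
        return (value << amount) & MASK, (value >> (32 - amount)) & 1
    if kind == "LSR":                               # zeros enter at the MSB end
        return value >> amount, (value >> (amount - 1)) & 1
    if kind == "ASR":                               # sign bit copied in at the MSB end
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK, (value >> (amount - 1)) & 1
    if kind == "ROR":                               # bits leaving bit 0 re-enter at bit 31
        result = ((value >> amount) | (value << (32 - amount))) & MASK
        return result, result >> 31
    raise ValueError(f"unknown shift type {kind}")


# ADD r3, r2, r1, LSL #3 with r2 = 10 and r1 = 5:
r1, r2 = 5, 10
print((r2 + barrel_shift(r1, "LSL", 3)[0]) & MASK)   # 50
```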

 

1.5.6.2 Barrel Shifter

 

The second operand is the one that passes through the barrel shifter. Figure 1.6 shows the connection between the barrel shifter and the ALU in the ARM processor. The barrel shifter can shift a 32-bit value left or right by a specified number of positions. Shifting a register operand is optional. The shift amount can be either a 5-bit unsigned integer in the instruction or the value in the bottom byte of another register. The shift operation is also used for multiplication by a constant. We discussed earlier how, for immediate values, the 8-bit number with a range of 0–255 is rotated right through an even number of positions; this preprocessing increases the range of 32-bit constants that can be loaded directly into registers. The barrel shifter is thus a distinctive and powerful feature of the ARM processor.
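Multiplication by a constant with the barrel shifter works because a shifted copy of a register can be added to or subtracted from the register itself. The Python sketch below models a few such sequences; the register choices in the comments (value in r1, result in r0) and the function names are illustrative assumptions. Note that RSB computes Operand2 - Rn, which gives r1 × 8 - r1 = 7 × r1 in a single instruction.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def lsl(value: int, amount: int) -> int:
    """Barrel-shifter LSL by an immediate amount (0..31)."""
    return (value << amount) & MASK


def times8(r1: int) -> int:
    return lsl(r1, 3)                     # MOV r0, r1, LSL #3      ; r1*8


def times9(r1: int) -> int:
    return (r1 + lsl(r1, 3)) & MASK       # ADD r0, r1, r1, LSL #3  ; r1 + r1*8


def times7(r1: int) -> int:
    return (lsl(r1, 3) - r1) & MASK       # RSB r0, r1, r1, LSL #3  ; r1*8 - r1


def times10(r1: int) -> int:
    r0 = (r1 + lsl(r1, 2)) & MASK         # ADD r0, r1, r1, LSL #2  ; r1*5
    return lsl(r0, 1)                     # MOV r0, r0, LSL #1      ; r1*10


print(times9(5), times7(5), times10(5))   # 45 35 50
```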

1.5.7     Multiply Instructions

 

 

The multiply instructions multiply a pair of registers, Rm and Rs (MUL), and optionally accumulate the product with register Rn (MLA). The result is stored in a register or, for the long multiplies, in a pair of registers. The syntax of both instructions is given below:

 

–   MUL{<cond>}{S} Rd,Rm,Rs

 

–   MLA{<cond>}{S} Rd,Rm,Rs,Rn

 

MUL r4,r3,r2; r4:=(r3*r2)[31:0]
MLA r4,r3,r2,r1; r4:=((r3*r2)+r1)[31:0]

 

Table 1.6 shows the six types of multiply instructions. The syntax of the 64-bit (long) multiplies, in their signed and unsigned forms, is listed below:

 

–   <mul>{<cond>}{S} RdHi,RdLo,Rm,Rs

 

–   <mul> is UMULL,UMLAL,SMULL,SMLAL

Table 1.6 Multiply Instructions

Opcode [23:21]  Mnemonic   Meaning                              Effect
000             MUL        Multiply (32-bit result)             Rd := (Rm * Rs)[31:0]
001             MLA        Multiply-accumulate (32-bit result)  Rd := (Rm * Rs + Rn)[31:0]
100             UMULL      Unsigned multiply long               RdHi:RdLo := Rm * Rs
101             UMLAL      Unsigned multiply-accumulate long    RdHi:RdLo += Rm * Rs
110             SMULL      Signed multiply long                 RdHi:RdLo := Rm * Rs
111             SMLAL      Signed multiply-accumulate long      RdHi:RdLo += Rm * Rs
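The effects in Table 1.6 can be modelled with Python's unbounded integers and explicit masking. The sketch below is illustrative only and its helper names are hypothetical. It also shows why MUL needs no signed and unsigned variants: the low 32 bits of a product are the same either way, and only the high word of the long multiplies differs.

```python
MASK = 0xFFFFFFFF       # 32-bit register width
MASK64 = (1 << 64) - 1  # 64-bit RdHi:RdLo pair


def to_signed(x: int) -> int:
    """Interpret a 32-bit register value as two's complement."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rm: int, rs: int) -> int:
    return (rm * rs) & MASK                          # Rd := (Rm*Rs)[31:0]


def mla(rm: int, rs: int, rn: int) -> int:
    return (rm * rs + rn) & MASK                     # Rd := (Rm*Rs + Rn)[31:0]


def split64(p: int) -> tuple:
    p &= MASK64
    return p >> 32, p & MASK                         # (RdHi, RdLo)


def umull(rm: int, rs: int) -> tuple:
    return split64((rm & MASK) * (rs & MASK))        # unsigned 64-bit product


def smull(rm: int, rs: int) -> tuple:
    return split64(to_signed(rm) * to_signed(rs))    # signed 64-bit product


def umlal(rdhi: int, rdlo: int, rm: int, rs: int) -> tuple:
    acc = (rdhi << 32) | rdlo                        # RdHi:RdLo += Rm*Rs
    return split64(acc + (rm & MASK) * (rs & MASK))


hi, lo = umull(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))   # 0x1 0xfffffffe
hi, lo = smull(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))   # 0xffffffff 0xfffffffe  (-1 * 2 = -2)
```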

 

Booth’s algorithm is used to perform integer multiplication. Booth’s recoding looks at the transitions from 1 to 0 and from 0 to 1 in the bit pattern of the multiplier register, so that a run of ones costs only one subtraction and one addition. On the ARM7TDMI, MUL executes in a minimum of 2 and a maximum of 5 clock cycles, depending on the value of the multiplier. There are a few restrictions on the use of registers for this operation:

1. Rd and Rm cannot be the same register (this can be avoided by swapping Rm and Rs, since multiplication is commutative).

2. The PC cannot be used as an operand.

Multiplication by a power of 2, or by a (power of 2) ± 1, can be done in a single cycle using MOV, ADD or RSB with an inline shift. Several such instructions can also be combined to carry out other multiplications. Table 1.7 shows the different variants of MUL implementation.
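The Booth recoding described above can be illustrated with the textbook radix-2 form. The Python sketch below scans the multiplier from bit 0 upward and adds or subtracts shifted copies of the multiplicand only at the edges of runs of ones; it is a teaching sketch with a hypothetical function name, not a model of the ARM7TDMI's actual multiplier circuit. The returned step count shows why long runs of ones are cheap.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> tuple:
    """Radix-2 Booth multiplication.

    `multiplier` is treated as a `bits`-wide two's-complement value. Scanning
    from bit 0 upward, with an implicit 0 below bit 0:
      * 0 -> 1 (start of a run of ones): subtract multiplicand << i
      * 1 -> 0 (end of a run of ones):   add multiplicand << i
    Returns (product, number_of_add_or_subtract_steps).
    """
    pattern = multiplier & ((1 << bits) - 1)
    product = 0
    steps = 0
    previous = 0                              # the implicit bit below bit 0
    for i in range(bits):
        current = (pattern >> i) & 1
        if current == 1 and previous == 0:
            product -= multiplicand << i
            steps += 1
        elif current == 0 and previous == 1:
            product += multiplicand << i
            steps += 1
        previous = current
    return product, steps


# 120 = 0b0111_1000 is one run of four ones: 13*120 = 13*128 - 13*8, two steps.
print(booth_multiply(13, 0b0111_1000))   # (1560, 2)
```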

 

 

1.6 Summary

 

 

In this lecture we have discussed the ARM instruction set format and the classification of the instruction set. The data processing instructions were explored in detail, with syntax and examples. We also studied how the barrel shifter pre-processes data before it reaches the ALU.

 

 
