28 ARM instruction Set 2
Dr Selvi Ravindran
About the module:
This module discusses the classification of the ARM instruction set. Among the different types of instructions, it explains the data transfer and control flow instructions in detail. Branches are frequent: as a rough guide, around 30% of the instructions in a typical program perform a branching operation. This module explains the branch and load-store instructions and the addressing modes of the ARM instruction set.
Learning Outcomes:
- Able to understand the nuances of ARM programming.
- Able to write ARM programs that use the versatile looping structures and data transfer instructions.
1.1 ARM Instruction set classification
ARM instructions are classified as Data processing instructions, Control flow instructions (Branch instructions), Data transfer instructions (Load-store and Swap instructions), Software Interrupt instructions, Program status register instructions and Coprocessor instructions. The following sections examine the function and syntax of the control flow and data transfer instructions.
1.2 Control Flow Instruction
By default, the instructions in a program execute sequentially. However, a program may need to execute a segment repeatedly or to skip a segment altogether. A branch instruction meets this requirement by changing the course of execution. Branches are used to implement high-level language function calls, if-else statements and looping structures.
A branch manipulates the program counter (PC) to change the flow of execution; the PC holds the address of the next instruction to be executed. After a branch, control can go to one of two addresses: (i) the target address, or (ii) the fall-through address, i.e. the next instruction in sequence. If the choice between the two depends on a condition, the instruction is a conditional branch. If the instruction always goes to the target address, it is an unconditional branch.
The syntax of the branch instruction is:
Branch: B{<cond>} Label
Branch with Link: BL{<cond>} subroutine_label
Table 1.1 shows the four branch operations with their mnemonics and explanations. The first, B, is an unconditional or conditional branch, similar to a goto or an if-else statement in C. It changes the PC to point to the target location given by the label, so execution continues from that label.
Table 1.1 Branch Instructions
Mnemonic | Control Flow | Comment |
B | pc = label | branch |
BL | pc = label, lr = address of the next instruction after BL | branch with link |
BX | pc = Rm & 0xfffffffe, T = Rm & 1 | branch exchange |
BLX | pc = label, T = 1 (label form); pc = Rm & 0xfffffffe, T = Rm & 1 (register form); lr = address of the next instruction after BLX | branch exchange with link |
Figure 1.1 shows the machine code bit layout of the branch instruction. The four most significant bits (bits [31:28]) hold the condition, the next three bits (bits [27:25], set to 101) identify the branch operation, bit 24 is the link (L) bit, and the remaining 24 bits (bits [23:0]) store the target as a signed PC-relative word offset.
The PC-relative offset for a branch instruction is calculated as follows:
a) Subtract the address of the branch instruction plus 8 from the target address (the PC reads 8 bytes ahead of the branch because of the pipeline).
b) This gives a 26-bit byte offset. Because instructions are word-aligned, its bottom two bits are always zero, so it is shifted right by 2 bits and the resulting 24-bit value is stored in the instruction encoding.
c) This gives a branch range of +/- 32 MB.
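The offset rule above can be checked with a small model. The sketch below, written in Python purely as an illustration, encodes and decodes an ARM-state B/BL word using the layout of Figure 1.1 (condition in bits [31:28], 101 in bits [27:25], L in bit 24, word offset in bits [23:0]). The function names `encode_branch` and `branch_target` are hypothetical helpers, and the condition defaults to AL (0xE, "always").

```python
def encode_branch(branch_addr: int, target_addr: int, link: bool = False, cond: int = 0xE) -> int:
    """Encode an ARM-state B/BL instruction word (illustrative model)."""
    offset = target_addr - (branch_addr + 8)      # PC reads 8 bytes ahead (pipeline)
    if offset % 4:
        raise ValueError("branch target must be word-aligned")
    if not -(1 << 25) <= offset < (1 << 25):      # 26-bit signed byte offset: +/-32 MB
        raise ValueError("branch target out of range")
    imm24 = (offset >> 2) & 0xFFFFFF              # drop the two zero bits, keep 24 bits
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


def branch_target(branch_addr: int, word: int) -> int:
    """Recover the target address from an encoded B/BL word."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return branch_addr + 8 + (imm24 << 2)


# A branch to itself ("B .") has a byte offset of -8, i.e. a word offset of -2.
word = encode_branch(0x8000, 0x8000)
print(hex(word))                                  # 0xeafffffe
print(hex(branch_target(0x8000, word)))           # 0x8000
```

Decoding simply reverses the steps: sign-extend the 24-bit field, shift it left by 2 and add the pipelined PC value.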
Every branch has a condition associated with it, which is evaluated against the flags in the Current Program Status Register (CPSR). Table 1.2 shows the different conditions that can be appended to the branch operation. If the condition is satisfied by the CPSR flags, the branch is taken; otherwise execution falls through to the next instruction. The following code snippet uses a Branch Not Equal (BNE) instruction: control returns to the label Loop as long as the comparison finds r0 different from 10 (Z flag clear); once r0 reaches 10, execution continues with the next instruction in sequence.
        MOV     r0, #0            ; initialize counter
Loop    …
        ADD     r0, r0, #1        ; increment loop counter
        CMP     r0, #10           ; compare with limit
        BNE     Loop              ; loop back while r0 != 10
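To make the flag test concrete, the following Python sketch models how CMP sets the N, Z, C and V flags for 32-bit operands, how a condition suffix such as NE is then evaluated, and replays the counting loop above. The helpers `cmp_flags`, `condition_passed` and `run_counting_loop` are hypothetical names for this illustration; the condition meanings are the standard ARM ones.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(rn: int, op2: int) -> tuple[int, int, int, int]:
    """Return (N, Z, C, V) as set by CMP rn, op2, i.e. by computing rn - op2."""
    rn &= MASK32
    op2 &= MASK32
    result = (rn - op2) & MASK32
    n = result >> 31                              # sign bit of the result
    z = int(result == 0)                          # result is zero
    c = int(rn >= op2)                            # carry = NOT borrow for subtraction
    v = ((rn ^ op2) & (rn ^ result)) >> 31        # signed overflow
    return n, z, c, v


def condition_passed(cond: str, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate an ARM condition code against the flags."""
    checks = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "CC": c == 0,               # also written HS / LO
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return checks[cond]


def run_counting_loop(limit: int = 10) -> tuple[int, int]:
    """Replay MOV r0,#0 / ADD r0,r0,#1 / CMP r0,#limit / BNE Loop."""
    r0, iterations = 0, 0
    while True:
        r0 = (r0 + 1) & MASK32                    # ADD r0, r0, #1
        iterations += 1
        if not condition_passed("NE", *cmp_flags(r0, limit)):
            return r0, iterations                 # BNE not taken: fall through


print(run_counting_loop())                        # (10, 10)
```

The same pair of helpers shows why signed and unsigned conditions differ: after CMP of 0xFFFFFFFF with 1, LT (signed, -1 < 1) and HI (unsigned, 0xFFFFFFFF > 1) are both true.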
BL is a subroutine call instruction: it writes the return address (the address of the instruction following the BL) into the link register (lr, r14) and loads the subroutine address into the PC, so execution continues from the target location. At the end of the subroutine, control returns by copying the address in the link register back into the PC. The following code snippet shows a BL instruction calling SUB; the last instruction in SUB is a MOV instruction that updates the PC:
        BL      SUB               ; call SUB, lr := address of the next instruction
        …
SUB     …                         ; subroutine body
        MOV     pc, lr            ; return: copy lr into pc
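The call-and-return sequence can be traced with a toy interpreter. In this Python sketch the program is a dictionary from addresses to simplified instruction tuples; the opcodes `BL`, `ADD_R0`, `MOV_PC_LR` and `HALT` are hypothetical stand-ins (HALT in particular is not an ARM instruction), and each instruction is assumed to occupy 4 bytes, as in ARM state.

```python
def run(program: dict[int, tuple], pc: int = 0x8000, max_steps: int = 100):
    """Execute a toy program and return (registers, trace of executed addresses)."""
    regs = {"r0": 0, "lr": 0}
    trace = []
    for _ in range(max_steps):
        op, *args = program[pc]
        trace.append(pc)
        if op == "BL":                            # BL label: lr = next address, pc = label
            regs["lr"] = pc + 4
            pc = args[0]
        elif op == "MOV_PC_LR":                   # MOV pc, lr: return to the caller
            pc = regs["lr"]
        elif op == "ADD_R0":                      # ADD r0, r0, #imm
            regs["r0"] += args[0]
            pc += 4
        elif op == "HALT":                        # stop the model
            return regs, trace
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("step limit reached")


program = {
    0x8000: ("BL", 0x8100),       # call SUB
    0x8004: ("HALT",),            # execution resumes here after the return
    0x8100: ("ADD_R0", 1),        # SUB: body
    0x8104: ("MOV_PC_LR",),       # SUB: return
}
regs, trace = run(program)
print([hex(a) for a in trace])    # ['0x8000', '0x8100', '0x8104', '0x8004']
```

Because BL overwrites lr, a subroutine that itself calls another subroutine would have to save lr first; this model deliberately keeps a single level of call.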
Branch with Link and eXchange:
The next two types of branch instructions are Branch Exchange (BX) and Branch with Link and Exchange (BLX). The syntax of the instruction is B{L}X{<cond>} Rm. In branch exchange, the target address is specified in a register, Rm. Bit[0] of Rm is copied into the T bit of the CPSR, and bits[31:1] are moved into the PC with bit 0 cleared. If Rm[0] is 1, the processor switches to Thumb state and begins executing at the address in Rm, aligned to a half-word boundary by clearing the bottom bit. If Rm[0] is 0, the processor continues in ARM state and begins executing at the address in Rm, which must then be word-aligned (Rm[1:0] = 0b00).
BLX also has an immediate form, BLX <target address>, which is unconditional and always calls a Thumb subroutine from ARM code. In this encoding bit 24 (the H bit, in the position of the L bit) is added into bit 1 of the resulting address; H = 1 selects the odd half-word of the target word, so the Thumb target can lie on any half-word boundary. Table 1.3 shows four branching examples: Unconditional jump, Conditional jump, Subroutine call, and Conditional subroutine call.
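The register and immediate forms can be summarised in a small Python model of the target-address and state calculation. The function names are hypothetical; an ARM-state target whose bit 1 is set is treated as an error in this sketch, whereas real hardware regards it as unpredictable.

```python
def bx(rm: int) -> tuple[int, int]:
    """BX Rm: return (new pc, T bit). T = Rm[0]; pc = Rm with bit 0 cleared."""
    t = rm & 1
    pc = rm & 0xFFFFFFFE
    if t == 0 and pc & 2:
        raise ValueError("ARM-state target must be word-aligned")
    return pc, t


def blx_register(blx_addr: int, rm: int) -> tuple[int, int, int]:
    """BLX Rm: as BX, plus lr = address of the instruction after the BLX."""
    pc, t = bx(rm)
    return pc, t, blx_addr + 4


def blx_immediate(blx_addr: int, imm24: int, h: int) -> tuple[int, int, int]:
    """BLX label (ARM state): always enters Thumb; the H bit supplies address bit 1."""
    if imm24 & 0x800000:                          # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    target = blx_addr + 8 + (imm24 << 2) + (h << 1)
    return target, 1, blx_addr + 4


print(bx(0x8001))                                 # (32768, 1): Thumb code at 0x8000
print(blx_immediate(0x8000, 0, 1))                # (32778, 1, 32772): 0x800A, Thumb, lr = 0x8004
```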
1.3 Initializing an Address Pointer
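The ARM assembler provides a pseudo-instruction, ADR, which loads a PC-relative address into a register Rd. The assembler converts it into a single ADD or SUB instruction that adds a small offset to, or subtracts it from, the program counter (r15). The optional L suffix (ADRL) makes the assembler use two instructions, which extends the reachable range. The syntax of the ADR instruction is:
ADR{L}{<cond>} Rd, <address>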
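Here is an example: a program that copies data from TABLE1 to TABLE2, both of which are close to the code that uses them, starts by initializing a pointer to each table:
        ADR     r1, TABLE1        ; r1 points to TABLE1
        ADR     r2, TABLE2        ; r2 points to TABLE2
        …
TABLE1  …                         ; source data
        …
TABLE2  …                         ; destination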
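How the assembler might turn ADR into one instruction can be sketched in Python: it computes the label's distance from the PC (the ADR's own address plus 8), picks ADD or SUB by sign, and checks that the distance fits an ARM data-processing immediate (an 8-bit value rotated right by an even amount). The function names are hypothetical, and the addresses used are arbitrary.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value: int) -> bool:
    """True if value equals an 8-bit constant rotated right by an even amount (0-30)."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a rotate-right by `rot`.
        undone = ((value << rot) | (value >> (32 - rot))) & MASK32
        if undone <= 0xFF:
            return True
    return False


def resolve_adr(adr_addr: int, label_addr: int) -> tuple[str, int]:
    """Return the instruction ADR Rd, label becomes: ('ADD' or 'SUB', offset from pc)."""
    delta = label_addr - (adr_addr + 8)           # pc reads 8 bytes ahead
    op = "ADD" if delta >= 0 else "SUB"
    if not is_arm_immediate(abs(delta)):
        raise ValueError("offset not encodable; ADRL or a literal-pool load would be needed")
    return op, abs(delta)


print(resolve_adr(0x8000, 0x8100))   # ('ADD', 248)  i.e. ADD Rd, pc, #0xF8
print(resolve_adr(0x8100, 0x8000))   # ('SUB', 264)  i.e. SUB Rd, pc, #0x108
```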
1.4 Data transfer Instruction
Data is transferred between memory and registers by load and store instructions. There are three basic forms of data movement between ARM registers and memory: single-register load and store, multiple-register load and store, and single-register swap.
1.4.1 Single Data Transfer Instruction
The single-register load and store instructions transfer signed and unsigned bytes, 16-bit half-words and 32-bit words. The syntax of the instruction is:
– <LDR/STR>{<cond>}{B/H/SB/SH} Rd, <address>
where no suffix selects a 32-bit word, B a byte and H a half-word, and SB or SH (loads only) select a sign-extended byte or half-word.
Table 1.2 shows the data types associated with the load-store operations, with their mnemonics and explanations.
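Table 1.2 Single Register Load-Store Instructions
Mnemonic | Operation | Comment |
LDR | Rd <- mem32[address] | load word |
STR | Rd -> mem32[address] | store word |
LDRB | Rd <- mem8[address] | load unsigned byte |
STRB | Rd -> mem8[address] | store byte |
LDRH | Rd <- mem16[address] | load unsigned half-word |
STRH | Rd -> mem16[address] | store half-word |
LDRSB | Rd <- SignExtend(mem8[address]) | load signed byte |
LDRSH | Rd <- SignExtend(mem16[address]) | load signed half-word |
Each transfer must be aligned to the size of its data type: words on 4-byte boundaries and half-words on 2-byte boundaries; bytes can be at any address.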
Let us consider a few examples:
LDR r0, [r1] loads register r0 with the word at the address held in register r1.
STR r0, [r1] stores the contents of register r0 at the address held in register r1.
LDRSB / LDRSH load a single byte / half-word value and sign-extend it to 32 bits.
All of these can be executed conditionally by inserting the appropriate condition code after LDR/STR. For example, LDREQB loads the byte only if the Z (zero) flag is set.
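The difference between the zero-extending and sign-extending loads, and the alignment rule, can be illustrated with a small Python model of byte-addressed memory. The model assumes little-endian byte order (ARM cores can also be configured big-endian) and treats unaligned half-word and word accesses as errors for simplicity; the class and function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class Memory:
    """Little-endian, byte-addressed memory with size-aligned accesses."""

    def __init__(self, size: int = 0x400):
        self.data = bytearray(size)

    def read(self, addr: int, size: int) -> int:
        if addr % size:
            raise ValueError(f"unaligned {size}-byte access at {addr:#x}")
        return int.from_bytes(self.data[addr:addr + size], "little")

    def write(self, addr: int, size: int, value: int) -> None:
        if addr % size:
            raise ValueError(f"unaligned {size}-byte access at {addr:#x}")
        value &= (1 << (8 * size)) - 1            # STRB/STRH keep only the low bits
        self.data[addr:addr + size] = value.to_bytes(size, "little")


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to a 32-bit register image."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def ldr(mem: Memory, addr: int) -> int:
    return mem.read(addr, 4)                      # LDR: whole word


def ldrb(mem: Memory, addr: int) -> int:
    return mem.read(addr, 1)                      # LDRB: zero-extended byte


def ldrh(mem: Memory, addr: int) -> int:
    return mem.read(addr, 2)                      # LDRH: zero-extended half-word


def ldrsb(mem: Memory, addr: int) -> int:
    return sign_extend(mem.read(addr, 1), 8)      # LDRSB: sign-extended byte


def ldrsh(mem: Memory, addr: int) -> int:
    return sign_extend(mem.read(addr, 2), 16)     # LDRSH: sign-extended half-word


mem = Memory()
mem.write(0x200, 4, 0x1234F680)                   # STR: bytes 80 F6 34 12 at 0x200..0x203
print(hex(ldrb(mem, 0x200)), hex(ldrsb(mem, 0x200)))   # 0x80 0xffffff80
print(hex(ldrh(mem, 0x200)), hex(ldrsh(mem, 0x200)))   # 0xf680 0xfffff680
```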
1.4.2 Single Data Transfer Addressing Mode
The ARM instruction set provides several addressing modes to access memory. For single data transfers there are two: register-indirect addressing and base-plus-index (index) addressing. Index addressing combines a base register with an offset. The base register can be any of the 16 registers, r0 to r15. The offset is an unsigned number that is added to or subtracted from the base; it can be an immediate value, a register (other than the PC), or a scaled register. The scaled-register form is available only for word and unsigned-byte instructions. For multiple data transfers there are two further addressing modes, block-copy addressing and stack addressing, which are discussed with the multiple-register load and store instructions.
Register Indirect Addressing Mode
In this addressing mode, a register acts as a memory pointer. This register, called the base register, holds a value that is interpreted as a memory address. For a load instruction, the data is read from the location at this address; for a store instruction, the data is written to that location. In the following example, register r1 holds the memory address; LDR loads r0 from that location and STR stores r0 to it.
LDR r0,[r1] ;r0:=mem32[r1]
STR r0,[r1] ;mem32[r1]:=r0
Figure 1.2 shows register-indirect addressing for the instructions LDR r2, [r1] and STR r0, [r1]. Both instructions use the same base register r1, which holds 0x200. Executing the store STR r0, [r1] writes a copy of r0 to the location addressed by the base register, so the value 0x5 is written to address 0x200. The load LDR r2, [r1] copies the word at the address held in r1 into r2, so after execution r2 = 0x5, as seen in Figure 1.2.
Base-plus-offset Addressing
In index addressing, an offset is added to or subtracted from the base register to form the effective address, and the data is transferred between the register and this effective address. When the offset is a shifted register, the mode is called scaled index addressing. Depending on when the offset is applied and whether the base register is updated, there are three types: pre-indexing, post-indexing and auto-indexing.
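Table 1.3 shows the three indexing operations with their syntax and effect, using an offset of #4:
Indexing | Address accessed | Base register afterwards | Example |
Pre-index | base + offset | not updated | LDR r0, [r1, #4] |
Auto-index (pre-index with writeback) | base + offset | base + offset | LDR r0, [r1, #4]! |
Post-index | base | base + offset | LDR r0, [r1], #4 |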
The offset (#4 in these examples) can be added to or subtracted from the base register. Word and unsigned-byte transfers encode the immediate offset in 12 bits, so its magnitude can be up to 4095 bytes (about 4 KB); half-word and signed-byte transfers have an 8-bit immediate (up to 255). Post-indexing accesses memory at the base address itself, exactly like register-indirect addressing, and then adds the offset to the base register; it does two things in a single instruction, saving the separate ADD that would otherwise be needed. Auto-indexing computes the effective address as pre-indexing does and also writes it back to the base register. The extra work of updating the base is overlapped (pipelined) with the memory access, so auto-indexing costs little extra time.
Figure 1.3 shows pre-indexed addressing for the store instruction STR r0, [r1, #12]. The base register value 0x200 is added to the offset 12 (0xC), giving the effective address 0x20C, and the r0 data 0x5 is stored at address 0x20C. Pre-indexed addressing can be varied to suit different requirements:
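The three indexing variants differ only in which address is accessed and what is left in the base register. A minimal Python sketch (hypothetical function names; 32-bit wrap-around assumed) makes that explicit, together with the range check for a 12-bit immediate offset.

```python
MASK32 = 0xFFFFFFFF


def indexed_access(base: int, offset: int, mode: str) -> tuple[int, int]:
    """Return (address accessed, base register afterwards) for LDR/STR indexing."""
    if mode == "pre":            # LDR r0, [r1, #4]
        return (base + offset) & MASK32, base
    if mode == "auto":           # LDR r0, [r1, #4]!   pre-index with writeback
        address = (base + offset) & MASK32
        return address, address
    if mode == "post":           # LDR r0, [r1], #4    access first, then update
        return base, (base + offset) & MASK32
    raise ValueError(f"unknown indexing mode {mode!r}")


def fits_imm12(offset: int) -> bool:
    """Word/unsigned-byte transfers: 12-bit magnitude plus an add/subtract bit."""
    return -4095 <= offset <= 4095


for mode in ("pre", "auto", "post"):
    address, new_base = indexed_access(0x200, 4, mode)
    print(f"{mode:4}  address={address:#x}  r1 afterwards={new_base:#x}")
```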
To store to location 0x1F4, with r1 = 0x200, use: STR r0, [r1, #-12]
To auto-increment the base pointer to 0x20C (r1 becomes 0x20C after the store), use: STR r0, [r1, #12]!
If r2 contains 3, address 0x20C can be reached by multiplying r2 by 4. The multiplication is done with a left shift, since each one-bit left shift multiplies the value by two. The following store shifts r2 left by 2 bits, so the offset used is 12 (3 × 4); r2 itself is unchanged:
– STR r0, [r1, r2, LSL #2]
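The address arithmetic in these variants, including the scaled-register offset formed by the barrel shifter, can be checked with a few lines of Python. The values r1 = 0x200 and r2 = 3 follow the text; the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left on a 32-bit value (the barrel shifter's LSL)."""
    return (value << amount) & MASK32


def pre_index(base: int, offset: int, writeback: bool = False) -> tuple[int, int]:
    """Return (effective address, r1 afterwards) for STR r0, [r1, <offset>]{!}."""
    address = (base + offset) & MASK32
    return address, (address if writeback else base)


r1, r2 = 0x200, 3
examples = {
    "STR r0, [r1, #12]": pre_index(r1, 12),
    "STR r0, [r1, #-12]": pre_index(r1, -12),
    "STR r0, [r1, #12]!": pre_index(r1, 12, writeback=True),
    "STR r0, [r1, r2, LSL #2]": pre_index(r1, lsl(r2, 2)),  # offset = 3 << 2 = 12
}
for text, (address, new_r1) in examples.items():
    print(f"{text:25} -> address {address:#x}, r1 = {new_r1:#x}")
```

Only the `!` form changes r1, and the shifted register is used as an offset without modifying r2.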
Post-indexed Addressing
Example: STR r0, [r1], #12
Figure 1.4 shows post-indexed addressing for the store instruction STR r0, [r1], #12. First the r0 data is stored at the location addressed by the base register r1 (0x200); then the offset 12 (0xC) is added to r1, which becomes 0x20C, ready for the next access.
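Because post-indexing leaves the base register pointing at the next slot, repeating the same instruction walks through memory. The following Python sketch models STR r0, [r1], #12 on a dictionary standing in for word memory; `str_post_index` is a hypothetical helper, and the extra stored values 0x6 and 0x7 are invented for illustration.

```python
def str_post_index(memory: dict[int, int], r0: int, r1: int, offset: int) -> int:
    """STR r0, [r1], #offset: store r0 at address r1, then return r1 + offset."""
    memory[r1] = r0
    return r1 + offset


memory: dict[int, int] = {}
r1 = 0x200
r1 = str_post_index(memory, 0x5, r1, 12)         # the Figure 1.4 case
print(hex(r1))                                   # 0x20c

# Repeating the same instruction stores successive values 12 bytes apart.
for value in (0x6, 0x7):
    r1 = str_post_index(memory, value, r1, 12)
print({hex(a): v for a, v in memory.items()})    # {'0x200': 5, '0x20c': 6, '0x218': 7}
```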
1.4.3 Multiple Data Transfer Instruction
Multiple-register load and store instructions read and write blocks of data: a single instruction can copy a block of data between memory and the processor registers. They are well suited to context switching and to saving or restoring workspace registers on procedure entry and exit. The load and store multiple instructions (LDM/STM) transfer between 1 and 16 registers to or from memory. The order of transfer cannot be specified: the order in which registers appear in the list has no effect, and the lowest-numbered register is always transferred to or from the lowest memory address accessed. The transferred registers can be any subset of the current register bank (the default) or, in a privileged mode, of the user-mode register bank (selected with the ^ suffix). The syntax of the instruction is:
– <LDM/STM>{<cond>}<addressing mode> Rn{!}, <registers>{^}
LDM/STM can transfer any subset of the 16 registers (up to all of r0 to r15) with a single instruction. For example, registers r0, r2 and r5 are loaded from the memory locations starting at the address in base register r1, as shown below:
LDMIA r1,{r0,r2,r5} ;r0:=mem32[r1]
;r2:=mem32[r1+4]
;r5:=mem32[r1+8]
Table 1.4 shows the suffixes, appended to STM/LDM, that determine the direction in which the address moves through memory. The base register Rn determines which memory locations are accessed, and the suffix selects whether the address is incremented or decremented and whether this happens before or after each transfer, giving four addressing modes. The base register can optionally be updated after the transfer by appending “!”.
Table 1.4 Load-Store Multiple Instructions
Mnemonic | Start address | End address | Rn after writeback (!) | Comment |
STMIA/LDMIA | Rn | Rn + 4*N - 4 | Rn + 4*N | Increment After |
STMIB/LDMIB | Rn + 4 | Rn + 4*N | Rn + 4*N | Increment Before |
STMDA/LDMDA | Rn - 4*N + 4 | Rn | Rn - 4*N | Decrement After |
STMDB/LDMDB | Rn - 4*N | Rn - 4 | Rn - 4*N | Decrement Before |
N is the number of registers in the list.
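The address ranges of the four multiple-transfer modes and the lowest-register-to-lowest-address rule can be expressed as one Python function. This is a sketch with a hypothetical name; registers are given as numbers 0 to 15, and the returned dictionary says which address each register is transferred to or from.

```python
def multiple_transfer(base: int, registers: list[int], mode: str) -> tuple[dict[int, int], int]:
    """Return ({register: address}, value written back to Rn with '!') for LDM/STM<mode>."""
    regs = sorted(set(registers))                # list order does not matter
    n = len(regs)
    start = {
        "IA": base,                              # increment after
        "IB": base + 4,                          # increment before
        "DA": base - 4 * n + 4,                  # decrement after
        "DB": base - 4 * n,                      # decrement before
    }[mode]
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n
    # Lowest-numbered register <-> lowest address, ascending from `start`.
    return {r: start + 4 * i for i, r in enumerate(regs)}, new_base


# LDMIA r1, {r0, r2, r5} with r1 = 0x100 (the list order is irrelevant).
addresses, _ = multiple_transfer(0x100, [5, 0, 2], "IA")
print({f"r{r}": hex(a) for r, a in addresses.items()})
# {'r0': '0x100', 'r2': '0x104', 'r5': '0x108'}
```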
Block Copy
Copy a block of memory, which is an exact multiple of 12 words long, from the location pointed to by r12 to the location pointed to by r13. r14 points to the end of block to be copied.
        ; r12 points to the start of the source data
        ; r14 points to the end of the source data
        ; r13 points to the start of the destination data
loop    LDMIA   r12!, {r0-r11}    ; load 48 bytes
        STMIA   r13!, {r0-r11}    ; and store them
        CMP     r12, r14          ; check for the end
        BNE     loop              ; and loop until done
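The loop's behaviour, including why the block length must be an exact multiple of 12 words, can be modelled in Python on a dictionary of word-addressed memory. The function name and the test data are hypothetical. Because the loop only tests for equality (BNE), a length that is not a multiple of 48 bytes would make r12 step past r14 and never match it, so the sketch rejects such input; it also rejects an empty block, since the loop body always runs at least once.

```python
def block_copy(memory: dict[int, int], src: int, src_end: int, dst: int) -> tuple[int, int]:
    """Model the LDMIA/STMIA loop; returns the final (r12, r13)."""
    length = src_end - src
    if length <= 0 or length % 48:
        raise ValueError("block must be a non-empty multiple of 12 words (48 bytes)")
    while True:
        words = [memory[src + 4 * i] for i in range(12)]   # LDMIA r12!, {r0-r11}
        src += 48
        for i, word in enumerate(words):                    # STMIA r13!, {r0-r11}
            memory[dst + 4 * i] = word
        dst += 48
        if src == src_end:                                  # CMP r12, r14 / BNE loop
            return src, dst


memory = {0x1000 + 4 * i: i for i in range(24)}             # 24 words = 2 iterations
r12, r13 = block_copy(memory, 0x1000, 0x1060, 0x2000)
print(hex(r12), hex(r13))                                   # 0x1060 0x2060
```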
1.4.3.1 Multiple Data Transfer for Stack Operations
The LDM and STM instructions are used to implement the stack operations push and pop. The stack classification and the load-store instructions that implement each type are shown in Figure 1.5. A stack is ascending or descending depending on whether it grows up or down in memory: an ascending stack grows towards higher memory addresses, while a descending stack grows towards lower memory addresses. A stack is full when the stack pointer (sp) points to the last used (occupied) location; in an empty stack, sp points to the first unused location.
The stack type is given by the suffix appended to the instruction, as shown in Table 1.5. For example, a full ascending stack uses the suffix FA (LDMFA/STMFA):
Table 1.5 Load-Store for Stack Operations
Mnemonic | Comment |
STMFD/LDMFD | Full Descending stack |
STMFA/LDMFA | Full Ascending stack |
STMED/LDMED | Empty Descending stack |
STMEA/LDMEA | Empty Ascending stack |
Note: ARM Compilers use a Full descending stack
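Each stack suffix is an alias for one of the four addressing modes of Table 1.4; for example, a full descending push (STMFD) is a decrement-before store and the matching pop (LDMFD) is an increment-after load. The Python sketch below encodes that mapping and models push and pop with writeback on sp; the helper names are hypothetical and registers are given by number.

```python
PUSH_MODE = {"FD": "DB", "FA": "IB", "ED": "DA", "EA": "IA"}   # STM<type>
POP_MODE = {"FD": "IA", "FA": "DA", "ED": "IB", "EA": "DB"}    # LDM<type>


def block_start(base: int, n: int, mode: str) -> tuple[int, int]:
    """Lowest address accessed and written-back base for an n-register LDM/STM."""
    if mode == "IA":
        return base, base + 4 * n
    if mode == "IB":
        return base + 4, base + 4 * n
    if mode == "DA":
        return base - 4 * n + 4, base - 4 * n
    return base - 4 * n, base - 4 * n                 # DB


def push(memory: dict[int, int], sp: int, regs: dict[int, int], kind: str = "FD") -> int:
    """STM<kind> sp!, {...}; regs maps register number -> value. Returns the new sp."""
    start, sp = block_start(sp, len(regs), PUSH_MODE[kind])
    for i, r in enumerate(sorted(regs)):              # lowest register -> lowest address
        memory[start + 4 * i] = regs[r]
    return sp


def pop(memory: dict[int, int], sp: int, reg_numbers: list[int], kind: str = "FD"):
    """LDM<kind> sp!, {...}. Returns ({register: value}, new sp)."""
    start, sp = block_start(sp, len(reg_numbers), POP_MODE[kind])
    values = {r: memory[start + 4 * i] for i, r in enumerate(sorted(reg_numbers))}
    return values, sp


memory: dict[int, int] = {}
sp = push(memory, 0x1000, {0: 0xA, 1: 0xB, 14: 0xC})   # STMFD sp!, {r0, r1, lr}
print(hex(sp), hex(memory[sp]))                        # 0xff4 0xa  (sp -> last item pushed)
values, sp = pop(memory, sp, [0, 1, 14])               # LDMFD sp!, {r0, r1, lr}
print(values, hex(sp))                                 # {0: 10, 1: 11, 14: 12} 0x1000
```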
Single register swap instruction
A swap instruction exchanges a value in a register with a value in memory. It is useful for implementing semaphores and supporting mutually exclusive access to shared data. The instruction was originally added by Acorn for use in multiprocessor systems.
The syntax of the swap instruction is:
– SWP{<cond>}{B} Rd,Rm,[Rn] ;Rd <- [Rn], [Rn] <- Rm
Consider the example:
ADR r0,SEMAPHORE
SWPB r1,r1,[r0] ;exchange byte
Swap combines a load and a store of a word or an unsigned byte in a single, atomic instruction: it writes the value of the source register Rm to the memory location addressed by base register Rn and places the old contents of that location in the destination register Rd. To exchange a register's contents with memory, make Rd = Rm.
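The semantics of SWP, and the way the SEMAPHORE example can serve as a simple lock, can be sketched in Python. The convention that the semaphore byte holds 0 when free and 1 when taken is an assumption made for this illustration, as are the helper names and the little-endian byte order used for the word form. The model runs single-threaded, so it shows only the data movement, not the atomicity that the hardware guarantees.

```python
def swp(memory: bytearray, rn: int, rm: int, byte: bool = True) -> int:
    """SWP{B} Rd, Rm, [Rn]: store Rm at [Rn] and return the old contents (new Rd)."""
    size = 1 if byte else 4
    old = int.from_bytes(memory[rn:rn + size], "little")
    memory[rn:rn + size] = (rm & ((1 << (8 * size)) - 1)).to_bytes(size, "little")
    return old


def try_lock(memory: bytearray, semaphore: int) -> bool:
    """ADR r0, SEMAPHORE / SWPB r1, r1, [r0] with r1 = 1 beforehand (Rd = Rm)."""
    r1 = 1
    r1 = swp(memory, semaphore, r1)              # exchange r1 with the semaphore byte
    return r1 == 0                               # old value 0: lock was free, now taken


memory = bytearray(16)
SEMAPHORE = 0x4                                  # hypothetical address of the lock byte
print(try_lock(memory, SEMAPHORE))               # True: lock acquired
print(try_lock(memory, SEMAPHORE))               # False: already taken
memory[SEMAPHORE] = 0                            # release with an ordinary byte store
print(try_lock(memory, SEMAPHORE))               # True
```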
1.5 Summary
In this lecture, we have covered the control flow instructions: conditional and unconditional branches, branch with link, and branch exchange. We have discussed the ARM data transfer instructions and their classification as load, store and swap, with their syntax and examples. We also studied the ARM addressing modes: register indirect, index (pre-, post- and auto-indexing), and the block-copy and stack modes of the multiple-register transfers.