28 ARM instruction Set 2

Dr Selvi Ravindran

About the module:

 

This module discusses the classification of the ARM instruction set. Among the different types of instructions, it explains the data transfer and control flow instructions in detail. Branching is common: in general, about 30% of the instructions in a program perform a branching operation. This module explains the branch and load-store instructions and the addressing modes of the ARM instruction set.

 

Learning Outcomes:

 

  • Able to understand the nuances of ARM programming.
  • Should be able to write an ARM program incorporating the versatile looping structures and data transfer instructions.

 

1.1 ARM Instruction set classification

 

ARM instructions are classified as data processing instructions, control flow instructions (branch instructions), data transfer instructions (load-store and swap instructions), the software interrupt instruction, program status register instructions and coprocessor instructions. The following sections examine the function and syntax of the data transfer and control flow instructions.

 

1.2 Control Flow Instruction

 

Normally, the instructions in a program execute sequentially. However, it may be necessary to execute a segment of a program repeatedly, or to skip a segment. A branch instruction meets this requirement by changing the course of execution. Branches are used to implement high-level language function calls, if-else statements and looping structures.

 

A branch changes the flow of execution by modifying the program counter (PC), which holds the address of the next instruction to be executed. A branch has two possible successors: (i) the target address, and (ii) the fall-through address, that is, the next instruction in sequence. If the instruction chooses between these two addresses at run time, based on a condition, it is a conditional branch. If it always goes to the target address, it is an unconditional branch.

The syntax of the branch instruction is:

Branch: B{<cond>} Label

Branch with Link: BL{<cond>} subroutine_label

 

Table 1.1 shows the four branch operations with their mnemonics and explanations. The first, B, is an unconditional or conditional branch, similar to a goto or an if-else statement in C. It changes the PC to point to the target location given by the label, so execution continues from there.

 

Table 1.1 Branch Instructions

Mnemonic   Control flow                                                               Comment
B          pc = label                                                                 branch
BL         pc = label; lr = address of the next instruction after BL                  branch with link
BX         pc = Rm & 0xfffffffe; T = Rm & 1                                           branch exchange
BLX        pc = label, T = 1 (or pc = Rm & 0xfffffffe, T = Rm & 1);                   branch exchange with link
           lr = address of the next instruction after BLX

 

 

Figure 1.1 shows the machine-code bit layout of the branch instruction. The four most significant bits [31:28] hold the condition, the next three bits [27:25] hold 101 (the branch opcode), bit 24 is the link (L) bit, and the remaining 24 bits [23:0] hold the label as a signed, PC-relative word offset.

 

 

The PC-relative offset for a branch instruction is calculated as follows:

a) Subtract (address of the branch instruction + 8) from the target address. The extra 8 bytes allow for the pipeline: when the branch executes, the PC already points two instructions ahead.

b) The result is a 26-bit byte offset. Because instructions are word-aligned, its bottom two bits are always zero, so it is shifted right by 2 bits and the remaining 24 bits are stored in the instruction encoding.

c) A signed 24-bit word offset gives a range of about +/-32 Mbytes.
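As a worked check of steps a) to c), the Python sketch below computes the 32-bit machine word of a B or BL instruction from its own address and the target address, following the bit layout described for Figure 1.1. Python is used only as a calculator here; `encode_branch` and `decode_target` are hypothetical helper names, and the default condition 0xE is the AL (always) code.

```python
def encode_branch(branch_addr: int, target_addr: int, cond: int = 0xE, link: bool = False) -> int:
    """Sketch of the B/BL word: cond[31:28] | 101[27:25] | L[24] | signed word offset[23:0]."""
    offset = target_addr - (branch_addr + 8)       # step a): PC reads 8 bytes ahead
    if offset % 4 != 0:
        raise ValueError("branch offset must be a multiple of 4")
    if not -(1 << 25) <= offset < (1 << 25):       # step c): 26-bit signed byte offset
        raise ValueError("target out of range for B/BL (about +/-32 MB)")
    imm24 = (offset >> 2) & 0xFFFFFF                # step b): drop two zero bits, keep 24
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


def decode_target(branch_addr: int, word: int) -> int:
    """Inverse: recover the target address from a B/BL word located at branch_addr."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                            # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return branch_addr + 8 + (imm24 << 2)


print(hex(encode_branch(0x8000, 0x8000)))              # B to itself      -> 0xeafffffe
print(hex(encode_branch(0x8000, 0x8100, link=True)))   # BL 0x100 ahead   -> 0xeb00003e
```

For a branch to itself the byte offset is -8, so the stored 24-bit field is 0xFFFFFE, that is, -2 words.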

 

Every branch has a condition associated with it and is executed only if that condition is satisfied by the flags in the Current Program Status Register (CPSR). Table 1.2 shows the different conditions that can be appended to the branch mnemonic. If the condition evaluates true for the current flags, the branch is taken; otherwise it is not taken and execution continues with the next instruction. The following code snippet illustrates a Branch if Not Equal (BNE) instruction. CMP sets the flags by computing r0 - 10; control returns to the label Loop while the result is non-zero (Z = 0, that is, r0 is not yet 10), and otherwise the next instruction in sequence is executed.

 

        MOV  r0, #0        ; initialize counter
Loop    …
        ADD  r0, r0, #1    ; increment loop counter
        CMP  r0, #10       ; compare with limit
        BNE  Loop          ; branch back to Loop while r0 != 10
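The snippet's behaviour can be modelled in Python, as a sketch rather than a full emulator: `cmp_flags` (a hypothetical helper) computes the N, Z, C and V flags of CMP from a 32-bit subtraction, and `CONDITIONS` holds only a few of the ARM condition codes. `count_to` mimics the loop, so BNE is taken while Z is 0 and falls through once r0 equals the limit.

```python
MASK = 0xFFFFFFFF


def cmp_flags(rn: int, operand2: int) -> dict:
    """Model of CMP rn, operand2: flags from the 32-bit subtraction rn - operand2."""
    a, b = rn & MASK, operand2 & MASK
    res = (a - b) & MASK
    return {
        "N": res >> 31,                      # result negative
        "Z": int(res == 0),                  # result zero
        "C": int(a >= b),                    # no borrow
        "V": ((a ^ b) & (a ^ res)) >> 31,    # signed overflow
    }


# A few condition codes, written as tests on the flags
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "AL": lambda f: True,
}


def count_to(limit: int) -> int:
    """Mimic MOV / Loop ... / ADD / CMP / BNE Loop; return how many passes ran."""
    r0, passes = 0, 0                        # MOV r0, #0
    while True:
        passes += 1                          # the elided loop body would run here
        r0 = (r0 + 1) & MASK                 # ADD r0, r0, #1
        flags = cmp_flags(r0, limit)         # CMP r0, #limit
        if not CONDITIONS["NE"](flags):      # BNE Loop: fall through once Z == 1
            break
    return passes
```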

 

BL is the subroutine call instruction: it writes the return address (the address of the instruction following the BL) into the link register (r14) and loads the subroutine address into the PC, so execution continues from the target location. At the end of the subroutine, control returns by copying the address from the link register back into the PC. The following code snippet shows a BL instruction calling SUB; the last instruction in SUB is a MOV that updates the PC.

        BL   SUB          ; branch to subroutine SUB
        …                 ; execution resumes here after the return
SUB     …                 ; subroutine entry point
        MOV  pc, r14      ; return: copy the link register into the PC
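A minimal Python model of the call and return, assuming ARM state (4-byte instructions) and treating `pc` as the address of the instruction being executed rather than the pipelined PC value. The class and method names, and the address 0x9000 for SUB, are illustrative only.

```python
class Core:
    """Toy model: pc is the address of the instruction currently executing."""

    def __init__(self, pc: int):
        self.pc = pc
        self.lr = 0                 # r14, the link register

    def bl(self, target: int) -> None:
        self.lr = self.pc + 4       # return address: the instruction after BL
        self.pc = target            # continue at the subroutine

    def mov_pc_lr(self) -> None:
        self.pc = self.lr           # MOV pc, r14: return to the caller


core = Core(0x8000)
core.bl(0x9000)        # BL SUB, with SUB assumed to start at 0x9000
core.mov_pc_lr()       # last instruction of SUB
print(hex(core.pc))    # -> 0x8004
```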

 

Branch with Link and eXchange:

 

The next two branch instructions are Branch Exchange (BX) and Branch with Link and Exchange (BLX). The syntax of their register form is B{L}X{<cond>} Rm, where the target address is held in register Rm. Bit[0] of Rm is copied into the T bit of the CPSR, and bits [31:1] are moved into the PC. If Rm[0] is 1, the processor switches to executing Thumb instructions and begins at the address in Rm aligned to a half-word boundary by clearing the bottom bit. If Rm[0] is 0, the processor continues executing ARM instructions and begins at the address in Rm, which must then be word-aligned (Rm[1] = 0).
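The register forms can be summarised in a short Python sketch. `branch_exchange` is a hypothetical helper that returns the new PC, the new T bit and, for BLX, the link register value (the address of the instruction after BLX, assuming it executes in ARM state); it treats an ARM-state target with Rm[1] set as an error.

```python
def branch_exchange(rm: int, insn_addr: int, link: bool = False):
    """Model of BX Rm / BLX Rm executed in ARM state at insn_addr.

    Returns (new_pc, t_bit, new_lr); new_lr is None for plain BX.
    """
    t_bit = rm & 1                        # Rm[0] -> CPSR T bit
    new_pc = rm & 0xFFFFFFFE              # Rm[31:1] -> PC, bit 0 cleared
    if t_bit == 0 and new_pc & 2:
        raise ValueError("ARM-state target must be word-aligned (Rm[1] == 0)")
    new_lr = insn_addr + 4 if link else None   # BLX also saves the return address
    return new_pc, t_bit, new_lr
```

For example, branch_exchange(0x8001, 0x1000) yields (0x8000, 1, None): execution continues in Thumb state at 0x8000.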

 

The other form of Branch with Link and Exchange has the syntax BLX <target address>. It calls a Thumb subroutine from ARM code, and is always unconditional. In its encoding, bit 24 (the position of the L bit in B and BL, called H in this instruction) is added into bit 1 of the resulting address. When it is 1, this selects an odd half-word address, which is allowed because the target is always a Thumb instruction. Table 1.3 shows four branching examples: unconditional jump, conditional jump, subroutine call, and conditional subroutine call.
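To see how bit 24 refines the target, the Python sketch below decodes the immediate BLX form. It assumes the instruction word is given as an integer and that the PC reads 8 bytes ahead, as for B and BL; the function name is hypothetical.

```python
def blx_imm_target(insn_addr: int, word: int) -> int:
    """Target of BLX <target address>: cond field 1111, bits [27:25] = 101."""
    if word >> 25 != 0b1111101:
        raise ValueError("not an immediate BLX encoding")
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:
        imm24 -= 1 << 24              # sign-extend the word offset
    h = (word >> 24) & 1              # bit 24 is added into bit 1 of the address
    return insn_addr + 8 + (imm24 << 2) + (h << 1)
```

With all offset bits zero, the target is the BLX address + 8 when bit 24 is 0, and the BLX address + 10 (an odd half word) when it is 1.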

 

 

1.3 Initializing an Address Pointer

 

 

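The ARM assembler provides a pseudo-instruction, ADR, that loads an address into a register (Rd). The address is PC-relative: the assembler translates ADR into an instruction that adds a small offset to, or subtracts it from, the program counter (r15). The syntax of the ADR instruction is:

ADR{L}{<cond>} Rd, <address>

With the L suffix (ADRL), the assembler uses two instructions, which allows a larger offset.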

 

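Here is an example of a program that copies data from TABLE1 to TABLE2, both of which are near the calling code:

COPY    ADR  r1, TABLE1    ; r1 points to TABLE1
        ADR  r2, TABLE2    ; r2 points to TABLE2
        …
TABLE1  …                  ; source data
        …
TABLE2  …                  ; destination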
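The following Python sketch shows how an assembler could expand ADR into a single ADD or SUB on the PC. It assumes the PC reads 8 bytes ahead and that the offset must be encodable as an ARM data-processing immediate (an 8-bit value rotated right by an even amount); `expand_adr` and `is_arm_immediate` are hypothetical names, and `Rd` stands for whichever register the programmer chose.

```python
MASK = 0xFFFFFFFF


def is_arm_immediate(value: int) -> bool:
    """True if value is an 8-bit constant rotated right by an even amount (0 to 30)."""
    value &= MASK
    for rot in range(0, 32, 2):
        # rotating left by rot undoes a rotate-right by rot
        undone = ((value << rot) | (value >> (32 - rot))) & MASK
        if undone <= 0xFF:
            return True
    return False


def expand_adr(adr_addr: int, label_addr: int) -> str:
    """Sketch: turn ADR Rd, label into ADD/SUB Rd, pc, #imm."""
    offset = label_addr - (adr_addr + 8)       # pc reads 8 bytes ahead
    op = "ADD" if offset >= 0 else "SUB"
    if not is_arm_immediate(abs(offset)):
        raise ValueError("offset not encodable; ADRL or a literal load would be needed")
    return f"{op} Rd, pc, #{abs(offset):#x}"
```

For instance, an ADR at 0x8000 referring to a label at 0x8010 becomes ADD Rd, pc, #0x8.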

 

1.4 Data transfer Instruction

 

Data is transferred between memory and registers by load and store instructions. There are three basic forms of data movement between ARM registers and memory: single-register load and store, multiple-register load and store, and single-register swap.

 

1.4.1     Single Data Transfer Instruction

 

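Single-register load and store instructions transfer a byte (8 bits), a half word (16 bits) or a word (32 bits); bytes and half words can be loaded as signed or unsigned values. The syntax of the instruction is:

–   <LDR|STR>{<cond>}{B|H|SB|SH} Rd, <address>

With no suffix a word is transferred; B selects an unsigned byte, H an unsigned half word, and SB/SH a signed byte or half word (the signed forms exist only for loads).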

 

Table 1.2 shows the data types associated with load-store operations, with their mnemonics and explanations.

 

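Table 1.2 Single Register Load-Store Instructions

Mnemonic   Operation
LDR        load a word into a register
STR        store a word from a register
LDRB       load an unsigned byte into a register
STRB       store a byte from a register
LDRH       load an unsigned half word into a register
STRH       store a half word from a register
LDRSB      load a signed byte into a register
LDRSH      load a signed half word into a register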

 

Each data item must be aligned to its own size: words on 4-byte boundaries and half words on 2-byte boundaries, while bytes can be at any address.

 

Let us consider a few examples:

LDR r0, [r1] loads a word into register r0 from the address held in register r1.

STR r0, [r1] stores the contents of register r0 at the address held in register r1.

LDRSB/LDRSH load a single byte/half word and sign-extend it to 32 bits.

All of these can be conditionally executed by inserting the appropriate condition code after LDR/STR. For example, LDREQB loads the byte only if the zero flag (Z) is set.
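The difference between zero extension and sign extension can be modelled in Python with the standard `struct` module. The sketch assumes a little-endian memory image and treats a misaligned address as an error (real ARM cores handle unaligned accesses differently depending on the architecture version); `load` and `FORMATS` are hypothetical names.

```python
import struct

# struct format per load: '<' = little-endian; lowercase letters are signed
FORMATS = {"LDR": "<I", "LDRB": "<B", "LDRH": "<H", "LDRSB": "<b", "LDRSH": "<h"}


def load(mem: bytes, addr: int, op: str) -> int:
    """Return the 32-bit register value an aligned load would produce."""
    size = struct.calcsize(FORMATS[op])
    if addr % size:
        raise ValueError(f"{op}: address {addr:#x} is not {size}-byte aligned")
    (value,) = struct.unpack_from(FORMATS[op], mem, addr)
    return value & 0xFFFFFFFF      # a negative value becomes its sign-extended bit pattern


memory = bytes([0x80, 0xFF, 0x34, 0x12])    # little-endian word 0x1234FF80
```

Loading the byte 0x80 gives 0x00000080 with LDRB but 0xFFFFFF80 with LDRSB, because bit 7 of the byte is copied into bits 8 to 31.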

 

1.4.2     Single Data Transfer Addressing Mode

 

The ARM instruction set provides several addressing modes for accessing memory. Single data transfers use two of them: register-indirect addressing and base-plus-index (indexed) addressing. In indexed addressing, the address is formed by combining a base register with an offset. The base register can be any of the 16 registers r0-r15. The offset is an unsigned number that is added to or subtracted from the base; it can be given as an immediate value, in a register (other than the PC), or as a scaled register (a register shifted by an immediate amount). Scaled register offsets are available only for word and unsigned byte instructions. Multiple data transfers use two further addressing modes, block-copy addressing and stack addressing, which are discussed after the multiple-register load and store instructions.

 

Register Indirect Addressing Mode

 

In this addressing mode, a register acts as a memory pointer. This register, called the base register, holds a value that is interpreted as a memory address. A load instruction reads data from the location at this address, and a store instruction writes data to it. In the following example, register r1 holds the memory address, and r0 is loaded from or stored to that location by the LDR and STR instructions, respectively.

LDR r0,[r1] ;r0:=mem32[r1]

STR r0,[r1] ;mem32[r1]:=r0

 

Figure 1.2 shows a pictorial view of register-indirect addressing for the instructions LDR r2, [r1] and STR r0, [r1]. Both instructions use the same base register, r1, which holds 0x200. Executing the store STR r0, [r1] writes a copy of r0 to the location pointed to by the base register, so the data 0x5 is written to address 0x200. The load LDR r2, [r1] then copies the data at the location pointed to by r1 into register r2, so after execution r2 = 0x5, as seen in Figure 1.2.
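The two accesses in Figure 1.2 can be traced with a tiny Python model in which memory is a dictionary from addresses to 32-bit values; the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF


def str_indirect(mem: dict, value: int, base: int) -> None:
    mem[base] = value & MASK        # STR rd, [base]: mem32[base] := rd


def ldr_indirect(mem: dict, base: int) -> int:
    return mem[base]                # LDR rd, [base]: rd := mem32[base]


memory = {}
r0, r1 = 0x5, 0x200
str_indirect(memory, r0, r1)        # STR r0, [r1]
r2 = ldr_indirect(memory, r1)       # LDR r2, [r1]
```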

 

Base-plus-offset Addressing

 

Indexed (or scaled indexed) addressing adds an offset to the base register to form the effective address, and the data transfer takes place between the register and that effective address. Depending on when the offset is added to the base register, and whether the base register is updated, base-plus-offset addressing is divided into three types: pre-indexing, post-indexing and auto-indexing.

 

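Table 1.3 shows the three indexing operations with their mnemonics and explanations.

Table 1.3 Indexing Methods

Index method                          Example              Operation
Auto-index (pre-index, write-back)    LDR r0, [r1, #4]!    r0 := mem32[r1 + 4]; r1 := r1 + 4
Pre-index                             LDR r0, [r1, #4]     r0 := mem32[r1 + 4]; r1 unchanged
Post-index                            LDR r0, [r1], #4     r0 := mem32[r1]; r1 := r1 + 4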

 

The offset (#4) can be added to or subtracted from the base register. Because the immediate offset is allocated 12 bits in the instruction, it can be up to 4095 bytes (about 4K) in either direction for word and unsigned byte transfers. Post-indexing accesses the same location as a simple register-indirect load, but it also updates the base register, so it does two things in one instruction and saves both time and code space. Auto-indexing likewise updates the base register with the effective address; the extra work of updating the base is overlapped with the data access to memory, so auto-indexing adds little or no execution time.

 

Figure 1.3 shows a pictorial view of pre-indexed addressing for the store instruction STR r0, [r1, #12]. The offset 12 (0xC) is added to the base register value 0x200, giving an effective address of 0x20C, and the r0 data 0x5 is then stored at location 0x20C. Pre-indexed addressing can be used in several ways:

To store to location 0x1F4, assuming r1 holds 0x200, use: STR r0, [r1, #-12]

To auto-increment the base pointer to 0x20C as part of the store, use: STR r0, [r1, #12]!

If r2 contains 3, location 0x20C can be accessed by multiplying r2 by 4, which is done with a left shift: every one-bit left shift multiplies a value by two. The following store shifts r2 left by 2 bits to form the offset 12 (3 × 4); r2 itself is not changed:

 

–   STR r0, [r1, r2, LSL #2]
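The three pre-indexed forms above differ only in how the effective address and the new base value are formed. This Python sketch (hypothetical helper names, 32-bit wrap-around assumed) computes both, using r1 = 0x200 and r2 = 3 as in the examples.

```python
MASK = 0xFFFFFFFF


def pre_index(base: int, offset: int, writeback: bool = False) -> tuple:
    """Return (effective_address, base_after) for [Rn, offset] or [Rn, offset]!."""
    ea = (base + offset) & MASK              # offset applied before the access
    return ea, (ea if writeback else base)   # '!' writes the effective address back


def lsl_offset(rm: int, shift: int) -> int:
    """Offset for [Rn, Rm, LSL #shift]; Rm itself is not modified."""
    return (rm << shift) & MASK


r1, r2 = 0x200, 3
```

Here pre_index(r1, -12) gives (0x1F4, 0x200), pre_index(r1, 12, writeback=True) gives (0x20C, 0x20C), and pre_index(r1, lsl_offset(r2, 2)) reaches 0x20C without changing r1 or r2.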

 

Post-indexed Addressing

 

Example: STR r0, [r1], #12

 

 

Figure 1.4 shows a pictorial view of post-indexed addressing for the store instruction STR r0, [r1], #12. First, the r0 data is stored at the location pointed to by the base register r1 (address 0x200); then the offset 12 (0xC) is added to r1, so r1 is updated to 0x20C. The store therefore uses address 0x200, and 0x20C becomes the base value for the next access.
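To contrast post-indexing with auto-indexing (pre-indexing with write-back), the Python sketch below performs both stores on separate dictionary memories; the helper names are hypothetical. Both leave r1 = 0x20C, but only the post-indexed form writes the data to the old base address 0x200.

```python
MASK = 0xFFFFFFFF


def str_post_index(mem: dict, value: int, base: int, offset: int) -> int:
    """STR rd, [base], #offset: store at the old base, then return the updated base."""
    mem[base] = value & MASK
    return (base + offset) & MASK


def str_pre_index(mem: dict, value: int, base: int, offset: int, writeback: bool = False) -> int:
    """STR rd, [base, #offset]{!}: store at base + offset; update base only with '!'."""
    ea = (base + offset) & MASK
    mem[ea] = value & MASK
    return ea if writeback else base


post_mem, pre_mem = {}, {}
r1_after_post = str_post_index(post_mem, 0x5, 0x200, 12)               # STR r0, [r1], #12
r1_after_pre = str_pre_index(pre_mem, 0x5, 0x200, 12, writeback=True)  # STR r0, [r1, #12]!
```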

 

1.4.3     Multiple Data Transfer Instruction

 

Multiple-register load and store instructions read and write a set of registers from and to consecutive memory locations, so a single instruction can copy a block of data between memory and the processor. They are well suited to context switching and can be used to save or restore workspace registers on procedure entry and exit. The load and store multiple instructions (LDM/STM) transfer between 1 and 16 registers to or from memory. The order of transfer cannot be specified: the order in which registers appear in the list has no effect, and the lowest-numbered register is always transferred to or from the lowest memory location accessed. The transferred registers can be any subset of the current bank of registers (the default) or, when in a privileged mode, of the user-mode bank of registers (selected with ^). The syntax of the instruction is:

 

–   <LDM|STM>{<cond>}<addressing mode> Rn{!}, <registers>{^}

 

LDM/STM allows any subset (or all, r0 to r15) of the 16 registers to be transferred with a single instruction. For example, registers r0, r2 and r5 are loaded with data from the memory locations starting at the address in base register r1, as shown below:

 

LDMIA r1, {r0,r2,r5}     ; r0 := mem32[r1]
                         ; r2 := mem32[r1+4]
                         ; r5 := mem32[r1+8]

 

Table 1.4 shows the direction in which the base pointer moves through memory for each addressing-mode suffix appended to STM/LDM. The base register determines which memory locations are accessed; incrementing or decrementing, either before or after each transfer, gives four different addressing modes. The base register can optionally be updated after the transfer by appending “!”.

 

Table 1.4 Load-Store Multiple Instructions

Mnemonic      Operation                                          Comment
STMIA/LDMIA   Rd -> mem32[address] / Rd <- mem32[address]        Increment After
STMIB/LDMIB   Rd -> mem32[address+4] / Rd <- mem32[address+4]    Increment Before
STMDA/LDMDA   Rd -> mem32[address] / Rd <- mem32[address]        Decrement After
STMDB/LDMDB   Rd -> mem32[address-4] / Rd <- mem32[address-4]    Decrement Before
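The address arithmetic behind Table 1.4 can be sketched in Python. `ldm_stm_addresses` is a hypothetical helper that, for n registers, returns the address used for each register and the base value written back when "!" is present; it sorts the register list, reflecting the rule that the lowest-numbered register uses the lowest address.

```python
def ldm_stm_addresses(base: int, regs: list, mode: str) -> tuple:
    """Return ({register: address}, base_after_writeback) for IA/IB/DA/DB."""
    n = len(regs)
    first = {
        "IA": base,               # increment after: first word at base
        "IB": base + 4,           # increment before: first word just above base
        "DA": base - 4 * n + 4,   # decrement after: last word at base
        "DB": base - 4 * n,       # decrement before: last word just below base
    }[mode]
    # the lowest-numbered register always uses the lowest address, whatever the list order
    addresses = {reg: first + 4 * i for i, reg in enumerate(sorted(regs))}
    new_base = base + 4 * n if mode in ("IA", "IB") else base - 4 * n   # only used with '!'
    return addresses, new_base


addrs, r1_after = ldm_stm_addresses(0x100, [5, 0, 2], "IA")   # LDMIA r1!, {r5, r0, r2}
```

With r1 = 0x100, LDMIA fills r0, r2 and r5 from 0x100, 0x104 and 0x108 and leaves r1 = 0x10C if write-back is requested.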

 

Block Copy

 

The following loop copies a block of memory, whose length is an exact multiple of 12 words, from the location pointed to by r12 to the location pointed to by r13; r14 points to the end of the block to be copied.

 

; r12 points to the start of the source data
; r14 points to the end of the source data
; r13 points to the start of the destination data
loop    LDMIA r12!, {r0-r11}   ; load 48 bytes
        STMIA r13!, {r0-r11}   ; and store them
        CMP   r12, r14         ; check for the end
        BNE   loop             ; and loop until done
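A Python model of the loop, assuming memory is a dictionary of 32-bit words: each pass moves 12 words (48 bytes) through r0-r11, and the loop stops when r12 reaches r14. Because the assembly loop tests only after copying, the sketch rejects block sizes that are not a non-zero multiple of 48 bytes, which would otherwise run past r14. The function name is hypothetical.

```python
def block_copy(mem: dict, r12: int, r13: int, r14: int) -> tuple:
    """Model of the LDMIA/STMIA copy loop; returns the final (r12, r13)."""
    if r14 <= r12 or (r14 - r12) % 48:
        raise ValueError("block must be a non-empty multiple of 12 words (48 bytes)")
    while True:
        regs = [mem[r12 + 4 * i] for i in range(12)]   # LDMIA r12!, {r0-r11}
        r12 += 48
        for i, value in enumerate(regs):                # STMIA r13!, {r0-r11}
            mem[r13 + 4 * i] = value
        r13 += 48
        if r12 == r14:                                  # CMP r12, r14 / BNE loop
            return r12, r13


memory = {0x1000 + 4 * i: i for i in range(24)}   # 24 words, so two passes
final_r12, final_r13 = block_copy(memory, 0x1000, 0x2000, 0x1000 + 96)
```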

 

1.4.3.1 Multiple Data Transfer for Stack Operations

 

The LDM and STM instructions are used to implement the stack operations pop and push. Figure 1.5 shows the stack classification and the load-store instructions used to implement each type. A stack is ascending or descending depending on whether it grows up or down in memory: an ascending stack grows towards higher memory addresses, while a descending stack grows towards lower memory addresses. A stack is called full when the stack pointer (sp) points to the last used location, that is, the most recently pushed item. In an empty stack, by contrast, sp points to the first unused (empty) location.

 

The stack type is selected by the suffix appended to the instruction, as shown in Table 1.5. For example, a full ascending stack uses the suffix FA (LDMFA/STMFA).

 

Table 1.5 Load-Store Instructions for Stack Operations

Mnemonic      Comment
STMFD/LDMFD   Full Descending stack
STMFA/LDMFA   Full Ascending stack
STMED/LDMED   Empty Descending stack
STMEA/LDMEA   Empty Ascending stack

Note: ARM compilers use a full descending stack, as required by the ARM Procedure Call Standard.
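Because a full descending stack decrements before storing and increments after loading, STMFD is the same operation as STMDB and LDMFD is the same as LDMIA. The Python sketch below models a push of {r4, lr} and the matching pop under that assumption, with memory as a dictionary; the class name and the initial sp value are hypothetical.

```python
class FullDescendingStack:
    """Model of STMFD sp!, {...} (push) and LDMFD sp!, {...} (pop)."""

    def __init__(self, sp: int):
        self.sp = sp                  # full stack: sp points at the last item pushed
        self.mem = {}

    def push(self, regs: dict) -> None:
        """regs maps register number -> value, e.g. {4: r4, 14: lr}."""
        self.sp -= 4 * len(regs)      # like STMDB: decrement first
        for i, reg in enumerate(sorted(regs)):
            self.mem[self.sp + 4 * i] = regs[reg]   # lowest register at lowest address

    def pop(self, reg_numbers: list) -> dict:
        """Like LDMIA: read upwards from sp, then increment sp."""
        values = {reg: self.mem[self.sp + 4 * i] for i, reg in enumerate(sorted(reg_numbers))}
        self.sp += 4 * len(reg_numbers)
        return values


stack = FullDescendingStack(0x8000)
stack.push({4: 0xAA, 14: 0xBB})   # STMFD sp!, {r4, lr}
after_push_sp = stack.sp          # 0x7FF8: points at the saved r4
restored = stack.pop([4, 14])     # LDMFD sp!, {r4, lr}
```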

 

1.4.4     Single Register Swap Instruction

 

The swap instruction exchanges a value in a register with a value in memory. It is useful for implementing semaphores and supporting mutual exclusion on shared accesses. This instruction was originally added by Acorn for use in multiprocessor systems.

 

The syntax of the swap instruction is:

 

–     SWP{<cond>}{B} Rd,Rm,[Rn]      ;Rd <- [Rn], [Rn] <- Rm

 

Consider the example:

        ADR   r0, SEMAPHORE    ; r0 points to the semaphore
        SWPB  r1, r1, [r0]     ; exchange byte

 

Swap combines a load and a store of a word or an unsigned byte in a single, atomic instruction. It loads the old contents of the memory location addressed by base register Rn into the destination register Rd, and stores the value of the source register Rm into that location. Thus, to exchange the contents of a register and a memory location, make Rd = Rm.
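A single-threaded Python sketch of the swap and of the semaphore idea above. It cannot show atomicity, which is the reason SWP exists; it only models the data movement. The helper names, the 0 = free / 1 = taken convention, and the assumption that r1 holds 1 before SWPB are all hypothetical.

```python
def swp(mem: dict, rn: int, rm: int, byte: bool = False) -> int:
    """Model of SWP{B} Rd, Rm, [Rn]: return the old [Rn] (the new Rd) and store Rm there.

    Real hardware performs the read and the write atomically; this model does not.
    """
    mask = 0xFF if byte else 0xFFFFFFFF
    old = mem.get(rn, 0) & mask          # assumption: untouched memory reads as 0
    mem[rn] = rm & mask
    return old


def try_acquire(mem: dict, semaphore: int) -> bool:
    """Hypothetical convention: 0 = free, 1 = taken. Swap in 1 and inspect the old value."""
    return swp(mem, semaphore, 1, byte=True) == 0


memory = {0x3000: 0, 0x3004: 0x34}
first = try_acquire(memory, 0x3000)      # lock was free, so it is acquired
second = try_acquire(memory, 0x3000)     # lock already taken, so this fails

r1 = 0x12
r1 = swp(memory, 0x3004, r1)             # Rd = Rm: a true register/memory exchange
```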

 

1.5 Summary

 

In this lecture, we covered control flow instructions, both conditional and unconditional branches. We discussed the ARM data transfer instructions and their classification as load/store and swap instructions, and explored them in detail with syntax and examples. We also studied the ARM addressing modes: register indirect, indexed and stack addressing.

 
