Lecture 3
INTRODUCTION



ARM is a RISC (Reduced Instruction Set Computer) processor.
It is used in applications that need small size and high performance.
Its simple architecture results in low power consumption.
ARM
System - On - Chip
Architecture
TIMELINE (1/2)




1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: A partnership between Acorn and Apple leads to the founding of Advanced RISC Machines (ARM).
1991: ARM6, the first embeddable RISC microprocessor.
1992–1994: Various companies (e.g. Sharp, Samsung) adopt ARM; in 1993 the ARM7, the first multimedia microprocessor, is introduced.
TIMELINE (2/2)



1995: Introduction of Thumb and ARM8.
1996–2000: Alcatel, Hyundai, Philips and Sony use ARM; in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000–2002: ARM's share of the 32-bit embedded RISC microprocessor market reaches 80%. The ARM Developer Suite (ADS) is introduced.
THE ARM ARCHITECTURE
GENERAL INFO (1/2)
AIM: Simple design



Load-store architecture
32-bit data bus
3 addressing modes (offset, pre-indexed, post-indexed)
GENERAL INFO (2/2)
Simple architecture + simple instruction set + code density: ARM achieves small size and low power consumption.
Registers



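37 registers in total: 31 general-purpose 32-bit registers and 6 status registers (the CPSR and five SPSRs); 16 general-purpose registers and the CPSR are visible at any one time.
7 modes of operation: user, FIQ, IRQ, supervisor, abort, undefined and system.
Each mode sees a different set of registers and has a different level of access to the CPSR (in user mode only the condition flags can be changed).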
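To make the register banking concrete, here is a minimal C sketch that maps a register number as seen in a given mode onto one of the 31 physical general-purpose registers of the programming model: FIQ mode banks r8–r14, the other exception modes bank only r13 and r14, and system mode shares the user registers. The enum, the function name and the physical numbering 0–30 are illustrative choices, not part of any ARM tool.

```c
#include <assert.h>

/* Hypothetical names for the 7 operating modes. */
typedef enum { ARM_USR, ARM_SYS, ARM_FIQ, ARM_IRQ, ARM_SVC, ARM_ABT, ARM_UND } arm_mode;

/* Returns the physical register (0..30) that rN names in `mode`.
   0..15  : user registers r0..r15 (shared with system mode)
   16..22 : r8_fiq..r14_fiq
   23..30 : r13/r14 banked for irq, svc, abt, und (two each) */
int physical_reg(arm_mode mode, int n) {
    assert(n >= 0 && n <= 15);
    if (n == 15) return 15;                        /* the PC is never banked */
    if (mode == ARM_FIQ && n >= 8) return 16 + (n - 8);
    if (n == 13 || n == 14) {
        switch (mode) {
        case ARM_IRQ: return 23 + (n - 13);
        case ARM_SVC: return 25 + (n - 13);
        case ARM_ABT: return 27 + (n - 13);
        case ARM_UND: return 29 + (n - 13);
        default: break;                            /* usr and sys share r13/r14 */
        }
    }
    return n;                                      /* unbanked: the user register */
}
```

For example, r13 in IRQ mode and r13 in supervisor mode are different physical registers, which is what lets each exception mode keep its own stack pointer.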
ARM Programming Model
User mode: r0–r14, r15 (PC) and CPSR. These are the registers usable in user mode; all the others are accessible in privileged (system) modes only.
FIQ mode: banked r8_fiq–r14_fiq and SPSR_fiq
SVC mode: banked r13_svc, r14_svc and SPSR_svc
Abort mode: banked r13_abt, r14_abt and SPSR_abt
IRQ mode: banked r13_irq, r14_irq and SPSR_irq
Undefined mode: banked r13_und, r14_und and SPSR_und
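CPSR
ARM CPSR format:
bits 31–28: condition flags N, Z, C, V
bits 27–8: unused
bit 7: I (IRQ disable), bit 6: F (FIQ disable), bit 5: T (Thumb state)
bits 4–0: mode
N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions; bit 27 on the DSP-enhanced cores)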
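The field layout above can be decoded with shifts and masks. This is a minimal C sketch; the struct and function names are illustrative. The test value 0x600000D3 has Z and C set, IRQ and FIQ disabled, ARM state, and mode bits 0x13, the standard encoding of supervisor mode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool n, z, c, v, q;   /* condition flags (Q only on DSP-enhanced cores) */
    bool i, f, t;         /* IRQ disable, FIQ disable, Thumb state */
    unsigned mode;        /* bits 4..0, e.g. 0x10 user, 0x13 supervisor */
} cpsr_fields;

cpsr_fields decode_cpsr(uint32_t psr) {
    cpsr_fields r;
    r.n = (psr >> 31) & 1u;
    r.z = (psr >> 30) & 1u;
    r.c = (psr >> 29) & 1u;
    r.v = (psr >> 28) & 1u;
    r.q = (psr >> 27) & 1u;
    r.i = (psr >> 7) & 1u;
    r.f = (psr >> 6) & 1u;
    r.t = (psr >> 5) & 1u;
    r.mode = psr & 0x1Fu;
    return r;
}
```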
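Memory Organization
Address bus: 32 bits
1 word = 32 bits
Memory is byte-addressed. Words occupy word-aligned addresses (multiples of 4, e.g. word8 at address 8 and word16 at address 16), half-words occupy even addresses (e.g. half-word4, half-word12, half-word14), and bytes can sit at any address (e.g. byte6).
Within the word at address 0, byte0 occupies bits 7–0 and byte3 occupies bits 31–24 (the little-endian layout shown in the figure).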
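The alignment rules and the little-endian byte order in the figure can be checked with a few lines of C. This is a sketch with illustrative function names; it models the little-endian layout only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

bool word_aligned(uint32_t addr)     { return (addr & 3u) == 0; }  /* multiple of 4 */
bool halfword_aligned(uint32_t addr) { return (addr & 1u) == 0; }  /* even address */

/* Little-endian: the byte at address A+k of a word holds bits 8k+7..8k. */
uint8_t byte_of_word(uint32_t word, unsigned k) {
    assert(k < 4);
    return (uint8_t)(word >> (8u * k));
}

/* Assemble a word from 4 consecutive bytes; mem[0] is the lowest address. */
uint32_t load_word_le(const uint8_t mem[4]) {
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```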
Instruction Set

Three instruction types



Data processing
Data transfer
Control flow
Supervisor mode


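A program running in user mode cannot perform operations outside its privileges; the operating system handles them on its behalf.
Through supervisor calls (the SWI instruction), a user program enters the operating system in supervisor mode and can request system functions.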
I/O System


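ARM handles peripherals as memory-mapped devices with interrupt support: peripheral registers are read and written with ordinary load and store instructions, and devices signal the processor through interrupts.
Interrupts:
IRQ: normal interrupt
FIQ: fast interrupt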
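In C, memory-mapped access is usually written through a `volatile` pointer, so the compiler emits every read and write as a real load or store. In this minimal sketch the register layout (`uart_regs`) and the `TX_READY` bit are entirely hypothetical; on a real chip the pointer would be obtained by casting the device's base address from its memory map, while here an ordinary struct stands in for the device so the code can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block of a simple UART-like peripheral. */
typedef struct {
    volatile uint32_t data;    /* write: byte to transmit */
    volatile uint32_t status;  /* bit 0: transmitter ready (hypothetical) */
} uart_regs;

#define TX_READY 0x1u

/* Polled transmit: wait until ready, then store the byte.
   On ARM both accesses compile to ordinary LDR/STR instructions. */
void uart_putc(uart_regs *uart, char c) {
    while ((uart->status & TX_READY) == 0) {
        /* busy-wait; an interrupt-driven driver would use IRQ/FIQ instead */
    }
    uart->data = (uint32_t)(unsigned char)c;
}
```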
Exceptions

Exceptions:
Interrupts
Supervisor calls
Traps
When an exception takes place:
The value of the PC is copied to r14_exc (the link register of the exception mode).
The CPSR is copied to SPSR_exc.
The operating mode changes to the respective exception mode.
The PC is loaded with the exception handler's vector address.
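The entry sequence can be modelled in C. This is a sketch under stated assumptions: it uses the standard ARMv4 vector addresses and mode encodings, models only the banked pair of the mode being entered, and takes the saved return address as a parameter because its exact offset from the PC depends on the exception type. All type and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EXC_RESET, EXC_UNDEF, EXC_SWI, EXC_PABT, EXC_DABT, EXC_IRQ, EXC_FIQ } exc_kind;

/* Vector address and new mode (CPSR[4:0]) for each exception, in enum order. */
static const uint32_t vector_addr[] = { 0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x1C };
static const uint32_t exc_mode[]    = { 0x13, 0x1B, 0x13, 0x17, 0x17, 0x12, 0x11 };

typedef struct {
    uint32_t pc, cpsr;
    uint32_t r14_exc, spsr_exc;   /* banked pair of the mode being entered */
} core_state;

void take_exception(core_state *s, exc_kind e, uint32_t return_link) {
    s->r14_exc  = return_link;                   /* PC-based return address */
    s->spsr_exc = s->cpsr;                       /* save the old status */
    s->cpsr = (s->cpsr & ~0x3Fu) | exc_mode[e];  /* new mode; T cleared: ARM state */
    s->cpsr |= 0x80u;                            /* disable IRQ */
    if (e == EXC_FIQ || e == EXC_RESET)
        s->cpsr |= 0x40u;                        /* also disable FIQ */
    s->pc = vector_addr[e];                      /* jump to the vector */
}
```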
THE ARM INSTRUCTION SET
Data Processing Instructions (1/2)

Arithmetic operations
ADD  r0, r1, r2   ; r0 := r1 + r2, flags not updated
ADDS r0, r1, r2   ; r0 := r1 + r2, flags updated
Logical operations
AND  r0, r1, r2   ; r0 := r1 AND r2
Register movement
MOV  r0, r2       ; r0 := r2
Comparison
CMP  r1, r2       ; set flags on r1 - r2, result discarded
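How ADDS and CMP set the N, Z, C and V flags can be written out in C. This minimal sketch uses illustrative names; note that for CMP (a subtraction) ARM sets C when no borrow occurs, i.e. when the first operand is greater than or equal to the second as unsigned numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } flags;

/* Model of ADDS rd, rn, rm: returns the 32-bit sum and sets N, Z, C, V. */
uint32_t adds(uint32_t a, uint32_t b, flags *f) {
    uint32_t r = a + b;                          /* wraps modulo 2^32 */
    f->n = (r >> 31) & 1u;                       /* bit 31 of the result */
    f->z = (r == 0);
    f->c = (r < a);                              /* unsigned carry out */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;    /* same-sign inputs, sign changed */
    return r;
}

/* Model of CMP rn, rm: flags of rn - rm, result discarded. */
void cmp(uint32_t a, uint32_t b, flags *f) {
    uint32_t r = a - b;
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (a >= b);                             /* no borrow */
    f->v = (((a ^ b) & (a ^ r)) >> 31) & 1u;     /* signed overflow */
}
```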
Data Processing Instructions (2/2)

Operands:
Immediate operands:
ADD r3, r3, #1          ; r3 := r3 + 1
Shifted register operands:
ADD r3, r2, r1, LSL #3  ; r3 := r2 + (r1 << 3) = r2 + 8 * r1
Miscellaneous data processing instructions:
Multiplication:
MUL r4, r3, r2          ; r4 := r3 * r2 (low 32 bits of the product)
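The shifted-register operand is produced by the barrel shifter before the ALU operation. This C sketch models the four shift types for shift amounts 1 to 31, where they all behave uniformly; amounts 0 and 32 are left out because in the instruction encoding an amount field of 0 means something special (LSR #32, ASR #32 or RRX) for the non-LSL shifts. Names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LSL, LSR, ASR, ROR } shift_kind;

/* Value of the operand "rm, <shift> #amount" for amount in 1..31. */
uint32_t shifted_operand(uint32_t rm, shift_kind kind, unsigned amount) {
    assert(amount >= 1 && amount <= 31);
    switch (kind) {
    case LSL: return rm << amount;               /* logical shift left */
    case LSR: return rm >> amount;               /* logical shift right, zeros in */
    case ASR:                                    /* copies of the sign bit shifted in */
        return (rm & 0x80000000u) ? ~(~rm >> amount) : rm >> amount;
    case ROR: return (rm >> amount) | (rm << (32u - amount)); /* rotate right */
    }
    return 0;                                    /* unreachable for valid kinds */
}
```

With r2 = 10 and r1 = 5, `ADD r3, r2, r1, LSL #3` corresponds to `10 + shifted_operand(5, LSL, 3)`, i.e. 50.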
Data transfer instructions

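Load and store instructions:
LDR r0, [r1]            ; r0 := mem32[r1]
STR r0, [r1]            ; mem32[r1] := r0
Addressing modes:
Offset: LDR r0, [r1, #4]          ; r0 := mem32[r1 + 4], r1 unchanged
Post-indexed: LDR r0, [r1], #16   ; r0 := mem32[r1], then r1 := r1 + 16
Auto-indexed (pre-indexed with write-back): LDR r0, [r1, #16]!   ; r1 := r1 + 16, then r0 := mem32[r1]
Multiple data transfers:
LDMIA r1, {r0, r2, r5}  ; r0 := mem32[r1], r2 := mem32[r1 + 4], r5 := mem32[r1 + 8]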
Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
LDR r0, [r1, #4]!       ; pre-indexed with write-back
POST:
r0 = 0x02020202
r1 = 0x00009004
Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
LDR r0, [r1, #4]        ; offset addressing, r1 unchanged
POST:
r0 = 0x02020202
r1 = 0x00009000
Examples
PRE:
r0 = 0x00000000
r1 = 0x00009000
mem32[0x00009000] = 0x01010101
mem32[0x00009004] = 0x02020202
LDR r0, [r1], #4        ; post-indexed
POST:
r0 = 0x01010101
r1 = 0x00009004
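The three addressing modes in these examples differ only in when the offset is added and whether the base register is written back. This C sketch reproduces the PRE/POST values above; the tiny memory model and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* A small word-addressed memory window starting at `base`. */
typedef struct { uint32_t base; uint32_t words[4]; } memory;

static uint32_t mem32(const memory *m, uint32_t addr) {
    assert(addr >= m->base && (addr - m->base) % 4 == 0 && (addr - m->base) / 4 < 4);
    return m->words[(addr - m->base) / 4];
}

/* LDR rd, [rn, #off]   : offset addressing, rn unchanged */
uint32_t ldr_offset(const memory *m, uint32_t rn, int32_t off) {
    return mem32(m, rn + (uint32_t)off);
}

/* LDR rd, [rn, #off]!  : pre-indexed, rn := rn + off, then load */
uint32_t ldr_pre(const memory *m, uint32_t *rn, int32_t off) {
    *rn += (uint32_t)off;
    return mem32(m, *rn);
}

/* LDR rd, [rn], #off   : post-indexed, load from rn, then rn := rn + off */
uint32_t ldr_post(const memory *m, uint32_t *rn, int32_t off) {
    uint32_t value = mem32(m, *rn);
    *rn += (uint32_t)off;
    return value;
}
```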
Examples
PRE:
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
LDMIA r0!, {r1-r3}      ; increment after, with write-back
POST:
r0 = 0x0008001c
r1 = 0x00000001
r2 = 0x00000002
r3 = 0x00000003

Examples
PRE:
mem32[0x8001c] = 0x04
mem32[0x80018] = 0x03
mem32[0x80014] = 0x02
mem32[0x80010] = 0x01
r0 = 0x00080010
LDMIB r0!, {r1-r3}      ; increment before, with write-back
POST:
r0 = 0x0008001c
r1 = 0x00000002
r2 = 0x00000003
r3 = 0x00000004
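The only difference between LDMIA and LDMIB is whether the address is incremented after or before each transfer; with write-back, both leave the base register pointing 4 × (number of registers) bytes higher. This C sketch reproduces both examples; the memory model and the `ldm` function are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* A small word-addressed memory window starting at `base`. */
typedef struct { uint32_t base; uint32_t words[8]; } memory;

static uint32_t mem32(const memory *m, uint32_t addr) {
    assert(addr >= m->base && (addr - m->base) % 4 == 0 && (addr - m->base) / 4 < 8);
    return m->words[(addr - m->base) / 4];
}

/* LDMIA rn!, {regs}: load from rn, rn+4, ...
   LDMIB rn!, {regs}: load from rn+4, rn+8, ...
   Registers are filled in ascending register-number order. */
void ldm(const memory *m, uint32_t *rn, uint32_t *regs, int count, int increment_before) {
    uint32_t addr = *rn;
    for (int i = 0; i < count; i++) {
        if (increment_before) addr += 4;
        regs[i] = mem32(m, addr);
        if (!increment_before) addr += 4;
    }
    *rn += 4u * (uint32_t)count;   /* the "!" write-back */
}
```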

Conditional execution
Instructions can be executed conditionally, without branches:
CMP   r2, r3        ; subtract (r2 - r3) and set flags
ADDGE r4, r5, r6    ; if r2 >= r3 (signed): r4 := r5 + r6
SUBLT r4, r5, r6    ; else: r4 := r5 - r6

Conditional execution mnemonics
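Each condition suffix tests the CPSR flags left by an earlier flag-setting instruction such as CMP. The C sketch below encodes the standard ARM condition tests (CS is also written HS, CC is also written LO); the type and function names are illustrative. With the flags produced by `CMP r2, r3` for equal operands, GE passes, which is why ADDGE runs when r2 >= r3, not only when r2 > r3.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } flags;

typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL } cond;

/* Returns true if an instruction with this condition suffix executes. */
bool condition_passed(cond cc, flags f) {
    switch (cc) {
    case EQ: return f.z;                  /* equal */
    case NE: return !f.z;                 /* not equal */
    case CS: return f.c;                  /* carry set, unsigned >= (HS) */
    case CC: return !f.c;                 /* carry clear, unsigned < (LO) */
    case MI: return f.n;                  /* negative */
    case PL: return !f.n;                 /* positive or zero */
    case VS: return f.v;                  /* overflow */
    case VC: return !f.v;                 /* no overflow */
    case HI: return f.c && !f.z;          /* unsigned > */
    case LS: return !f.c || f.z;          /* unsigned <= */
    case GE: return f.n == f.v;           /* signed >= */
    case LT: return f.n != f.v;           /* signed < */
    case GT: return !f.z && f.n == f.v;   /* signed > */
    case LE: return f.z || f.n != f.v;    /* signed <= */
    case AL: return true;                 /* always */
    }
    return false;
}
```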
Control flow instructions



Branch instruction: B label
Conditional branch: BNE label
Branch and link: BL label (the return address is saved in r14, the link register)
Subroutine call and return:
        ...
        BL    Loop       ; call the subroutine
        ...
Loop    ...
        ...
        MOV   PC, r14    ; return
Example 1
        AREA ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                        ; Mark first instruction to execute
start
        MOV     r0, #10              ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1           ; r0 = r0 + r1
stop
        MOV     r0, #0x18            ; angel_SWIreason_ReportException
        LDR     r1, =0x20026         ; ADP_Stopped_ApplicationExit
        SWI     0x123456             ; ARM semihosting SWI
        END                          ; Mark end of file
Example 2
        AREA subrout, CODE, READONLY ; Name this block of code
        ENTRY                        ; Mark first instruction to execute
start   MOV     r0, #10              ; Set up parameters
        MOV     r1, #3
        BL      doadd                ; Call subroutine
stop
        MOV     r0, #0x18            ; angel_SWIreason_ReportException
        LDR     r1, =0x20026         ; ADP_Stopped_ApplicationExit
        SWI     0x123456             ; ARM semihosting SWI
doadd
        ADD     r0, r0, r1           ; Subroutine code
        MOV     pc, lr               ; Return from subroutine
        END                          ; Mark end of file
ARM ORGANIZATION AND IMPLEMENTATION
3-stage pipeline (ARM7, 80 MHz)
Fetch
Decode
Execute
Throughput: 1 instruction / cycle
(Figure: ARM7 datapath: address register and PC incrementer driving the address bus A[31:0]; register bank; instruction decode and control; multiply register; barrel shifter; ALU; A and B buses; data-out and data-in registers on the data bus D[31:0].)
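The "1 instruction / cycle" throughput holds once the pipeline is full. Under the idealized assumption of one stage per cycle and no stalls (no branches, no multicycle instructions), a k-stage pipeline needs k cycles before the first instruction completes and then completes one instruction per cycle, as in this small C sketch with an illustrative function name.

```c
#include <assert.h>

/* Cycles to run n instructions through an ideal k-stage pipeline. */
unsigned long pipeline_cycles(unsigned long n, unsigned k) {
    assert(k >= 1);
    return n == 0 ? 0 : k + (n - 1);   /* fill latency, then 1 per cycle */
}
```

For 100 instructions the 3-stage pipeline needs 102 cycles and a 5-stage pipeline 104, so the extra stages cost little in throughput while allowing a higher clock frequency.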
5-stage pipeline (1/2)
Program execution time:
$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$
Ways to reduce $T_{prog}$:
Increase $f_{clk}$: logic simplification
Reduce $CPI$: reduce the number of multicycle instructions
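The formula can be evaluated directly. The figures used below (1,000,000 instructions, CPI of 1.5, a 100 MHz clock) are illustrative, not taken from a real ARM core; they give 15 ms, and lowering the CPI to 1.2 at the same clock cuts this to 12 ms.

```c
#include <assert.h>

/* Program execution time in seconds: Tprog = (Ninst * CPI) / fclk. */
double t_prog(double n_inst, double cpi, double f_clk_hz) {
    assert(f_clk_hz > 0.0);
    return n_inst * cpi / f_clk_hz;
}
```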
5-stage pipeline (ARM9, 150 MHz) (2/2)
Fetch
Decode
Execute
Buffer / Data
Write-back
ARM coprocessor interface




ARM supports up to 16 coprocessors, which can also be emulated in software.
Each coprocessor has up to 16 general-purpose registers.
Coprocessors use a load-store architecture, like ARM itself.
Coprocessors usually handle on-chip functions, such as cache and memory management.
ARCHITECTURAL SUPPORT FOR HIGH-LEVEL LANGUAGES
Floating-point accelerator (1/2)
For floating-point operations, ARM offers the FPE software emulator and the FPA10 hardware floating-point accelerator.
FPA10 includes:
Coprocessor interface
Load/store unit
Register bank (8 registers, 80-bit)
ALU (adder, multiplier, divider)
Floating-point accelerator (2/2)
(Figure: FPA10 organization: coprocessor interface with coprocessor handshake, pipeline control, instruction issuer, load/store unit on the data bus, register bank, and arithmetic unit with add, mult and div blocks.)
APCS (1/2)



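APCS (ARM Procedure Call Standard) is a set of rules for how C procedures receive their inputs and return their outputs.
It assigns specific uses to the general-purpose registers (r0–r3: arguments, r4–r8: register variables, r10: stack limit, r13: stack pointer, r14: link register, etc.).
Procedure I/O:
        BL    Loop       ; call: return address saved in lr (r14)
        ...
Loop    ...
        MOV   pc, lr     ; return to the caller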
APCS (2/2)
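C code:
void f1(int a) {
    f2(a);
}
Assembly code (under APCS, a arrives in r0, which is also f2's first argument):
f1      STR   r14, [r13, #-4]!   ; push the return address onto the stack
        BL    f2                 ; call f2(a); a is still in r0
        LDR   pc, [r13], #4      ; pop the return address into the PC: return
(Figure: the stack and the stack pointer.)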
THUMB PROGRAMMER’S MODEL
General information




Thumb objective: code density.
Thumb has a 16-bit instruction set: a subset of the ARM instruction set is encoded in 16 bits.
Used appropriately, it can bring significant benefits in terms of:
Power efficiency
Enhanced performance (for example, when the memory is only 16 bits wide)
Going in and out of Thumb mode

Using the BX instruction in ARM state, e.g. BX r0:
Instructions are assembled as 16-bit Thumb instructions when the appropriate directive (CODE16) is used.
If bit 0 of r0 is 1, the T bit in the CPSR is set to 1 and the PC is set to the address in r0 with bit 0 cleared.
Executing BX from Thumb state with bit 0 of the target register clear returns to ARM state.
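The BX rule fits in a few lines of C. This sketch follows the ARMv4T behaviour (new T bit = bit 0 of the register, target = register with bit 0 cleared); the function name is illustrative. It also explains the `ADR r0, start + 1` in Example 3: the `+ 1` sets bit 0 so that `BX r0` enters Thumb state.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_T (1u << 5)   /* Thumb state bit */

/* Model of BX rm: bit 0 of rm selects the new state, the target is rm & ~1. */
void bx(uint32_t rm, uint32_t *pc, uint32_t *cpsr) {
    if (rm & 1u) *cpsr |= CPSR_T;    /* enter Thumb state */
    else         *cpsr &= ~CPSR_T;   /* enter ARM state */
    *pc = rm & ~1u;
}
```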
The Thumb programmer’s model
Thumb registers:
Lo registers: r0–r7
Hi registers: r8–r12, SP (r13), LR (r14), PC (r15); these (shaded in the figure) have restricted access
CPSR
ARM vs. Thumb (1/3)

Thumb:
Thumb code needs about 70% of the space of the equivalent ARM code.
Thumb code uses about 40% more instructions.
With a 16-bit memory, Thumb code is about 45% faster.
Thumb code requires about 30% less external memory.
ARM:
With a 32-bit memory, ARM code is about 40% faster.
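A deliberately crude, fetch-bound model shows why memory width decides which code runs faster. The model assumes that execution time is proportional to the number of instruction-fetch bus cycles and that each bus cycle transfers one bus width; it ignores data accesses, wait states and pipeline effects, so it is an illustration, not how the quoted percentages were measured. With Thumb needing about 40% more instructions, it predicts ARM about 40% faster on 32-bit memory and Thumb about 43% faster on 16-bit memory, close to the figures above.

```c
#include <assert.h>

/* Assumed model: time ~ instruction-fetch cycles; one bus width per cycle. */
double fetch_cycles(double n_inst, unsigned inst_bits, unsigned bus_bits) {
    unsigned per_inst = (inst_bits + bus_bits - 1) / bus_bits;  /* bus cycles per fetch */
    return n_inst * per_inst;
}
```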
ARM vs. Thumb (2/3)

If performance is critical:
ARM

If cost and power consumption are
critical:
Thumb
ARM and Thumb interaction
A 32-bit ARM system can switch to Thumb state for specific routines in order to meet power and memory constraints.
A 16-bit system can use a 32-bit on-chip memory for ARM-state routines, and a 16-bit off-chip memory with Thumb code for the rest of the application.
Example 3
        AREA ThumbSub, CODE, READONLY  ; Name this block of code
        ENTRY                          ; Mark first instruction to execute
        CODE32                         ; Subsequent instructions are ARM
header  ADR     r0, start + 1          ; The processor starts in ARM state,
        BX      r0                     ; so a small ARM code header is used
                                       ; to call the Thumb main program
        CODE16                         ; Subsequent instructions are Thumb
start
        MOV     r0, #10                ; Set up parameters
        MOV     r1, #3
        BL      doadd                  ; Call subroutine
stop
        MOV     r0, #0x18              ; angel_SWIreason_ReportException
        LDR     r1, =0x20026           ; ADP_Stopped_ApplicationExit
        SWI     0xAB                   ; Thumb semihosting SWI
doadd
        ADD     r0, r0, r1             ; Subroutine code
        MOV     pc, lr                 ; Return from subroutine
        END                            ; Mark end of file
Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which version is more efficient in terms of execution time, and which in terms of code size?
if r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
else
    r3 = r4 - r5
    r6 = r4 + r5
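To check an assembly solution, a C reference model of the pseudocode can supply expected register values. This sketch assumes that `r1 > r2` is a signed comparison (GT in ARM condition terms); for unsigned data the comparison would be HI instead. The function name is illustrative, and the test values are small so that the additions cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of Example 4 (signed comparison assumed). */
void example4(int32_t r1, int32_t r2, int32_t r4, int32_t r5,
              int32_t *r3, int32_t *r6) {
    if (r1 > r2) { *r3 = r4 + r5; *r6 = r4 - r5; }
    else         { *r3 = r4 - r5; *r6 = r4 + r5; }
}
```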

Example 5


Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged.
Test it using 0xAD as input data.
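The bit manipulation itself can be checked in C before writing the assembly. Setting bits 3 to 5 is an OR with the mask 0x38 and clearing bits 0 to 2 is an AND with the complement of 0x07; in ARM assembly these correspond to ORR and BIC. For the input 0xAD the expected result is 0xB8. The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Set bits 3..5 (mask 0x38), clear bits 0..2 (mask 0x07), keep the rest. */
uint32_t example5(uint32_t x) {
    x |= 0x38u;    /* like ORR: set bits 3, 4, 5 */
    x &= ~0x07u;   /* like BIC: clear bits 0, 1, 2 */
    return x;
}
```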
ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT
The ARM memory interface
(Figure: a basic ARM memory system.)
AMBA (1/4)
AMBA: Advanced Microcontroller Bus Architecture
Advanced High-performance Bus (AHB)
Advanced System Bus (ASB)
Advanced Peripheral Bus (APB)
AMBA objectives:
Technology independence
Encourage modular system design
AMBA (2/4)
(Figure: a typical AMBA-based system.)
AMBA (3/4)
AHB bus:
Burst transactions
Split transactions
Data bus 64–128 bits wide
(Figure: AHB interconnect: an arbiter, an address decoder, three masters and three slaves, with separate write-data and read-data paths.)
AMBA (4/4)

AMBA Design Kit (ADK)
An environment that assists designers in developing AMBA-based components and SoC designs.
Signal Processing Support (1/2)
Piccolo DSP coprocessor.
Various data memories for maximizing throughput.
Signal Processing Support (2/2)
(Figure: Piccolo organization: ALU, multiplier, decode and control, register bank, input and output buffers and an I cache, alongside an ARM7TDMI; both connect through AMBA interfaces to the AMBA bus.)
MEMORY HIERARCHY
Memory hierarchy (further down the hierarchy: larger size, lower speed)
Memory type       Size              Speed
Registers         32-bit            a few ns
On-chip cache     8–32 Kbytes       10 ns
Off-chip cache    100–200 Kbytes    10–30 ns
RAM               Mbytes            100 ns
On-chip memory
Necessary for performance.
Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.
Cache types

Cache types:
Unified cache.
Separate instruction and data caches.
Performance is measured by the hit rate h (and the miss rate 1 - h). The average access time is
$$t_{av} = h\,t_{cache} + (1 - h)\,t_{main}$$
Compulsory miss: the first time an address is accessed.
Capacity miss: the cache is full (the working set does not fit in it).
Conflict miss: two addresses compete for the same place in the cache.
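The average access time formula in C, evaluated with the on-chip cache (10 ns) and RAM (100 ns) speeds from the memory hierarchy table and an assumed hit rate of 95%: the result is 14.5 ns, showing how even a 5% miss rate noticeably raises the average above the cache speed.

```c
#include <assert.h>

/* Average access time: t_av = h * t_cache + (1 - h) * t_main */
double t_av(double h, double t_cache, double t_main) {
    assert(h >= 0.0 && h <= 1.0);
    return h * t_cache + (1.0 - h) * t_main;
}
```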
Replacement policy and implementation
Replacement policies:
Least Recently Used (LRU)
Least Frequently Used (LFU)
Data prediction
Implementations:
Fully-associative
Direct-mapped
Set-associative
Direct-mapped cache (1/2)
(Figure: a direct-mapped cache; each line of data is stored together with a tag identifying where in memory it came from.)
Direct-mapped cache (2/2)
Each memory location has a specific place in the cache.
The tag and the data can be accessed at the same time.
The tag RAM is smaller than the data RAM and so has a shorter access time, allowing the tag comparison to complete within the data RAM access time.
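The fixed placement of a direct-mapped cache comes from splitting the address into tag, index and offset. This C sketch assumes a hypothetical geometry of 256 lines of 16 bytes (4 Kbytes), so bits 3..0 are the offset, bits 11..4 the index and bits 31..12 the tag. It tracks only tags, not data, and shows a conflict miss: 0x1000 and 0x2000 share index 0, so each evicts the other.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define LINE_BYTES 16u    /* hypothetical line size */
#define NUM_LINES  256u   /* hypothetical number of lines */

typedef struct { bool valid[NUM_LINES]; uint32_t tag[NUM_LINES]; } dm_cache;

static uint32_t dm_index(uint32_t addr) { return (addr / LINE_BYTES) % NUM_LINES; }
static uint32_t dm_tag(uint32_t addr)   { return addr / (LINE_BYTES * NUM_LINES); }

/* Returns true on a hit; on a miss the single possible line is refilled. */
bool dm_access(dm_cache *c, uint32_t addr) {
    uint32_t i = dm_index(addr), t = dm_tag(addr);
    if (c->valid[i] && c->tag[i] == t) return true;
    c->valid[i] = true;   /* miss: replace whatever was in this line */
    c->tag[i] = t;
    return false;
}
```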

2-way set-associative cache (1/3)
(Figure: a 2-way set-associative cache.)
Set-associative cache (2/3)
An n-way set-associative cache is divided into n ways, each organized like a direct-mapped cache; an address selects one set, made up of one line from each way.
Two addresses that would compete for the same line in a direct-mapped cache can be stored in different ways of the same set and accessed independently.
Set-associative cache (3/3)
Choosing which line of the set to replace on a miss:
Random allocation
Least recently used (LRU)
Round-robin (cyclic)
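A 2-way set-associative cache with LRU replacement can be sketched in C as follows. The geometry (2 ways × 128 sets × 16-byte lines) is hypothetical, only tags are tracked, and the one-bit LRU state per set is valid only for 2 ways. In the test, 0x1000 and 0x1800 map to the same set yet both stay resident; a third address in that set then evicts the least recently used one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define LINE_BYTES 16u    /* hypothetical geometry */
#define NUM_SETS   128u
#define WAYS       2u

typedef struct {
    bool     valid[NUM_SETS][WAYS];
    uint32_t tag[NUM_SETS][WAYS];
    unsigned lru[NUM_SETS];          /* way to replace next (2-way only) */
} sa_cache;

bool sa_access(sa_cache *c, uint32_t addr) {
    uint32_t set = (addr / LINE_BYTES) % NUM_SETS;
    uint32_t tag = addr / (LINE_BYTES * NUM_SETS);
    for (unsigned w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = 1u - w;    /* the other way is now least recently used */
            return true;
        }
    }
    unsigned victim = c->lru[set];   /* miss: replace the LRU way */
    c->valid[set][victim] = true;
    c->tag[set][victim] = tag;
    c->lru[set] = 1u - victim;
    return false;
}
```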
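Fully associative cache (1/2)
(Figure: fully associative cache: the address is compared with every tag in a tag CAM; a match produces the hit signal and selects, through a multiplexer, the corresponding line of the data RAM.)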
Write strategies
Write-through: all write operations are passed to main memory.
Write-through with buffered write: write operations are passed to main memory through a write buffer, so the processor does not have to wait for main memory.
Copy-back (write-back): write operations update only the cache; a modified (dirty) line is written back to main memory when it is replaced.
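The difference in main-memory traffic between write-through and copy-back shows up when the same cached line is written repeatedly. This C sketch, with illustrative names, counts main-memory writes for one line: write-through sends every write to memory, while copy-back only marks the line dirty and writes it once on eviction. Buffered write-through has the same count as plain write-through; the buffer only hides the latency.

```c
#include <assert.h>
#include <stdbool.h>

/* State of ONE cached line and the main-memory writes it has caused. */
typedef struct { bool dirty; unsigned mem_writes; } line_state;

/* Write-through: every write also goes to main memory. */
void write_through(line_state *l) { l->mem_writes++; }

/* Copy-back: a write only marks the line dirty... */
void copy_back_write(line_state *l) { l->dirty = true; }

/* ...and main memory is updated once, when a dirty line is evicted. */
void copy_back_evict(line_state *l) {
    if (l->dirty) { l->mem_writes++; l->dirty = false; }
}
```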
Cache feature summary
Organizational feature    Options
Cache-MMU relationship    Physical cache | Virtual cache
Cache contents            Unified instruction and data cache | Separate instruction and data caches
Associativity             Direct-mapped (RAM-RAM) | Set-associative (RAM-RAM) | Fully associative (CAM-RAM)
Replacement strategy      Cyclic | Random | LRU
Write strategy            Write-through | Write-through with write buffer | Copy-back
‘Perfect’ cache performance
Cache form                    Performance
No cache                      1
Instruction-only cache        1.95
Instruction and data cache    2.5
Data-only cache               1.13
MMU (1/3)


Two memory management approaches:
Segmentation
Paging
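MMU (2/3)
Segmented memory management:
(Figure: the segment selector picks an entry (base, limit) from the segment descriptor table; the logical address is added to the base to form the physical address, and is compared with the limit, raising an access fault if it is larger.)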
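The segment translation in the figure (an add and a limit check) in C. This is a sketch under assumptions: `limit` is taken to be the largest valid offset, matching the ">?" comparison in the figure, the selector is used directly as a table index without bounds checking, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct { uint32_t base, limit; } seg_descriptor;  /* limit = largest valid offset */

/* Returns false (access fault) if the logical address exceeds the limit. */
bool seg_translate(const seg_descriptor *table, unsigned selector,
                   uint32_t logical, uint32_t *physical) {
    const seg_descriptor *d = &table[selector];   /* no selector bounds check here */
    if (logical > d->limit) return false;         /* the ">?" comparison */
    *physical = d->base + logical;                /* the "+": base + offset */
    return true;
}
```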
MMU (3/3)
Paging memory management:
(Figure: the logical address is split into bits 31–22, which index the page directory; bits 21–12, which index a page table; and bits 11–0, the offset of the data within the page frame.)
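The two-level walk in the figure can be sketched in C using its 10/10/12 bit split. The table formats here are simplified and hypothetical: a directory entry is just a pointer to a page table (NULL if absent) and a page-table entry holds a valid bit and a physical frame number. A missing table or invalid entry is reported as a page fault by returning false.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define ENTRIES 1024u

typedef struct { bool valid; uint32_t frame; } pte_t;        /* frame = physical page number */
typedef struct { const pte_t *table[ENTRIES]; } page_dir_t;  /* NULL = no page table */

bool translate(const page_dir_t *dir, uint32_t vaddr, uint32_t *paddr) {
    uint32_t dir_idx = vaddr >> 22;              /* bits 31..22 */
    uint32_t tbl_idx = (vaddr >> 12) & 0x3FFu;   /* bits 21..12 */
    uint32_t offset  = vaddr & 0xFFFu;           /* bits 11..0 */
    const pte_t *table = dir->table[dir_idx];
    if (table == NULL || !table[tbl_idx].valid)
        return false;                            /* page fault */
    *paddr = (table[tbl_idx].frame << 12) | offset;
    return true;
}
```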
ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS
(Figure: an example SoC built around an ARM1136JF core. A 64-bit bus matrix with 8 AHB ports (1. ARM peripheral AHB, 2. ARM D write AHB, 3. ARM D read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB) connects the core to a vectored interrupt controller (VIC, PL192) with 14 external interrupts, a DMA controller (DMAC, PL080) with 8 external DMA requests, a multiport memory controller (MPMC, PL176) for SDRAM and DDR, a static memory controller (SMC, PL093), a CLCD display controller (PL110) and an ETM with a trace port analyser. AHB/APB bridges lead to timers and RTC (PL031), GPIO (PL061, 32 lines), SSP (PL022), two UARTs (PL011), a smart card interface (SCI, PL131, UICC compliant) and system control with watchdog, external clock, external reset and battery-fail inputs.)
CP15


On-chip system control coprocessor for the MMU, cache and protection unit.
Control takes place through CP15 registers, accessed with coprocessor register-transfer instructions (MCR/MRC) executed in a privileged mode such as supervisor mode.
Protection Unit


Simpler alternative to the MMU.
Requires simpler software and
hardware.
Does not use translation tables, but 8
protection regions instead.
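Instead of a translation-table walk, a protection unit only checks which region an address falls in and what that region allows. This C sketch uses the 8 regions mentioned above; everything else is an assumption for illustration: the field layout, power-of-two region sizes with size-aligned bases, the permission bits, the rule that the highest-numbered enabled region wins where regions overlap, and that an address in no region faults.

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_REGIONS 8
enum { PERM_R = 1, PERM_W = 2 };   /* hypothetical permission bits */

typedef struct {
    bool     enabled;
    uint32_t base;        /* assumed aligned to the region size */
    unsigned size_log2;   /* region size = 2^size_log2 bytes (up to 32) */
    unsigned perms;
} region;

/* Returns true if the access is allowed (assumed priority: highest region number). */
bool pu_check(const region r[NUM_REGIONS], uint32_t addr, unsigned access) {
    for (int i = NUM_REGIONS - 1; i >= 0; i--) {
        if (!r[i].enabled) continue;
        uint64_t size = (uint64_t)1 << r[i].size_log2;
        if ((uint64_t)addr - r[i].base < size)    /* wraps huge if addr < base */
            return (r[i].perms & access) == access;
    }
    return false;                                  /* no region: fault */
}
```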
ARM DEVELOPER SUITE
ARMULATOR (1/2)



ARMulator: an emulator of various ARM processors.
It allows project development in C, C++ or assembly.
Together with a debugger, compilers and an assembler, it forms the toolset called the ARM Developer Suite (ADS).
ARMULATOR (2/2)

Possible project options:




ARM and Thumb Interworking
Mixing C, C++ and Assembly
Code for ROM
Exception handlers
MM
ARMULATOR TUTORIAL

CODEWARRIOR ENVIRONMENT