ARM System-On-Chip Architecture

INTRODUCTION
  ARM is a RISC processor.
  It is used in applications that need small size and high performance.
  Simple architecture: low power consumption.

TIMELINE (1/2)
  1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
  1990: A collaboration between Acorn and Apple leads to the founding of Advanced RISC Machines (A.R.M.).
  1991: ARM6, the first embeddable RISC microprocessor.
  1992 – 1994: Various companies adopt ARM (Sharp, Samsung); in 1993 ARM7, the first multimedia microprocessor, is introduced.

TIMELINE (2/2)
  1995: Introduction of Thumb and ARM8.
  1996 – 2000: Alcatel, Hyundai, Philips and Sony use ARM; in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
  2000 – 2002: ARM's share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.

THE ARM ARCHITECTURE

GENERAL INFO (1/2)
  Aim: simple design
  Load-store architecture
  32-bit data bus
  3 addressing modes

GENERAL INFO (2/2)
  Simple architecture + simple instruction set + code density
  Result: small size, low power consumption

Registers
  32-bit general-purpose registers
  7 modes of operation
  Each mode has a different set of visible registers and a different level of control over the CPSR.
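The point that each mode sees a different set of registers can be made concrete with a small Python sketch. It is only an illustration, not part of the original slides: the banked-register names follow the ARM programming model figure in this deck, while the table `BANKED` and the helper `physical_register` are hypothetical names introduced here.

```python
# Registers that are banked (replaced by a private copy) in each mode.
# User and system mode share one register set, so nothing is banked there.
BANKED = {
    "user": (),
    "system": (),
    "fiq": (8, 9, 10, 11, 12, 13, 14),
    "irq": (13, 14),
    "svc": (13, 14),
    "abt": (13, 14),
    "und": (13, 14),
}


def physical_register(mode, n):
    """Return the name of the physical register behind rN in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be in 0..15")
    if n in BANKED[mode]:
        return "r%d_%s" % (n, mode)
    return "r%d" % n


if __name__ == "__main__":
    # r13 is private in every exception mode; r8 is private only in fiq mode.
    for mode in BANKED:
        print(mode, physical_register(mode, 13), physical_register(mode, 8))
```

Keeping the mapping as data makes it easy to see why fiq mode, with seven private registers, needs to save less state on entry than the other exception modes.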
ARM Programming Model
  Registers usable in user mode:
    r0 – r14, r15 (PC), CPSR
  Banked registers, system modes only:
    fiq mode:        r8_fiq – r14_fiq, SPSR_fiq
    svc mode:        r13_svc, r14_svc, SPSR_svc
    abort mode:      r13_abt, r14_abt, SPSR_abt
    irq mode:        r13_irq, r14_irq, SPSR_irq
    undefined mode:  r13_und, r14_und, SPSR_und

CPSR
  ARM CPSR format:
    Bits 31 – 28:  N Z C V
    Bits 27 – 8:   unused
    Bit 7:         I
    Bit 6:         F
    Bit 5:         T
    Bits 4 – 0:    mode
  N: Negative
  Z: Zero
  C: Carry
  V: Overflow
  Q: Saturation (for enhanced DSP instructions)

Memory Organization
  Address bus: 32 bits
  1 word = 32 bits
  [Figure: byte-addressed memory, bit 31 to bit 0 across each row, byte addresses 0 to 23. Items shown: byte0 – byte3, half-word4, byte6, word8, half-word12, half-word14, word16.]

Instruction Set
  Three instruction types:
    Data processing
    Data transfer
    Control flow

Supervisor mode
  In user mode, the operating system handles operations that are outside user privileges.
  Through "supervisor calls", a user program enters system level and can have system functions performed.

I/O System
  ARM handles peripherals as "memory-mapped devices with interrupt support".
  Interrupts:
    IRQ: normal interrupt
    FIQ: fast interrupt

Exceptions
  Exceptions:
    Interrupts
    Supervisor call
    Traps
  When an exception takes place:
    The value of the PC is copied to r14_exc.
    The operating mode changes to the respective exception mode.
    The PC takes the exception handler vector address.
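The three exception-entry steps listed above can be sketched in Python as operations on a dictionary that stands in for the processor state. This is a simplified model under stated assumptions, not the slides' own material: the mode encodings and vector addresses are those of the classic ARM exception model, the saving of the CPSR into SPSR_exc is an extra step that the list above does not mention, and the names `take_exception`, `MODE_BITS` and `EXCEPTIONS` are hypothetical. Interrupt masking and the T bit are left out.

```python
# CPSR mode field values (bits 4-0) for the exception modes.
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011}

# exception -> (mode entered, vector address)
EXCEPTIONS = {
    "undefined_instruction": ("und", 0x04),
    "supervisor_call": ("svc", 0x08),
    "prefetch_abort": ("abt", 0x0C),
    "data_abort": ("abt", 0x10),
    "irq": ("irq", 0x18),
    "fiq": ("fiq", 0x1C),
}


def take_exception(state, kind, return_address):
    """Apply the exception-entry steps to a dict-based processor state."""
    mode, vector = EXCEPTIONS[kind]
    state["r14_" + mode] = return_address   # step 1: return address -> r14_exc
    state["spsr_" + mode] = state["cpsr"]   # extra step: old CPSR -> SPSR_exc
    state["cpsr"] = (state["cpsr"] & ~0x1F) | MODE_BITS[mode]  # step 2: new mode
    state["pc"] = vector                    # step 3: PC -> vector address
    return state


if __name__ == "__main__":
    # A supervisor call taken from user mode (mode bits 0b10000).
    s = take_exception({"pc": 0x8000, "cpsr": 0b10000}, "supervisor_call", 0x8004)
    print({k: hex(v) for k, v in s.items()})
```

Because r14 and the SPSR are banked per mode, the handler can later restore both the PC and the CPSR of the interrupted program.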
THE ARM INSTRUCTION SET

Data Processing Instructions (1/2)
  Arithmetic operations:
        ADD   r0, r1, r2          ; r0 := r1 + r2, flags not updated
        ADDS  r0, r1, r2          ; r0 := r1 + r2, flags updated
  Logical operations:
        AND   r0, r1, r2          ; r0 := r1 AND r2
  Register movement:
        MOV   r0, r2              ; r0 := r2
  Comparison:
        CMP   r1, r2              ; set flags on r1 - r2

Data Processing Instructions (2/2)
  Operands:
    Immediate operands:
        ADD   r3, r3, #1
    Shifted register operands:
        ADD   r3, r2, r1, LSL #3
  Miscellaneous data processing instructions:
    Multiplication:
        MUL   r4, r3, r2

Data transfer instructions
  Load and store instructions:
        LDR   r0, [r1]
        STR   r0, [r1]
  Offset:
        LDR   r0, [r1, #4]
  Post-indexed:
        LDR   r0, [r1], #16
  Auto-indexed:
        LDR   r0, [r1, #16]!
  Multiple data transfers:
        LDMIA r1, {r0, r2, r5}

Examples
  PRE:  r0 = 0x00000000
        r1 = 0x00009000
        mem32[0x00009000] = 0x01010101
        mem32[0x00009004] = 0x02020202
        LDR   r0, [r1, #4]!
  POST: r0 = 0x02020202
        r1 = 0x00009004

Examples
  PRE:  r0 = 0x00000000
        r1 = 0x00009000
        mem32[0x00009000] = 0x01010101
        mem32[0x00009004] = 0x02020202
        LDR   r0, [r1, #4]
  POST: r0 = 0x02020202
        r1 = 0x00009000

Examples
  PRE:  r0 = 0x00000000
        r1 = 0x00009000
        mem32[0x00009000] = 0x01010101
        mem32[0x00009004] = 0x02020202
        LDR   r0, [r1], #4
  POST: r0 = 0x01010101
        r1 = 0x00009004

Examples
  PRE:  mem32[0x80018] = 0x03
        mem32[0x80014] = 0x02
        mem32[0x80010] = 0x01
        r0 = 0x00080010
        LDMIA r0!, {r1-r3}
  POST: r0 = 0x0008001c
        r1 = 0x00000001
        r2 = 0x00000002
        r3 = 0x00000003

Examples
  PRE:  mem32[0x8001c] = 0x04
        mem32[0x80018] = 0x03
        mem32[0x80014] = 0x02
        mem32[0x80010] = 0x01
        r0 = 0x00080010
        LDMIB r0!, {r1-r3}
  POST: r0 = 0x0008001c
        r1 = 0x00000002
        r2 = 0x00000003
        r3 = 0x00000004

Conditional execution
  Instructions can be executed conditionally, without branches:
        CMP   r2, r3              ; subtract and set flags
        ADDGE r4, r5, r6          ; if r2 >= r3
        SUBLT r4, r5, r6          ; else

Conditional execution mnemonics

Control flow instructions
  Branch instruction:     B   label
  Conditional branch:     BNE label
  Branch and link:        BL  label

        BL    Loop
        …
Loop    …
        …
        MOV   PC, r14             ; return

Example 1
        AREA  ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                         ; Mark first instruction to execute
start   MOV   r0, #10                 ; Set up parameters
        MOV   r1, #3
        ADD   r0, r0, r1              ; r0 = r0 + r1
stop    MOV   r0, #0x18               ; angel_SWIreason_ReportException
        LDR   r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI   0x123456                ; ARM semihosting SWI
        END                           ; Mark end of file

Example 2
        AREA  subrout, CODE, READONLY ; Name this block of code
        ENTRY                         ; Mark first instruction to execute
start   MOV   r0, #10                 ; Set up parameters
        MOV   r1, #3
        BL    doadd                   ; Call subroutine
stop    MOV   r0, #0x18               ; angel_SWIreason_ReportException
        LDR   r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI   0x123456                ; ARM semihosting SWI
doadd   ADD   r0, r0, r1              ; Subroutine code
        MOV   pc, lr                  ; Return from subroutine
        END                           ; Mark end of file

ARM ORGANIZATION AND IMPLEMENTATION

3-Stage Pipeline (ARM7, 80 MHz)
  Stages: Fetch, Decode, Execute
  Throughput: 1 instruction / cycle
  [Figure: ARM7 datapath. Address register and incrementer driving A[31:0]; register bank with the PC; multiply register; barrel shifter; ALU; A bus, B bus and ALU bus; instruction decode and control; data-out and data-in registers on D[31:0].]

5-stage pipeline (1/2)
  Program execution time:
    $T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$
  Ways to reduce $T_{prog}$:
    Increase $f_{clk}$: logic simplification.
    Reduce CPI: reduce the number of multi-cycle instructions.

5-stage pipeline (ARM9, 150 MHz) (2/2)
  Stages: Fetch, Decode, Execute, Buffer / Data, Write-back

ARM coprocessor interface
  ARM supports up to 16 coprocessors, which can be emulated in software.
  Each coprocessor has up to 16 general-purpose registers.
  ARM is a load-store architecture.
  Coprocessors usually handle on-chip functions, such as cache and memory management.

ARCHITECTURAL SUPPORT FOR HIGH-LEVEL LANGUAGES

Floating-point accelerator (1/2)
  For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator.
  FPA10 includes:
    Coprocessor interface
    Load / store unit
    Register bank (8 registers, 80 bits each)
    ALU (adder, mult, div)

Floating-point accelerator (2/2)
  [Figure: FPA10 organization. Coprocessor interface with coprocessor hand-shake and data bus; pipeline control; instruction issuer; load/store unit; register bank; arithmetic unit with add, mult and div blocks.]

APCS (1/2)
  APCS (ARM Procedure Call Standard) is a set of rules governing entry to and exit from C procedures.
  Specific use of general-purpose registers.
  (r0 – r3: arguments, r4 – r8: variables, r10: stack limit, etc.)
  Procedure entry and exit:
        BL    Loop
        …
Loop    …
        MOV   pc, lr

APCS (2/2)
  C code:
        void f1(int a)
        {
            f2(a);
        }
  Assembly code:
f1      LDR   r0, [r13]
        STR   r13!, [r14]
        STR   r13!, [r0]
        BL    f2
        SUB   r13, #4
        LDR   r13!, r15
  [Figure: stack layout with offsets 0, 4, 8 and 16 and the stack pointer marked.]

THUMB PROGRAMMER'S MODEL

General information
  Thumb objective: code density.
  Thumb has a 16-bit instruction set: a subset of the ARM instruction set is encoded in a 16-bit space.
  With appropriate use, great benefits can be achieved in terms of:
    Power efficiency
    Enhanced performance

Going in and out of Thumb mode
  Using the BX instruction in ARM state, e.g. BX r0:
    If r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
  Instructions are assembled as 16-bit instructions when the appropriate directive is used.
  Using the BX instruction from Thumb state, with bit 0 of the register clear, we return to ARM state.

The Thumb programmer's model
  Thumb registers:
    Lo registers: r0 – r7
    Hi registers: r8 – r12, SP (r13), LR (r14), PC (r15)
    CPSR
  Shaded registers in the figure have restricted access.

ARM vs. Thumb (1/3)
  Thumb:
    Code occupies about 70% of the space of the equivalent ARM code
    Uses 40% more instructions
    45% faster code with 16-bit memory
    Requires about 30% less external memory
  ARM:
    40% faster code when coupled with a 32-bit memory

ARM vs. Thumb (2/3)
  If performance is critical: ARM
  If cost and power consumption are critical: Thumb

ARM and Thumb interaction
  A 32-bit ARM system can go into Thumb mode for specific routines, in order to meet power and memory constraints.
  A 16-bit system can use an on-chip 32-bit memory for ARM-state routines, and a 16-bit off-chip memory with Thumb code for the rest of the application.
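The state switch performed by BX, described under "Going in and out of Thumb mode", can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: the CPSR is a plain integer with the T bit at bit 5 (as in the CPSR format shown earlier), and the function name `bx` is a hypothetical stand-in for the instruction.

```python
T_BIT = 1 << 5  # Thumb state bit in the CPSR


def bx(target, cpsr):
    """Model BX: bit 0 of the target selects the state, the rest is the new PC."""
    if target & 1:
        cpsr |= T_BIT       # bit 0 set: continue in Thumb state
    else:
        cpsr &= ~T_BIT      # bit 0 clear: continue in ARM state
    pc = target & 0xFFFFFFFE  # bit 0 is not part of the address
    return pc, cpsr


if __name__ == "__main__":
    # Enter Thumb code at 0x8000 from ARM state (user mode, T = 0).
    pc, cpsr = bx(0x8000 + 1, 0b10000)
    print(hex(pc), bin(cpsr))
    # Return to ARM code at 0x9000.
    pc, cpsr = bx(0x9000, cpsr)
    print(hex(pc), bin(cpsr))
```

This is why the ARM header in an interworking program loads `start + 1` rather than `start`: the extra 1 is the request to switch state, not part of the address.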
Example 3
        AREA  ThumbSub, CODE, READONLY   ; Name this block of code
        ENTRY                            ; Mark first instruction to execute
        CODE32                           ; Subsequent instructions are ARM
header  ADR   r0, start + 1              ; Processor starts in ARM state,
        BX    r0                         ; so small ARM code header used
                                         ; to call Thumb main program
        CODE16                           ; Subsequent instructions are Thumb
start   MOV   r0, #10                    ; Set up parameters
        MOV   r1, #3
        BL    doadd                      ; Call subroutine
stop    MOV   r0, #0x18                  ; angel_SWIreason_ReportException
        LDR   r1, =0x20026               ; ADP_Stopped_ApplicationExit
        SWI   0xAB                       ; Thumb semihosting SWI
doadd   ADD   r0, r0, r1                 ; Subroutine code
        MOV   pc, lr                     ; Return from subroutine
        END                              ; Mark end of file

Example 4
  Implement the following pseudocode in ARM and Thumb assembly. Which version is more efficient in terms of execution time, and which in terms of code size?
        if r1 > r2 then
            r3 = r4 + r5
            r6 = r4 - r5
        else
            r3 = r4 - r5
            r6 = r4 + r5

Example 5
  Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged. Test it using 0xAD as input data.

ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT

The ARM memory interface
  [Figure: a basic ARM memory system.]

AMBA (1/4)
  Advanced Microcontroller Bus Architecture:
    Advanced High-Performance Bus (AHB)
    Advanced System Bus (ASB)
    Advanced Peripheral Bus (APB)
  AMBA objectives:
    Technology independence
    Encouraging modular system design

AMBA (2/4)
  [Figure: a typical AMBA-based system.]

AMBA (3/4)
  AHB bus:
    Burst transactions
    Split transactions
    Data bus: 64 – 128 bits
  [Figure: AHB interconnect with masters 1 – 3, slaves 1 – 3, arbiter and decoder, and address, write-data and read-data paths.]

AMBA (4/4)
  AMBA Design Kit (ADK): an environment that assists designers in developing AMBA-based components and SoC designs.
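Returning briefly to Example 5 earlier in this part: the expected result for the test input can be worked out before writing the assembly. The following Python sketch is one way to do that check; the mask names and the function `modify_bits` are hypothetical, and the two bitwise steps mirror what the ARM ORR and BIC instructions compute.

```python
SET_MASK = 0b00111000    # bits 3 to 5
CLEAR_MASK = 0b00000111  # bits 0 to 2


def modify_bits(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK        # what ORR r0, r0, #0x38 would do
    value &= ~CLEAR_MASK     # what BIC r0, r0, #0x07 would do
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(modify_bits(0xAD)))  # prints 0xb8
```

For the input 0xAD (binary 1010 1101) the result is 0xB8 (binary 1011 1000), which gives a reference value when testing the assembly version.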
Signal Processing Support (1/2)
  Piccolo DSP coprocessor.
  Various data memories for maximizing throughput.

Signal Processing Support (2/2)
  [Figure: Piccolo organization. ARM7TDMI core with I cache; Piccolo with input buffer, register bank, output buffer, ALU, mult, and decode and control; AMBA interfaces connecting both to AMBA.]

MEMORY HIERARCHY

Memory hierarchy
  Moving down the table: larger size, lower speed.

  Memory type       Size                Speed
  Registers         32-bit              A few nsec
  On-chip cache     8 – 32 kbytes       10 nsec
  Off-chip cache    100 – 200 kbytes    10 – 30 nsec
  RAM               Mbytes              100 nsec

On-chip memory
  Necessary for performance.
  Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.

Cache types
  Cache types:
    Unified cache.
    Separate instruction and data caches.
  Performance, with hit rate $h$ and miss rate $(1 - h)$:
    $t_{av} = h \, t_{cache} + (1 - h) \, t_{main}$
  Compulsory miss: the first time an address is accessed.
  Capacity miss: the cache is full.
  Conflict miss: two addresses compete for the same place in the cache.

Replacement policy and implementation
  Replacement policy:
    Least Recently Used (LRU)
    Least Frequently Used (LFU)
    Data prediction
  Implementation:
    Fully associative
    Direct-mapped
    Set-associative

Direct-mapped cache (1/2)
  Each line of data is stored together with a tag taken from its memory address.

Direct-mapped cache (2/2)
  Each memory location has a specific place in the cache.
  Tag and data can be accessed at the same time.
  The tag RAM is smaller than the data RAM and has a shorter access time, so the tag comparison can complete within the data RAM access time.

2-way set-associative cache (1/3)

Set-associative cache (2/3)
  A set-associative cache offers n possible locations (ways) for each address, giving an n-way set-associative cache.
  Two addresses that would compete for the same spot in a direct-mapped cache can be stored in different locations and accessed independently.
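The difference between a conflict miss in a direct-mapped cache and the same access pattern in a 2-way set-associative cache can be shown with a small simulation. The Python sketch below is illustrative only: the cache geometry (4 Kbytes, 16-byte lines), the LRU replacement choice and the function names are assumptions made for the example, while the 10 nsec and 100 nsec access times are the on-chip cache and RAM figures from the memory hierarchy table.

```python
def average_access_time(h, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main"""
    return h * t_cache + (1 - h) * t_main


def hit_rate(addresses, num_sets, ways, line_bytes=16):
    """Simulate an n-way set-associative cache with LRU replacement.

    ways=1 gives a direct-mapped cache.
    """
    sets = [[] for _ in range(num_sets)]  # tags per set, most recent last
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets           # which set the line maps to
        tag = line // num_sets            # identifies the line within the set
        entries = sets[index]
        if tag in entries:
            hits += 1
            entries.remove(tag)           # re-appended below as most recent
        elif len(entries) == ways:
            entries.pop(0)                # evict the least recently used tag
        entries.append(tag)
    return hits / len(addresses)


if __name__ == "__main__":
    # Two addresses 4 Kbytes apart, accessed alternately.
    trace = [0x0000, 0x1000] * 4
    for name, sets_, ways in (("direct-mapped", 256, 1), ("2-way", 128, 2)):
        h = hit_rate(trace, sets_, ways)
        print(name, h, average_access_time(h, 10, 100), "nsec")
```

With this trace the direct-mapped cache never hits, because both addresses map to the same line and keep evicting each other, whereas the 2-way cache misses only on the first access to each address.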
Set-associative cache (3/3)
  Set selection:
    Random allocation
    Least recently used (LRU)
    Round-robin (cyclic)

Fully associative (1/2)
  [Figure: fully associative cache. Address tag, tag CAM, data RAM, mux, hit and data outputs.]

Write strategies
  Write-through: all write operations are passed to main memory.
  Write-through with buffered write: write operations are passed to main memory through the write buffer.
  Copy-back (write-back): write operations update only the cache.

Cache feature summary
  Organizational feature    Options
  Cache-MMU relationship    Physical cache                        Virtual cache
  Cache contents            Unified instruction and data cache    Separate instruction and data caches
  Associativity             Direct-mapped (RAM-RAM)               Set-associative (RAM-RAM)               Fully associative (CAM-RAM)
  Replacement strategy      Cyclic                                Random                                  LRU
  Write strategy            Write-through                         Write-through with write buffer         Copy-back

'Perfect' cache performance
  Cache form                    Performance
  No cache                      1
  Instruction-only cache        1.95
  Instruction and data cache    2.5
  Data-only cache               1.13

MMU (1/3)
  Two memory management approaches:
    Segmentation
    Paging

MMU (2/3)
  Segmented memory management:
  [Figure: the segment selector of the logical address selects a base and a limit from the segment descriptor table. An adder (+) produces the physical address and a comparison (>?) raises an access fault.]

MMU (3/3)
  Paging memory management:
  [Figure: the logical address is split into bits 31 – 22, 21 – 12 and 11 – 0, which pass through the page directory and the page table to reach the data in the page frame.]

ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS
  [Figure: block diagram of a system built around an ARM1136JF core.
   Bus matrix with 8 AHBs (64-bit ports): 1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB.
   Memory and display: MPMC (PL176), SDRAM & DDR; SMC (PL093), static memory; CLCD (PL110), CLCD display; one port unassigned.
   System blocks: external clock, watchdog, external reset & battery fail, system control, 14 external interrupts, VIC (PL192), 8 external DMA requests, DMAC (PL080), ETM, trace port analyser, timers & RTC (PL031).
   Peripherals behind AHB/APB bridges: GPIO (PL061), 32 GPIO lines; SSP (PL022); UART (PL011), 2x UARTs; SCI (PL131), smart card (UICC compliant).]

CP15
  On-chip coprocessor for MMU, cache and protection unit control.
  Control takes place through registers, with instructions executed in supervisor mode.

Protection Unit
  Simpler alternative to the MMU.
  Requires simpler software and hardware.
  Does not use translation tables; it uses 8 protection regions instead.

ARM DEVELOPER SUITE

ARMULATOR (1/2)
  ARMulator: emulator of various ARM processors.
  Allows project development in C, C++ or assembly.
  It comes with a debugger, compilers and an assembler; the entire set is called the ARM Developer Suite (ADS).

ARMULATOR (2/2)
  Possible project options:
    ARM and Thumb interworking
    Mixing C, C++ and assembly
    Code for ROM
    Exception handlers
    MM

ARMULATOR TUTORIAL

CODEWARRIOR ENVIRONMENT