COMP541
Datapaths I
Montek Singh
Mar 28, 2012

Topics
- Over next 2 classes: datapaths
- How ALUs are designed
- How data is stored in a register file
- Lab 9: Start building a datapath!

What is computer architecture?

Architecture (ISA)
- Jumping up a few levels of abstraction.
- Architecture: the programmer's view of the computer
  - Defined by instructions (operations) and operand locations
- Microarchitecture: how to implement an architecture in hardware

Levels of abstraction (top to bottom):
  Application Software   programs
  Operating Systems      device drivers
  Architecture           instructions, registers
  Microarchitecture      datapaths, controllers
  Logic                  adders, memories
  Digital Circuits       AND gates, NOT gates
  Analog Circuits        amplifiers, filters
  Devices                transistors, diodes
  Physics                electrons

MIPS Machine Language
- Three instruction formats:
  - R-Type: register operands
  - I-Type: immediate operand
  - J-Type: for jumps

R-Type instructions
- Register-type
- 3 register operands:
  - rs, rt: source registers
  - rd: destination register
- Other fields:
  - op: the operation code or opcode (0 for R-type instructions)
  - funct: the function; together, op and funct tell the computer which operation to perform
  - shamt: the shift amount for shift instructions, otherwise it is 0

R-Type format:
  op       rs       rt       rd       shamt    funct
  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

R-Type Examples

Field values:
  Assembly Code        op   rs   rt   rd   shamt   funct
  add $s0, $s1, $s2    0    17   18   16   0       32
  sub $t0, $t3, $t5    0    11   13   8    0       34

Note the order of registers in the assembly code: add rd, rs, rt

Machine code:
  op       rs      rt      rd      shamt   funct
  000000   10001   10010   10000   00000   100000   (0x02328020)
  000000   01011   01101   01000   00000   100010   (0x016D4022)

I-Type instructions
- Immediate-type
- 3 operands:
  - op: the opcode
  - rs, rt: register operands
  - imm: 16-bit two's complement immediate

I-Type format:
  op       rs       rt       imm
  6 bits   5 bits   5 bits   16 bits

I-Type Examples

Field values:
  Assembly Code         op   rs   rt   imm
  addi $s0, $s1, 5      8    17   16   5
  addi $t0, $s3, -12    8    19   8    -12
  lw $t2, 32($0)        35   0    10   32
  sw $s1, 4($t1)        43   9    17   4

Note the differing order of registers in the assembly and machine codes:
  addi rt, rs, imm
  lw rt, imm(rs)
  sw rt, imm(rs)

Machine code:
  op       rs      rt      imm
  001000   10001   10000   0000 0000 0000 0101   (0x22300005)
  001000   10011   01000   1111 1111 1111 0100   (0x2268FFF4)
  100011   00000   01010   0000 0000 0010 0000   (0x8C0A0020)
  101011   01001   10001   0000 0000 0000 0100   (0xAD310004)

J-Type instructions
- Jump-type
- 26-bit address operand (addr)
- Used for jump instructions (j)

J-Type format:
  op       addr
  6 bits   26 bits

Review: Instruction Formats
  R-Type:   op (6 bits)   rs (5 bits)   rt (5 bits)   rd (5 bits)   shamt (5 bits)   funct (6 bits)
  I-Type:   op (6 bits)   rs (5 bits)   rt (5 bits)   imm (16 bits)
  J-Type:   op (6 bits)   addr (26 bits)

Microarchitecture
- Microarchitecture: how to implement an architecture in hardware
  - This is sometimes just called implementation
- Processor:
  - Datapath: functional blocks
  - Control: control signals

[Figure: the levels-of-abstraction diagram from the Architecture (ISA) slide, repeated]

Parts of CPUs
- Datapath
  - The registers and logic to perform operations on them
- Control unit
  - Generates signals to control datapath

Memory and I/O
- Memories are connected to the data/control in and out lines
  - Example: register to memory ops
- Will discuss I/O arrangements later

Basic Datapath
- Basic components of the CPU datapath
  - PC, Instruction Memory, Register File, ALU, Data Memory

[Figure: datapath building blocks. PC register (input PC', output PC, clocked); Instruction Memory (ports A, RD); Register File (ports A1, A2, A3, WD3, WE3, RD1, RD2, clocked); Data Memory (ports A, WD, WE, RD, clocked). Address ports of the register file are 5 bits wide; data buses are 32 bits wide.]

First: A "lightweight" ALU
Arithmetic Logic Unit = ALU

Lightweight ALU
- A lightweight ALU from textbook:
  - 3-bit function select (7 functions)

[Figure: ALU symbol with N-bit inputs A and B, 3-bit function select F, and N-bit output Y]

  F2:0   Function
  000    A & B
  001    A | B
  010    A + B
  011    not used
  100    A & ~B
  101    A | ~B
  110    A - B
  111    SLT

Lightweight ALU: Internals
- (light-weight version)

[Figure: ALU internals. A 2:1 mux controlled by F2; an N-bit adder (+) with carry out Cout and sum S; a Zero Extend block fed by bit S[N-1]; a 4:1 mux controlled by F1:0 that drives output Y. The function table is the same as on the Lightweight ALU slide.]

Set Less Than (SLT) Example
- Configure a 32-bit ALU for the set if less than (SLT) operation.
- Suppose A = 25 and B = 32.
- A is less than B, so we expect Y to be the 32-bit representation of 1 (0x00000001).
- For SLT, F2:0 = 111.
- F2 = 1 configures the adder unit as a subtracter. So 25 - 32 = -7.
- The two's complement representation of -7 has a 1 in the most significant bit, so S31 = 1.
- With F1:0 = 11, the final multiplexer selects Y = S31 (zero extended) = 0x00000001.

[Figure: the ALU internals diagram again, with the 1-bit MSB path S[N-1] into Zero Extend highlighted]

Next: A "full-feature" ALU

Arithmetic Logic Unit (ALU)
- Full-feature ALU from COMP411:

[Figure: full-feature ALU. Inputs A and B; 5-bit ALUFN control made up of Sub, Bool, Shft, and Math; an Add/Sub unit, a Bidirectional Barrel Shifter, and a Boolean unit feeding output multiplexers; result R; flags N, V, C, and Z.]

  Sub   Bool   Shft   Math   OP
  0     XX     0      1      A + B
  1     XX     0      1      A - B
  X     X0     1      1      0
  X     X1     1      1      1
  X     00     1      0      B << A
  X     10     1      0      B >> A
  X     11     1      0      B >>> A
  X     00     0      0      A & B
  X     01     0      0      A | B
  X     10     0      0      A ^ B
  X     11     0      0      ~(A | B)

Shifting Logic
- Shifting is a common operation
  - applied to groups of bits
  - used for alignment
  - used for "short cut" arithmetic operations
    - X << 1 is often the same as 2*X
    - X >> 1 can be the same as X/2
- For example:
  - X = 20_10 = 00010100_2
  - Left Shift: (X << 1) = 00101000_2 = 40_10
  - Right Shift: (X >> 1) = 00001010_2 = 10_10
  - Signed or "Arithmetic" Right Shift: (-X >>> 1) = (11101100_2 >>> 1) = 11110110_2 = -10_10

[Figure: shift-left-by-1 circuit. Inputs X7..X0 plus a constant "0"; one 2:1 mux per output bit R7..R0; all muxes share the select signal SHL1.]

Shifting Logic
- How do you shift by more than 1 position?
  - feed other bits into the multiplexer
  - e.g., left-shift-by-2
    - multiplexer for Rk receives input from Xk-2
- How do you allow the shift amount to be specified dynamically?
  - need a bigger multiplexer
  - shift amount is applied as the select input
  - will design in class and lab

Boolean Operations
- It will also be useful to perform logical operations on groups of bits. Which ones?
- ANDing is useful for "masking" off groups of bits.
  - ex. 10101110 & 00001111 = 00001110 (mask selects last 4 bits)
- ANDing is also useful for "clearing" groups of bits.
  - ex. 10101110 & 00001111 = 00001110 (0's clear first 4 bits)
- ORing is useful for "setting" groups of bits.
  - ex. 10101110 | 00001111 = 10101111 (1's set last 4 bits)
- XORing is useful for "complementing" groups of bits.
  - ex. 10101110 ^ 00001111 = 10100001 (1's invert last 4 bits)
- NORing is useful for.. uhm…
  - ex. 10101110 # 00001111 = 01010000 (0's invert, 1's clear)

Boolean Unit
- It is simple to build up a Boolean unit using primitive gates and a mux to select the function.
- Since there is no interconnection between bits, this unit can be simply replicated at each position.
- The cost is about 7 gates per bit: one for each primitive function, and approx 3 for the 4-input mux.

[Figure: one bit slice of the Boolean unit. Inputs Ai and Bi feed the primitive gates; a 4-input mux with select Bool (00, 01, 10, 11) produces Qi. This logic block is repeated for each bit (i.e. 32 times).]

An ALU at last!
- Full-feature ALU from COMP411:

[Figure: full-feature ALU. Inputs A and B; 5-bit ALUFN control made up of Sub, Bool, Shft, and Math; an Add/Sub unit, a Bidirectional Barrel Shifter, and a Boolean unit feeding output multiplexers; result R; flags N, V, C, and Z.]

  Sub   Bool   Shft   Math   OP
  0     XX     0      1      A + B
  1     XX     0      1      A - B
  X     X0     1      1      0
  X     X1     1      1      1
  X     00     1      0      B << A
  X     10     1      0      B >> A
  X     11     1      0      B >>> A
  X     00     0      0      A & B
  X     01     0      0      A | B
  X     10     0      0      A ^ B
  X     11     0      0      ~(A | B)

Which one do we implement?
- We will use the full-feature one!
  - slightly more challenging …
    - I will help you!
  - … but a lot more fun to use
  - supports a much more useful set of instructions for your final programming project

Processor Architecture
Rather, "microarchitecture" or implementation

Microarchitectures
- Multiple implementations for a single architecture:
  - Single-cycle
    - Each instruction executes in a single cycle
  - Multicycle
    - Each instruction is broken up into a series of shorter steps
  - Pipelined
    - Each instruction is broken up into a series of steps
    - Multiple instructions execute at once.
- Directly impacts performance obtained

Processor Performance
- Program execution time
  - Execution Time = (# instructions) x (cycles/instruction) x (seconds/cycle)
- Definitions:
  - Cycles/instruction = CPI
  - Seconds/cycle = clock period
  - 1/CPI = Instructions/cycle = IPC
- Challenge is to satisfy constraints of:
  - Cost
  - Power
  - Performance

MIPS Processor
- We will consider a subset of MIPS instructions (in book & lab):
  - R-type instructions: and, or, add, sub, slt, …
  - Memory instructions: lw, sw, …
  - Branch instructions: beq, …
  - Some immediate instructions too: addi, …
  - Jumps as well: j, …

Next
- Next class:
  - We'll look at single cycle MIPS
  - Then the more complex versions
- Lab Friday (March 30)
  - Demo your graphics displays (Lab 8)
  - Start on Lab 9 (will post on website by Fri)
    - start building the datapath!
      - ALU
      - Registers
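
Since Lab 9 starts with the ALU, a small software reference model may help when checking a hardware design by hand. The sketch below models the lightweight ALU's function table (F2:0) from earlier in these slides, not the full-feature one. It is written in Python purely for illustration. The function name `lightweight_alu` and the parameter `n` are hypothetical. Using F2 as the adder's carry-in (so that F2 = 1 computes A + ~B + 1 = A - B) is an assumption consistent with the slide's statement that F2 = 1 configures the adder as a subtracter.

```python
def lightweight_alu(a: int, b: int, f: int, n: int = 32) -> int:
    """Reference model of the N-bit lightweight ALU; f is the 3-bit select F2:0."""
    if f == 0b011:
        raise ValueError("F2:0 = 011 is not used")
    mask = (1 << n) - 1          # keep every value to N bits
    a &= mask
    b &= mask
    f2 = (f >> 2) & 1            # F2: pass B (0) or ~B (1)
    f10 = f & 0b11               # F1:0: select of the final 4:1 mux

    bb = (~b & mask) if f2 else b
    # Assumption: F2 also drives the carry-in, so F2 = 1 yields A - B.
    s = (a + bb + f2) & mask

    if f10 == 0b00:
        return a & bb            # A & B   or  A & ~B
    if f10 == 0b01:
        return a | bb            # A | B   or  A | ~B
    if f10 == 0b10:
        return s                 # A + B   or  A - B
    return (s >> (n - 1)) & 1    # SLT: zero-extended MSB of S


# The SLT example from the slides: A = 25, B = 32, F2:0 = 111.
print(hex(lightweight_alu(25, 32, 0b111)))  # 0x1
# The intermediate subtraction 25 - 32 = -7 in 32-bit two's complement.
print(hex(lightweight_alu(25, 32, 0b110)))  # 0xfffffff9
```

Like the circuit on the slide, this model takes SLT directly from the sign bit of A - B, so it does not account for signed overflow in the subtraction.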