Download 1-ca-intro

Computer Architecture (“MAMAS”, 234267) Spring 2015 Lecturer: Yoav Etsion Reception: Mon 15:00, Fishbach 306-8 TAs: Tomer Gurevich, Franck Sala, Andrey Zhitnikov Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, Adi Yoaz and Dan Tsafrir 1 Computer Architecture 2015 – Introduction General Info  Grade    25% Exercise (mandatory) ‫תקף‬ 75% Final exam Textbook  “Computer Architecture: A Quantitative Approach” (4th Edition) by: Patterson & Hennessy  Other course information   2 Course web site: http://webcourse.cs.technion.ac.il/234267/Spring2013 Lectures will be upload to the web a day before the class Computer Architecture 2015 – Introduction Assignments  Five mandatory assignments during the semester   NO CHEATING!     Copying any part of the assignments from another student or using a reference from a previous year Letting someone else copy from you Posting code/solutions to a shared forum  We can track the code back to the posters!  3 Suspected parties will be sent to a Technion trial Typical outcome: course disqualified, which means your graduation will be postponed by at least one semester Possible examples of cheating (a few of many):   Only programming assignments Be honest and work by yourselves! Computer Architecture 2015 – Introduction The Computer System 4 Computer Architecture 2015 – Introduction Classical System Diagram Cache More to the “north” = closer to the CPU = faster CPU CPU BUS North Bridge External Graphics Card DDR2 or DDR3 Channel 1 PCI express 2.0 IOMMU On-board Memory Graphics controller Serial Port Parallel Port IO Controller 5 DDR2 or DDR3 Channel 2 PCI express ×1 South Bridge Floppy Drive Mem BUS keybrd USB controller mouse SATA controller DVD Drive Hard Disk PCI Sound Card speakers Lan Adap LAN Computer Architecture 2015 – Introduction Intel Nehalem Core i3 i5 i7 For high-end i-Series chips, Northbridge functionality moved onto processor (=> made faster) 45 to 32 nm 6 Computer Architecture 2015 – Introduction Intel Sandy Bridge Core i3 i5 i7 32 to 22 nm 7 The trend continues Computer Architecture 2015 – Introduction Computer and System Architecture   The physical design of the computer system is commonly known as Computer Architecture The definition of computer architecture: Prof. Christos Kozyrakis, Stanford Computer theorists invent algorithms that solve important problems and analyze their asymptotic behavior (e.g. O(NlogN) or O(N3)). Computer architects set the constant factors for these algorithms… 8 Computer Architecture 2015 – Introduction Computer and System Architecture   Axiom: The ultimate goal is performance (ops/sec) Constraints:    Practical goal: Highest performance in light of constraints Today, the most common goal is Performance/Watt  9 Power, Number of transistors, Bandwidth, … e.g. ops/Joule Computer Architecture 2015 – Introduction Course Focus  Start from CPU (=processor)      Move on to Memory Hierarchy    Caching and Main memory Virtual Memory Finish with latency tolerance and parallelism  10 Instruction set, performance Pipeline, hazards Branch prediction Out-of-order execution Hardware multithreading Computer Architecture 2015 – Introduction The Processor 11 Computer Architecture 2015 – Introduction Architecture vs. Microarchitecture  Architecture: = The processor features as seen by its user = Interface   Microarchitecture: = Manner by which the processor is implemented = Implementation details   Caches size and structure, number of execution units, … Note: different processors with different u-archs can support the same arch   12 Instruction set, number of registers, addressing modes,… Intel Pentium-IV vs. Intel Core2 Duo ARMv9 implemented by Qualcomm, TI, Samsung Computer Architecture 2015 – Introduction Why Should We Care?  Abstractions enhance productivity, so:    Same goes for arch   Just details for a programmer of a high-level language Abstractions only work so long as what’s below works  13 If we know the arch (=interface), Why should we care about the u-arch (=internals)? The taxi story: http://vimeo.com/11478146 (4:50-6:00) Computer Architecture 2015 – Introduction Recent Processor Trends Source: http://www.scidacreview.org/0904/html/multicore.html 14 Computer Architecture 2015 – Introduction Well-Known Moore’s Law Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm 15 Computer Architecture 2015 – Introduction 16 Computer Architecture 2015 – Introduction The Story in a Nutshell Transistors (1000s) clock speed (MHz) power (W) Instructions/cycle (ILP) 17 Computer Architecture 2015 – Introduction Took the Industry by Surprise 18 Computer Architecture 2015 – Introduction Dire Implications: Performance 19 Computer Architecture 2015 – Introduction Dire Implications: Programmers 22 Computer Architecture 2015 – Introduction Supercomputing: “Top 500 list” 23 Computer Architecture 2015 – Introduction Dire Implications: Supercomputing 24 Computer Architecture 2015 – Introduction Processor Performance 25 Computer Architecture 2015 – Introduction Metrics: IC, CPI, IPC  CPUs work according to a clock signal    Instruction Count (IC)   Clock cycle: measured in nanoseconds (10-9 of a second) Clock frequency = 1/|clock cycle|: in GHz (109 cycles/sec) Total number of instructions executed in the program Cycles Per Instruction (CPI)  Average #cycles per Instruction (in a given program) CPI =  26 #cycles required to execute the program IC IPC (= 1/CPI) : Instructions per cycles. Can be > 1; see the “story in a nutshell slide” Computer Architecture 2015 – Introduction Minimizing Execution Time  CPU Time - time required to execute a program CPU Time = IC  CPI  clock cycle  Our goal: minimize CPU Time (any of above components)  Minimize clock cycle: increase GHz (processor design)  Minimize CPI: u-arch (e.g.: more execution units)  Minimize IC: arch + u-arch (e.g.: SSETM) SSE = streaming SIMD extension (Intel) 27 Computer Architecture 2015 – Introduction Alternative Way to Calculate CPI   ICi = #times instruction of type-i is executed in program n IC = #instruction executed in program = IC   ICi i 1   Fi = relative frequency of type-i instruction = ICi/IC CPIi = #cycles to execute type-i instruction   e.g.: CPIadd = 1, CPImul = 3 n #cycles required to execute the program: # cyc   CPI i  ICi i 1  CPI: n # cyc CPI   IC 28  CPI  IC i 1 i IC i n ICi n   CPIi    CPIi  Fi IC i 1 i 1 Computer Architecture 2015 – Introduction Performance Evaluation: How?  No simple answer  Performance depends on    Mathematical analysis    Typically impossible Systems is too complex to model accurately So how do we evaluate systems?  29 Application Input Empirical analysis Computer Architecture 2015 – Introduction Benchmarks  Use benchmarks & measure how long it takes   Use real applications (=> no absolute answers) Preferably standardized benchmarks (+input), e.g.,  SPEC INT: integer apps • Compression, C complier, Perl, text-processing, …     Sometimes you see FLOPS (“peak” or “sustained”)  30 SPEC FP: floating point apps (mostly scientific) TPC benchmarks: measure transaction throughput (DB) SPEC JBB: models wholesale company (Java server, DB) Supercomputers (top500 list), against LINPACK Computer Architecture 2015 – Introduction Evaluating Performance  Use a performance simulator to evaluate the performance of a new feature / algorithm    Models the uarch to a great detail Run 100’s of representative applications Produce the performance s-curve   Sort the applications according to the IPC increase Baseline (0%) is the processor without the new feature 3% Bad S-curve 2% 6% Positive outliers Good S-curve Positive outliers 4% 1% 0% 2% -1% -2% Negative outliers -3% 0% Small negative outliers -2% -4% 31 Computer Architecture 2015 – Introduction Amdahl’s Law   Suppose we accelerate the computation such that  P = portion of computation we make faster  S = speedup experienced by the portion we improved For example  If an improvement can speedup 40% of the computation => P = 0.4   32 If the improvement makes the portion run twice as fast => S = 2 Then overall speedup = 1 (1  P)  P S Computer Architecture 2015 – Introduction Amdahl’s Law - Example  FP operations improved to run 2x faster  S = 2, but…  P = only affects 10% of the program  Speedup: 1 1    1.053 0.1 0.95 (1  P)  P (1  0.1)  S 2  Conclusion  33 1 Better to make common case fast… Computer Architecture 2015 – Introduction Amdahl’s Law – Parallelism  When parallelizing a program  P = proportion of program that can be made parallel  1 - P = inherently serial   N = number of processing elements (say, cores) Speedup: 1 (1  P)  P  N Serial component imposes a hard limit   1 1   lim  N   (1  P) (1  P)  P  N  34 Computer Architecture 2015 – Introduction Instruction Set Design software The ISA is what the user & compiler see instruction set hardware 35 The HW implements the ISA Computer Architecture 2015 – Introduction Considerations in ISA Design  Instruction size  Long instructions take more time to fetch from memory  Longer instructions require a larger memory • Important for small (embedded) devices  Number of instructions (IC)   36 Reduce IC => reduce runtime (at a given CPI & frequency) Virtues of instructions simplicity  Simpler HW allows for: higher frequency & lower power  Optimization can be applied better to simpler code  Cheaper HW Computer Architecture 2015 – Introduction Basing Design Decisions on Workload Immediate argument’s size in bits (histogram) 30% Int. Avg. FP Avg. 20% 10% 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 0% Immediate data bits   37 1% of data values > 16-bits Having 16 bits is likely good enough Computer Architecture 2015 – Introduction CISC Processors  CISC - Complex Instruction Set Computer   Example: x86 The idea: a high level machine language • When people programmed in assembly, CISC supposedly easier  Characteristic   Many instruction types, with a many addressing modes Some of the instructions are complex • Execute complex tasks • Require many cycles  ALU operations directly on memory (e.g., arr[j] = arr[i]+n) • Registers not used (and, accordingly, only a few registers exist)  Variable length instructions • common instructions get short codes  save code length 38 Computer Architecture 2015 – Introduction But it Turns Out… Rank instruction % of total executed 1 load 22% 2 conditional branch 20% 3 compare 16% 4 store 12% 5 add 8% 6 and 6% 7 sub 5% 8 move register-register 4% 9 call 1% 10 return 1% Total 96% Simple instructions dominate instruction frequency 39 Computer Architecture 2015 – Introduction CISC Drawbacks  Complex instructions and complex addressing modes  complicates the processor  slows down the simple, common instructions  contradicts Make The Common Case Fast  Compilers don’t use complex instructions / indexing methods  Variable length instructions are real pain in the neck   Difficult to decode few instructions in parallel • As long as instruction is not decoded, its length is unknown  It is unknown where the instruction ends  It is unknown where the next instruction starts An instruction may be longer than a cache line • Or even longer longer than a page (in theory) 40 Computer Architecture 2015 – Introduction RISC Processors  RISC - Reduced Instruction Set Computer   The idea: simple instructions enable fast hardware Characteristic    A small instruction set, with only a few instructions formats A few indexing methods Load/Store machine: operate only on registers • Memory is accessed using Load and Store instructions only • Many orthogonal registers • Three address machine: Add dst, src1, src2   41 Fixed length instructions Examples: ARMTM MIPSTM, SparcTM, AlphaTM, PowerTM Computer Architecture 2015 – Introduction RISC Processors (Cont.)  Simple arch => simple u-arch      Compiler can be smarter    Better pipeline usage Better register allocation Existing RISC processor are not “pure” RISC  42 Smaller => faster Easier to design & validate (=> cheaper to manufacture) Shorten time-to-market More general-purpose registers (=> less memory refs) Various complex operations added along the way Computer Architecture 2015 – Introduction Compilers and ISA  Ease of compilation  Orthogonality: • no special registers • few special cases • all operand modes available with any data type or instruction type  Regularity: • no overloading for the meanings of instruction fields  streamlined • resource needs easily determined  Register assignment is critical too  43 Easier if lots of registers Computer Architecture 2015 – Introduction CISC or RISC?  ARM (RISC) dominates the personal market   x86 (CISC) dominates the PC and server market    Programs are more compact (fewer instructions) Dynamic optimizations Pro RISC:  44 CISC instructions are broken down to RISC micro-ops Pro CISC:   Not necessarily because it is CISC… Intel processors have a RISC arch   Not necessarily because it is RISC… Simpler and faster uarch Computer Architecture 2015 – Introduction ISA Extensions  Extend arch to accelerate exec of specific apps  Example: SSETM – Streaming SIMD Extensions     128-bit packed (vector) / scalar single precision FP (4×32) Introduced on Pentium® III on ’99 8 new 128 bit registers (XMM0 – XMM7) Accelerates graphics, video, scientific calculations, … 128-bits x3 x2 x1 x0 y0 + 45 y3 y2 y1 x3+y3 x2+y2 x1+y1 x0+y0 Computer Architecture 2015 – Introduction BACKUP 46 Computer Architecture 2015 – Introduction Compatibility  Backward compatibility (HW responsibility)  When buying new hardware, it can run existing software: • i5 can run SW written for Core2 Duo, Pentium4, PentiumM, Pentium III, Pentium II, Pentium, 486, 386, 268 BTW:  Forward compatibility (SW responsibility)    Architecture-independent SW   47 For example: MS Word 2003 can open MS Word 2010 doc Commonly supports one or two generations behind Run SW on top of VM that does JIT (just in time compiler): JVM for Java and CLR for .NET Interpreted languages: Perl, Python Computer Architecture 2015 – Introduction

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download 1-ca-intro