Department of Computer and IT Engineering, University of Kurdistan
Computer Networks II: Router Architecture
By: Dr. Alireza Abdollahpouri

What is Routing and Forwarding?
[Figure: example topology with routers R1-R5 connecting hosts A-F]

Introduction
- History ...
- And future trends!

What a Router Looks Like
- Cisco GSR 12416: capacity 160 Gb/s, power 4.2 kW, 19 in wide, 6 ft tall, 2 ft deep
- Juniper M160: capacity 80 Gb/s, power 2.6 kW, 19 in wide, 3 ft tall, 2.5 ft deep

Packet Processing Functions
Basic network system functionality:
- Address lookup
- Packet forwarding and routing
- Fragmentation and reassembly
- Security
- Queuing
- Scheduling
- Packet classification
- Traffic measurement
- ...

Per-packet Processing in a Router
1. Accept the packet arriving on an ingress line.
2. Look up the packet's destination address in the forwarding table to identify the outgoing interface(s).
3. Manipulate the packet header: e.g., decrement the TTL and update the header checksum.
4. Send the packet to the outgoing interface(s).
5. Queue the packet until the line is free.
6. Transmit the packet onto the outgoing line.

Basic Architecture of a Router
- Control plane ("typically in software", may be slow): how routing protocols establish routes; routing-table updates (OSPF, RIP, IS-IS), admission control, congestion control, reservation.
- Data plane (per-packet processing; "typically in hardware", must be fast): how packets get forwarded; lookup, packet classification, switching, arbitration, scheduling.

Generic Router Architecture
[Figure: per-port header processing (IP address lookup against the address table, header update) feeding a buffer manager and shared buffer memory]

Functions in a Packet Switch
[Figure: ingress line card (framing, route lookup, TTL processing, buffering), interconnect (scheduling), egress line card (buffering, QoS scheduling, framing); control plane and control path alongside the data path]
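The per-packet steps above (lookup, TTL decrement, checksum update) can be sketched in software. This is a minimal illustration, not any particular router's code: the forwarding table and interface names are made up, the lookup is a naive longest-prefix match, and the checksum update follows the incremental method of RFC 1624.

```python
import ipaddress

# Hypothetical forwarding table (FIB); routes and interface names are
# invented for the example.
FIB = {
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"): "eth0",  # default route
}

def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    best = max((n for n in FIB if addr in n), key=lambda n: n.prefixlen)
    return FIB[best]

def decrement_ttl(ttl: int, checksum: int) -> tuple:
    """Decrement the TTL and incrementally update the IPv4 header
    checksum per RFC 1624: HC' = ~(~HC + ~m + m')."""
    m, m_new = ttl << 8, (ttl - 1) << 8  # TTL is the high byte of its 16-bit word
    s = (~checksum & 0xFFFF) + (~m & 0xFFFF) + m_new
    while s >> 16:                        # fold carries (ones' complement)
        s = (s & 0xFFFF) + (s >> 16)
    return ttl - 1, ~s & 0xFFFF

print(lookup("10.1.2.3"))   # eth2 (the /16 beats the /8 and the default)
print(lookup("192.0.2.1"))  # eth0 (only the default route matches)
```

A production router would use a trie or TCAM for the lookup; the linear scan here only shows the matching rule.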
The data path makes multiple uses of memory: typically DRAM for the packet buffer and SRAM for queues and tables.

Line Card
[Figure: photograph of a line card]

Major Components of Routers: Interconnect
The interconnect carries packets from input ports to output ports. Three basic designs:
- Bus: all input ports transfer data over a shared bus. Problem: the bus often becomes a congestion point.
- Shared memory: input ports write packets into a shared memory; after the destination lookup, the output port reads them out. Problem: requires very fast memory read/write and management technology.
- Crossbar: each of the N input ports has a dedicated data path to each of the N output ports, forming an N x N switching matrix. Problems: blocking (input, output, and head-of-line, HOL); the maximum switch load for random traffic is about 59%.

Interconnects: Two Basic Techniques
- Input queueing
- Output queueing
Both usually assume a non-blocking switch fabric (e.g., a crossbar).

How an OQ Switch Works
[Figure: operation of an output-queued (OQ) switch]

Input Queueing: Head-of-Line Blocking
[Figure: delay vs. load; FIFO input queueing saturates at 58.6% load, while output queueing reaches 100%]

Head-of-Line Blocking
[Figures: a packet at the head of an input queue blocks the packets behind it even when their output ports are idle]

Virtual Output Queues (VoQ)
- At each input port there are N queues, one per output port.
- Only one packet can leave an input port at a time.
- Only one packet can be received by an output port at a time.
- VoQ retains the scalability of FIFO input-queued switches.
- VoQ eliminates the HOL problem of FIFO input queues.

Input Queueing: Virtual Output Queues
[Figure: delay vs. load; with VoQs, input queueing can reach 100% load]

The Evolution of Router Architecture
From first-generation routers to modern routers.

First Generation Routers
[Figure: bus-based architecture with a single processor: a CPU with route table and buffer memory on a shared backplane, connected to line interfaces (MACs)]
- Based on software implementations on a single CPU.
- Limitations:
  - Serious processing bottleneck in the central processor.
  - Memory-intensive operations (e.g., table lookup and data movements) limit the effectiveness of processor power.
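The 58.6% saturation figure quoted for FIFO input queueing (the classical 2 - sqrt(2) limit for large switches under uniform random traffic) can be reproduced with a small Monte Carlo sketch. The port count, slot count, and seed below are arbitrary choices for illustration.

```python
import random

def hol_throughput(n_ports: int, slots: int, seed: int = 1) -> float:
    """Saturated FIFO input-queued crossbar under uniform random
    traffic: each input always has a head-of-line (HOL) packet, and
    packets behind a blocked head cannot move (HOL blocking)."""
    rng = random.Random(seed)
    # Destination of the current HOL packet at each input port.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for port, dst in enumerate(hol):
            contenders.setdefault(dst, []).append(port)
        for dst, ports in contenders.items():
            winner = rng.choice(ports)            # output grants one input
            hol[winner] = rng.randrange(n_ports)  # next packet moves to head
            delivered += 1
    return delivered / (n_ports * slots)

print(round(hol_throughput(32, 5000), 2))  # close to 2 - sqrt(2) ≈ 0.586
```

Replacing the single FIFO per input with one queue per output (the VoQ scheme above) removes this limit, which is why VoQ switches can approach 100% load.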
Second Generation Routers
[Figure: bus-based architecture with multiple processors: a CPU with route table and buffer memory, plus line cards with their own buffer memory and forwarding caches (MACs)]
- Architectures with route caching distribute packet forwarding operations across:
  - network interface cards,
  - processors,
  - route caches.
- Packets are transmitted only once over the shared bus.
- Limitations:
  - The central routing table is a bottleneck at high speeds.
  - Throughput is traffic-dependent (cache behavior).
  - The shared bus is still a bottleneck.

Third Generation Routers
[Figure: switch-based architecture with fully distributed processors: line cards with local buffer memory and forwarding tables, plus a CPU card holding the routing table, connected by a switched backplane]
- To avoid bottlenecks in processing power, memory bandwidth, and internal bus bandwidth, each network interface is equipped with appropriate processing power and buffer space.
- Data vs. control plane:
  - Data plane: line cards.
  - Control plane: processor.

Fourth Generation Routers/Switches
- Optics inside a router for the first time: optical links of hundreds of metres connect the line cards to the switch core.
- 0.3-10 Tb/s routers in development.

Demand for More Powerful Routers
Do we still need higher processing power in networking devices? Of course, yes. But why? And how?

Demands for Faster Routers (why?)
[Figure: growth beyond Moore's law; link bandwidth doubles every year, CPU performance doubles every two years, and memory latency improves only about 10% per year. Packet inter-arrival time at 40 Gb/s: about 300 ns for a big packet, 12 ns for a small packet. Processing complexity has grown from hundreds of instructions per packet (layer-2 switching, IPv4 routing) to thousands (flow classification, intrusion detection, encryption)]
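The inter-arrival numbers above follow directly from serialization time = packet bits / line rate. The 1500-byte and 60-byte sizes are the usual big/small packet assumptions that reproduce the 300 ns and 12 ns figures; the slide does not state the sizes explicitly.

```python
def interarrival_ns(packet_bytes: int, rate_bps: float) -> float:
    """Back-to-back serialization (inter-arrival) time in nanoseconds."""
    return round(packet_bytes * 8 / rate_bps * 1e9, 3)

print(interarrival_ns(1500, 40e9))  # 300.0 ns (big packet)
print(interarrival_ns(60, 40e9))    # 12.0 ns (small packet)
```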
Demands for Faster Routers (why?)
- Future applications will demand TIPS (tera-instructions per second).
- But what about power and heat?

Demands for Faster Routers (summary)
- Technology push:
  - Link bandwidth is scaling much faster than CPU and memory technology.
  - Transistor scaling and VLSI technology help, but not enough.
- Application pull:
  - More complex applications are required.
  - Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet.

Demands for Faster Routers (how?)
- "Future applications will demand TIPS"
- "Think platform beyond a single processor"
- "Exploit concurrency at multiple levels"
- "Power will be the limiter due to complexity and leakage"
In short: distribute the workload over multiple cores.

Multi-Core Processors
- Symmetric multiprocessors allow multi-threaded applications to achieve higher performance at less die area and power consumption than single-core processors.
- Asymmetric multiprocessors consume power and provide increased computational power only on demand.

Performance Bottlenecks
- Memory: bandwidth is available, but access time is too slow, and the delay to off-chip memory keeps increasing.
- I/O: high-speed interfaces are available; optical interfaces have a cost problem.
- Internal bus: can be solved with an effective switch that allows simultaneous transfers between network interfaces.
- Processing power: individual cores are getting more complex; access to shared resources is a problem; the control processor can become a bottleneck.

Different Solutions
[Figure: flexibility vs. performance trade-off: GPP (most flexible), then NP, FPGA, and ASIC (highest performance)]
- ASIC
- FPGA
- NP (network processor)
- GPP (general-purpose processor)

Different Solutions
[Figure: comparison of solutions, by Niraj Shah]

"It is always something (corollary). Good, Fast, Cheap: Pick any two (you can't have all three)."
- RFC 1925, "The Twelve Networking Truths"
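The gap above between link rates (doubling yearly) and CPU speeds (doubling roughly every two years) can be made concrete with a back-of-the-envelope cycle budget. The 2 GHz clock below is an illustrative assumption, not a figure from the slides.

```python
def cycles_per_packet(interarrival_ns: float, clock_hz: float) -> float:
    """CPU cycles available between back-to-back packet arrivals."""
    return interarrival_ns * clock_hz / 1e9

# At 40 Gb/s, minimum-size packets arrive roughly every 12 ns.
print(cycles_per_packet(12, 2e9))   # 24.0 cycles per small packet
print(cycles_per_packet(300, 2e9))  # 600.0 cycles per 1500-byte packet
```

A budget of a few tens of cycles per packet is far below the thousands of instructions that flow classification, intrusion detection, or encryption require, which is why single-processor designs run out of headroom.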
Why not ASIC?
- High cost to develop, while network processing is a moderate-quantity market.
- Long time to market, while network processing services change quickly.
- Difficult to simulate (complex protocols).
- Expensive and time-consuming to change.
- Little reuse across products and limited reuse across versions.
- No consensus on a framework or supporting chips.
- Requires special expertise.

Network Processors
- Introduced several years ago (1999+) as a way to bring flexibility and programmability to network processing.
- Many players entered the market (Intel, Motorola, IBM); only a few are still there.

Intel IXP 2800
- Initial release: August 2003.

What Was Correct With NPs?
- CPU-level flexibility: a giant step forward compared to ASICs.
- How?
  - Hardware coprocessors.
  - Memory hierarchies.
  - Multiple hardware threads (zero context-switching overhead).
  - Narrow (and multiple) memory buses.
  - Other ad-hoc solutions for network processing, e.g., a fast switching fabric, specialized memory accesses, etc.

What Was Wrong With NPs?
- Programmability issues:
  - A completely new programming paradigm.
  - Developers were not familiar with the unprecedented parallelism of the NPU and did not know how to exploit it best.
  - New (proprietary) languages.
  - No portability across different network-processor families.

What Happened in the NP Market?
- Intel left the market in 2007.
- Many other small players disappeared.
- There is high risk in selecting an NP maker that may disappear.

"Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works."
- RFC 1925, "The Twelve Networking Truths"

Software Routers
- Packet processing on general-purpose CPUs.
- CPUs are optimized for few threads with high per-thread performance:
  - High clock frequencies.
  - Maximized instruction-level parallelism: pipelining, superscalar execution, out-of-order execution, branch prediction, speculative loads.
- Aim: low cost, flexibility, and extensibility.
- Linux on a PC with a bunch of NICs; changing a functionality is as simple as a software upgrade.

Software Routers (examples)
- RouteBricks [SOSP'09]: uses the Intel Nehalem architecture.
- PacketShader [SIGCOMM'10]: GPU-accelerated, developed at KAIST, Korea.

Intel Nehalem Architecture
[Figure: quad-core die with cores C0-C3 sharing a common L3 cache]
- NUMA architecture: the latency to access local memory is approximately 65 ns; the latency to access remote memory is approximately 105 ns.
- Three DDR3 channels to local DRAM support a bandwidth of 31.992 GB/s.
- The bandwidth of a QPI link is 12.8 GB/s.
[Figure: Nehalem quad-core block diagram: four cores with private L1/L2 caches, a shared L3 cache, an integrated memory controller with three DRAM channels, two QPI links, and an I/O controller hub with PCI slots, network cards, and disk]

Other Possible Platforms
- Intel Westmere-EP
- Intel Jasper Forest

Workload Partitioning (parallelization)
- Parallel
- Pipeline
- Hybrid

Questions!
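The Nehalem bandwidth figures above are consistent with simple rate x width arithmetic. The DDR3-1333 parameters (1333 MT/s on a 64-bit, i.e. 8-byte, bus) and the QPI parameters (6.4 GT/s, 2 bytes per transfer per direction) are inferred assumptions that reproduce the quoted numbers, not values stated on the slides.

```python
def peak_bandwidth_gbs(mega_transfers: float, bytes_per_transfer: int,
                       channels: int = 1) -> float:
    """Peak bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9

# Three DDR3-1333 channels, 8-byte bus each:
print(peak_bandwidth_gbs(1333, 8, 3))  # 31.992 GB/s, matching the slide
# One QPI link at 6.4 GT/s, 2 bytes per transfer per direction:
print(peak_bandwidth_gbs(6400, 2))     # 12.8 GB/s, matching the slide
```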