Why are we scared of SPF?
IGP Scaling and Stability
Dave Katz

Overview
* History
* Components of IGP Convergence
* Conclusions

History
* 1990: Stability, Scalability, Speed, Correctness - Choose one
    - First few years spent just getting implementations to work
    - Naïve implementations had enough trouble accomplishing correctness without being complicated by reality
    - Prototype-quality software shipped; things tended to fall apart in really ugly ways when pushed hard

Copyright © 2002, Juniper Networks, Inc.

History
* 1994: Stability, Scalability, Speed, Correctness - Choose two
    - Convergence speed became marketing bullet, InterOp booth fodder
    - Cute trick for demos, but the world wasn’t clamoring for it
    - Fast convergence == network back up before someone can call the NOC
    - Efforts to speed convergence tended to cause instability

History
* 1995: Stability, Scalability, Speed, Correctness - Choose 2.5
    - Networks started getting larger; the era of large ISPs began
    - Stability and scalability were really important, lest you end up in the newspaper (“AOL down for 19 hours,” other less famous catastrophes)
    - Simplistic software/hardware architectures were inherently unstable
    - Big guard rails used to stay away from the instability cliff
    - Speed was sacrificed (chunky timers)

The Modern Era
* Pressure is mounting to get fast again
    - Real applications exist that could make use of it (VoIP, etc.)
    - Not just a parlor trick any more
    - Perception of IP as being “too slow” used to promote other technologies
    - We know how to do better now

Components of IGP Convergence
* Detection
* LSA/LSP Generation
* Flooding/Propagation
* SPF Calculation
* Route Recursion
* Route Download

Detection
* Hardware detection is vastly preferable
    - Can be debounced, held down, etc., in or close to hardware to reduce churn
    - GE and 10GE use in POPs makes this difficult (since you need a way to detect a failed path to a neighbor, not just a failed interface)

Detection
* Software detection (Hellos) ultimately needed
    - Fast hellos have been destabilizing in the past due to scheduling latencies (relative to adjacency timeouts)
    - Fast hellos are now doable, and are even somewhat scalable (subsecond detection and hundreds of neighbors)
    - Intelligent scheduling and/or distributed processing
    - If Hello load exceeds 100% of capacity (CPU or protocol I/O bandwidth) things will still fail
    - Adjacency maintenance must be immune to heavy CPU load

LSA/LSP Generation
* When something changes, you have to tell the world
* Traditionally, generation delayed to collect multiple changes, then hold down to limit network traffic (on order of seconds)
* More intelligent strategy is to rapidly announce interesting changes, allow several successive changes to be announced quickly before holddown
* Newer LSPs will tend to overtake old ones during flooding on systems under load, if done intelligently

LSA/LSP Generation
* ISIS relatively malleable; some time constants specified but none are “truly normative”
* OSPF requires receivers to drop LSAs updated within five seconds (limiting senders is sufficient)
* Suggestion: drop receiver behavior completely, use adaptive strategy on transmit
* Old receivers will drop rapid updates, but retransmission will operate in similar timeframe (or add a knob)
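
The transmit-side strategy on the LSA/LSP Generation slides (announce the first few changes quickly, then fall back to a holddown) can be sketched roughly as follows. This is only an illustration: the class name, the burst size of 3, the 50 ms fast delay and the 5000 ms holddown are all assumptions made for the example, not values from the talk or from any implementation.

```python
class AdaptiveLspThrottle:
    """Decide when to generate an LSP for each local change (times in ms).

    Hypothetical sketch: the first `burst` changes after a quiet period are
    announced after a short delay; later ones wait for the holddown.
    """

    def __init__(self, burst=3, fast_delay_ms=50, holddown_ms=5000):
        # All three defaults are illustrative assumptions.
        self.burst = burst
        self.fast_delay_ms = fast_delay_ms
        self.holddown_ms = holddown_ms
        self.sent_in_burst = 0    # generations since the last quiet period
        self.last_gen_ms = None   # time of the latest (possibly pending) generation

    def schedule(self, change_ms):
        """Return the time at which the LSP carrying this change is generated."""
        last = self.last_gen_ms
        # A generation is already pending at or after this change: coalesce.
        if last is not None and change_ms <= last:
            return last
        # A quiet period at least as long as the holddown restores the fast burst.
        if last is None or change_ms - last >= self.holddown_ms:
            self.sent_in_burst = 0
        if self.sent_in_burst < self.burst:
            gen = change_ms + self.fast_delay_ms   # fast path
        else:
            gen = last + self.holddown_ms          # burst used up: hold down
        self.sent_in_burst += 1
        self.last_gen_ms = gen
        return gen


throttle = AdaptiveLspThrottle()
changes_ms = [0, 100, 200, 300, 400, 20000]
generation_times = [throttle.schedule(t) for t in changes_ms]
```

With these assumed settings the first three changes go out 50 ms after they occur, the fourth and fifth share one LSP generated when the holddown expires, and the change after the long quiet period is fast again.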

Flooding/Propagation
* Propagation of received LSA/LSPs delayed
    - Group LSAs into bigger LSUpd packets in OSPF
    - Throttling transmission bounds neighbor load (no flow control)
* Propagation delays directly affect convergence
    - The next guy can’t even think of calculating routes until the LSA/LSP arrives
    - Background noise (refreshes, flaps) adds to the problem

Flooding/Propagation
* Intelligent scheduling gives “interesting” linkstate data flooding priority
* Adaptive retransmission schemes can help when things get tough
* Proper scheduling puts noise “in the noise”

SPF Calculation
* Traditionally viewed with abject terror
    - Naïve implementations were slow
    - Run-to-completion scheduling led to lost hellos
    - Inefficient implementations caused even more overhead (reinstalling all routes in FIB)
    - Holddowns and scheduling delays added to work around stability problems
    - Delays slow convergence, create routing loops (2-3 times delay value)

SPF Calculation
* In a properly engineered system, SPF should not be destabilizing
    - Do adjacency maintenance in a preemptive fashion
    - Schedule SPF calculations as background (relative to LSA/LSP processing, flooding, etc.)
    - SPF should be able to run back-to-back all day long without threatening stability, and with only marginal impact on overall convergence
    - Incremental SPF helps even more, though gains are not significant compared to other things given current networks
    - Backoff algorithms arguably unnecessary (especially exponential backoff)
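
One way to picture "SPF as background work" is a Dijkstra run that gives up the CPU at regular points so that adjacency maintenance is never starved. The sketch below is an assumption-laden illustration, not how any particular router does it: cooperative slicing with a Python generator stands in for real preemption, and the names spf_sliced, run_spf_in_background, hello_queue and send_hello are invented for the example.

```python
import heapq
from collections import deque


def spf_sliced(graph, root, slice_size=2):
    """Dijkstra SPF as a generator; pauses after every `slice_size` settled nodes.

    `graph` maps node -> {neighbor: cost}. The final distance table is the
    generator's return value (carried by StopIteration.value).
    """
    dist = {root: 0}
    settled = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry, a shorter path was already settled
        settled.add(node)
        for nbr, cost in graph.get(node, {}).items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
        if len(settled) % slice_size == 0:
            yield  # pause so more urgent work can run
    return dist


def run_spf_in_background(graph, root, hello_queue, send_hello):
    """Drain pending hellos before every SPF slice, then return the distances."""
    spf = spf_sliced(graph, root)
    while True:
        while hello_queue:                  # adjacency maintenance goes first
            send_hello(hello_queue.popleft())
        try:
            next(spf)                       # one bounded slice of SPF work
        except StopIteration as done:
            return done.value


# Hypothetical four-node topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
sent_hellos = []
distances = run_spf_in_background(
    topology, "A", deque(["nbr1", "nbr2"]), sent_hellos.append
)
```

The design point is only that the longest stretch of SPF work between hello opportunities is bounded by the slice size, so the calculation can run back-to-back without putting adjacencies at risk.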

Route Recursion
* A change in IGP next hop may cause a next hop change in many thousands of BGP routes
* By far the richest target in improving convergence
* Traditionally done in software in order to produce a “flat” forwarding table
* Indirect lookup in hardware has minimal forwarding time cost (essentially free if forwarding engine has any free cycles) with huge win in convergence time

Route Download
* Output of route calculations typically must be downloaded to hardware
* Download overhead typically rises with the number of forwarding tables
* Can be very expensive unless recursion is done in hardware
* Some level of distribution (multiple engines) necessary for scaling; fixing recursion problem and careful engineering minimizes cost

Conclusions

Conclusions
* Stability and Scalability have been the primary concerns until recently; this effort was quite successful
* Some of the biggest barriers to overall network convergence have been outside of the IGP implementation per se; examine the behavior of the system as a whole (and the network as a whole)
* As these barriers fall it becomes more interesting to take more heroic measures to improve IGP performance
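
The point about barriers outside the IGP is easiest to see with route recursion. The sketch below compares the number of forwarding-table writes needed after one IGP next-hop change in a "flat" table versus a table with one level of indirection. It is a toy model under stated assumptions: plain dictionaries stand in for forwarding hardware, and the prefixes, the BGP next hop 192.0.2.1 and the interface names are made up for the example.

```python
def igp_change_flat(fib, old_nh, new_nh):
    """Flat table: every prefix stores its resolved forwarding next hop,
    so each affected prefix must be rewritten. Returns the write count."""
    writes = 0
    for prefix, nh in fib.items():
        if nh == old_nh:
            fib[prefix] = new_nh
            writes += 1
    return writes


def igp_change_indirect(indirect, bgp_nh, new_nh):
    """Indirect table: prefixes point at a shared entry keyed by BGP next hop,
    so only that one entry is rewritten. Returns the write count."""
    indirect[bgp_nh] = new_nh
    return 1


def lookup_indirect(prefix_to_bgp_nh, indirect, prefix):
    """Forwarding lookup with one extra level of indirection."""
    return indirect[prefix_to_bgp_nh[prefix]]


# 10,000 hypothetical BGP prefixes that all resolve through one BGP next hop.
prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(10000)]

flat_fib = {p: "ge-0/0/0" for p in prefixes}
flat_writes = igp_change_flat(flat_fib, "ge-0/0/0", "ge-0/0/1")

prefix_to_bgp_nh = {p: "192.0.2.1" for p in prefixes}
indirect_table = {"192.0.2.1": "ge-0/0/0"}
indirect_writes = igp_change_indirect(indirect_table, "192.0.2.1", "ge-0/0/1")
```

In this model the flat table needs one write per prefix, while the indirect table needs a single write at the cost of one extra lookup step per packet, which is the trade the Route Recursion slide describes.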

Conclusions
* 2002: Stability, Scalability, Speed, Correctness - Choose 3.5
    - Careful engineering should be able to provide speed, scalability, and stability
    - The only effect of a heavily loaded system should be a gradual slowing in convergence (not to crash and burn)
    - IGPs are not inherently unstable, at least until it is no longer possible to support all of the adjacencies (and even then it should be possible to gnaw off limbs)

Conclusions
* Adding knobs is not the answer
    - Nobody really knows how to set them
    - Most settings are wrong
    - Either make the parameters adaptive, or make them non-critical
* Keep adaptivity simple and bounded; behavior is chaotic enough as it is

http://www.juniper.net

Copyright © 2002, Juniper Networks, Inc. All rights reserved. Juniper Networks is registered in the U.S. Patent and Trademark Office and in other countries as a trademark of Juniper Networks, Inc. G10, Internet Processor, Internet Processor II, JUNOS, JUNOScript, M5, M10, M20, M40, M40e, and M160 are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners. All specifications are subject to change without notice. Juniper Networks assumes no responsibility for any inaccuracies in this presentation. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this information without notice.