jzinky@BBN.COM ("John A. Zinky") (12/01/89)
I would like to give my opinion on traffic-sensitive routing from the
experience of running large over-subscribed networks, such as the 1987-88
ARPANET. (More details can be found in the SIGCOMM '89 and MILCOM '89
articles by Khanna and Zinky.)

Summary: GIVEN THE GLACIAL PROCUREMENT TIME FOR NEW EQUIPMENT, TRAFFIC
SENSITIVE ROUTING IS MANDATORY FOR LARGE PACKET-SWITCHED NETWORKS.

Overall, traffic-sensitive SPF routing is NOT HARD to implement once you
have the flooding mechanisms to handle failed equipment. Tuning the
algorithm is tricky, but it is less effort than fighting the fires
associated with over-subscribed lines.

In the good old days the ARPANET (pre-1985) had a peak-hour average line
utilization of less than 15% and everyone was happy. (Much like the
current NSFnet.) But due to budgeting decisions beyond the control of
techies, the 1987 ARPANET had an average line utilization greater than
30%, with some lines reaching 80%. Also, traffic was growing at a rate
of 30% per year and no new bandwidth was scheduled. It's nuts to run a
packet network above 15% unless you have heavy controls in the form of
dynamic routing and congestion control. We spent a lot of effort coming
up with reasonable schemes, which are now deployed in several large
networks.

If you look at capacity management in terms of time scale and function,
you can see that network design, routing, and congestion control
complement each other. Network design plans ahead and matches expected
traffic to available resources; it works on the time scale of equipment
procurement (months to years). Routing is an allocation policy that maps
traffic flows onto available resources; it works on the time scale of
network operations (minutes to days). Congestion control regulates user
traffic so that resources are not oversubscribed; it works on the time
scale of a few round-trip times. Routing's goal is to make the best
allocation of bandwidth given the current user traffic.
When the network is under-subscribed, minhop routing is fine and all you
have to worry about is equipment failure. For over-subscribed networks,
it is tempting to use routing to do some load balancing. The
interpretation of the old delay-based SPF routing given in the above
papers shows that:

1) Traffic-sensitive routing is possible in a network of 220+ nodes with
peak-hour average line utilizations in the 15%-40% range.

2) Due to shifting traffic patterns and the long lead time for procuring
bandwidth, there WILL BE lines that are > 80% utilized during the entire
peak hour. Traffic-sensitive routing can shed some of this load to other
trunks. (The traffic flows that have only slightly longer alternate
paths will be shed from the line first, while the traffic with much
longer alternate paths will remain on the line.) The specific algorithm
used in the ARPANET/MILNET can handle individual over-subscribed lines
with 100%-300% offered minhop load. (The extra traffic is moved to
under-subscribed lines.)

3) Routing decisions on a time frame of 10 seconds are good enough for
load balancing aggregate traffic, but congestion control is still needed
to handle individual fluctuations.

RECOMMENDATIONS

1) If you can afford to run your network with peak resource utilizations
of less than 15%, then forget about traffic-sensitive routing; minhop is
good enough.

2) If you plan to use any form of SPF routing on an over-subscribed
network:

A) Allow for a link weight that has a granularity of at least 1/10 of a
hop. In the ARPANET, 25% of the traffic on a link has an alternate path
with the same number of hops. For example, suppose all the links report
a weight of 1 and one link reports 1+epsilon; then 25% of that link's
traffic will be shed. It is important to spread out this abrupt change
in traffic allocation by having a finer granularity in the link metric
and by making sure that links are not all reporting exactly the same
value.
B) The range of the link weight should not go above 4 hops. For the
ARPANET topology and traffic pattern, if a line reports a weight of more
than 4 hops, most or all of the traffic will have an alternate path that
avoids the line, i.e. no traffic will flow over the link!

C) If you cannot change the link weight automatically, at least let it
be a configurable parameter. But be prepared: it will be a full-time job
tuning these link weights.

D) With some effort, you can use the link weights to handle
heterogeneous link characteristics.

zinky

"Be careful, it's a real world out there"
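Recommendation 2A hinges on SPF treating link weights as fractional
values rather than integer hop counts. A minimal sketch (not from the
thread -- the toy topology, weights, and function name are illustrative
assumptions) of how a sub-hop weight bump on one link sheds an
equal-hop flow to its alternate path:

```python
# SPF (Dijkstra) over float link weights. With integer hop counts the
# two A->D paths below tie; raising one link's weight by 0.1 hop is
# enough to move the equal-hop traffic to the alternate path.
import heapq

def shortest_path(graph, src, dst):
    """graph: {node: {neighbor: weight}}; returns the node list src..dst."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Two 2-hop paths A-B-D and A-C-D. All weights 1.0: a tie. Report the
# B-D link at 1.1 hops and all A->D traffic shifts to A-C-D -- the
# abrupt 25%-style shift that finer weight granularity is meant to spread.
graph = {'A': {'B': 1.0, 'C': 1.0}, 'B': {'D': 1.0}, 'C': {'D': 1.0}, 'D': {}}
graph['B']['D'] = 1.1
print(shortest_path(graph, 'A', 'D'))
```

With a metric quantized to whole hops, the 1.1 and 1.0 links are
indistinguishable and the shed cannot be expressed at all, which is the
point of the 1/10-hop granularity recommendation.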
tds@cbnewsh.ATT.COM (antonio.desimone) (12/01/89)
From article <8911302038.AA19457@ucbvax.Berkeley.EDU>, by jzinky@BBN.COM ("John A. Zinky"):
> I would like to give my opinion on traffic sensitive routing from the
> experience of running large over-subscribed networks, such as the 1987-88
> ARPANET.

Or the phone network? There is in fact a lot of experience in running
large networks and using traffic-sensitive routing. When I started this
thread (I think we're on the same thread) I was really interested in how
dynamic routing algorithms in datagram networks compare to the
algorithms used in circuit-switched networks, to see if my intuition is
completely useless for datagram networks. An apparent difference is that
routing in datagram networks can (in principle) react to congestion on
the timescales on which queues build up.

> equipment procurement (months to years). Routing is an allocation
> policy that maps traffic flows onto available resources. It works on
> the time scale of network operations (minutes to days). Congestion
> control regulates user traffic so that resources are not
> oversubscribed. It works on the time scale of a few round trip times.

This discussion gives me a nice warm feeling, since it says that routing
in datagram networks, in practice, behaves much like routing in
circuit-switched networks in the sense of mapping traffic flows, *not*
individual packets.

> "Be careful, it's a real world out there"

I love this guy!
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
att!tds386e!tds
brnstnd@stealth.acf.nyu.edu (Dan Bernstein) (12/03/89)
In article <6203@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
> An apparent difference
> is that routing in datagram networks can (in principle) react to
> congestion on the timescales on which queues build up.

Since when? I don't know of any methods of controlling flapping and
instability in principle. When the Internet is highly loaded it shows
that dynamic routing fails in practice as well.

There's absolutely no reason not to use a Bayesian or maximum-entropy
calculation of the minimal-cost distribution of paths for a given
distribution of load. Maximum-entropy routing yields the efficiency of
dynamic routing without the instability of dynamic routing. If you're
paranoid, recalculate paths every week instead of every month.

---Dan
tds@cbnewsh.ATT.COM (antonio.desimone) (12/05/89)
From article <4118@sbcs.sunysb.edu>, by brnstnd@stealth.acf.nyu.edu (Dan Bernstein):
> In article <6203@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
>> An apparent difference
>> is that routing in datagram networks can (in principle) react to
>> congestion on the timescales on which queues build up.
>
> Since when? I don't know of any methods of controlling flapping and
> instability in principle. When the Internet is highly loaded it shows
> that dynamic routing fails in practice as well.

I'm with you Dan! You're making exactly the same point I tried to make
later in my posting (did you get there? :-).

> There's absolutely no reason not to use a Bayesian or maximum-entropy
          ^^^^^^^^^ including being able to calculate routes
                    in a human lifetime?
> calculation of the minimal-cost distribution of paths for a given
> distribution of load. Maximum-entropy routing yields the efficiency of
> dynamic routing without the instability of dynamic routing. If you're
> paranoid, recalculate paths every week instead of every month.
>
> ---Dan

This is a little too heavy on the jargon for me. Are you suggesting a
single optimal route? Or some kind of table-driven routing with tables
calculated in advance? That's a fine idea (in principle :-) and has been
used in the phone system (*that* again) quite successfully. It is by no
means obvious or trivial to build routing tables, however! And if
"maximum-entropy routing" means some kind of global optimization over
the space of possible tables, the method does not scale to a large
network. If it means something more specific, maybe you could send me
(or post) a reference?
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
att!tds386e!tds
brnstnd@stealth.acf.nyu.edu (Dan Bernstein) (12/06/89)
In article <6251@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
> From article <4118@sbcs.sunysb.edu>, by brnstnd@stealth.acf.nyu.edu (Dan Bernstein):
> > There's absolutely no reason not to use a Bayesian or maximum-entropy
>           ^^^^^^^^^ including being able to calculate routes
>                     in a human lifetime?

Yup.

> > calculation of the minimal-cost distribution of paths for a given
> > distribution of load. Maximum-entropy routing yields the efficiency of
> > dynamic routing without the instability of dynamic routing. If you're
> > paranoid, recalculate paths every week instead of every month.
>
> This is a little too heavy on the jargon for me. Are you suggesting
> a single optimal route? Or some kind of table-driven routing with
> tables calculated in advance?

You want as little calculation time as possible? Okay: See where the
heaviest amount of your traffic is going. Figure out two routes for it.
Figure out the delay time on each route; use maximum entropy to find the
proper distribution between routes. Remove that destination from your
calculation---and make sure to figure a higher delay on the two
calculated routes. Repeat until you've built a table covering 99% of
your destinations. All of this can be done with a minimal amount of
effort every once in a while. (None of this was my idea.)

The point is not to do a gigantic calculation of the best routes; the
point is to make routing a reasonably stable decision of ``90% of our
traffic here, 10% of our traffic there, all the time,'' rather than an
unstable flip-flop between ``128.122 loves me... 128.122 loves me
not... 128.122 just crashed... 128.122 loves me...'' Maximum entropy is
just a statistically sound way of keeping any desired variable---say,
delay times, where the current Internet fails so miserably---completely
under control.

---Dan
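The loop Bernstein describes -- heaviest destination first, two candidate
routes, a maximum-entropy split by delay, then charge that traffic to
the links before handling the next destination -- can be sketched as
follows. Everything here is an assumption for illustration: the function
names, the linear delay-vs-load model, and the use of an exponential
(Boltzmann) weighting as the maximum-entropy split are my reading of the
post, not code from any deployed system.

```python
# Offline table-building sketch: split each destination's traffic across
# two fixed routes in proportion to exp(-beta * route_delay), updating
# per-link load so later (lighter) destinations see the traffic already
# allocated. beta and the delay model are illustrative assumptions.
import math

def build_table(demands, routes, link_delay, beta=1.0):
    """demands: {dest: traffic volume}, heaviest handled first.
    routes: {dest: (route_a, route_b)}, each route a tuple of link names.
    link_delay: {link: base delay}; delay grows linearly with allocated
    load here (a crude stand-in for a real queueing model)."""
    load = {link: 0.0 for link in link_delay}
    table = {}
    for dest in sorted(demands, key=demands.get, reverse=True):
        r1, r2 = routes[dest]
        # Route delay = sum of per-link base delay inflated by load so far.
        d = [sum(link_delay[l] * (1.0 + load[l]) for l in r) for r in (r1, r2)]
        # Maximum-entropy split: share proportional to exp(-beta * delay).
        w = [math.exp(-beta * di) for di in d]
        p1 = w[0] / (w[0] + w[1])
        table[dest] = {r1: p1, r2: 1.0 - p1}
        # "Figure a higher delay on the two calculated routes": charge the
        # allocated traffic to the links before the next destination.
        for r, p in table[dest].items():
            for l in r:
                load[l] += p * demands[dest]
    return table
```

The output is exactly the stable "90% here, 10% there" table from the
post: the split only changes when the table is recomputed, so there is
no per-packet flip-flop to go unstable.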
RLR@JHUVMS.BITNET (RLR) (12/08/89)
What's a reference for the maximum-entropy routing algorithm?

Ron Ray