[comp.protocols.tcp-ip] Traffic Sensitive SPF Routing is NOT too hard!

jzinky@BBN.COM ("John A. Zinky") (12/01/89)

I would like to give my opinion on traffic sensitive routing from the
experience of running large over-subscribed networks, such as the 1987-88
ARPANET. (More details can be found in the SIGCOMM '89 and MILCOM '89
articles by Khanna and Zinky.)

Summary: 
GIVEN THE GLACIAL PROCUREMENT TIME FOR NEW EQUIPMENT, 
TRAFFIC SENSITIVE ROUTING IS MANDATORY FOR LARGE PACKET-SWITCHED NETWORKS.
Overall, traffic sensitive SPF routing is NOT HARD to implement once you
have the flooding mechanisms to handle failed equipment. Tuning the
algorithm is tricky, but it is less effort than fighting the fires
associated with over-subscribed lines.


In the good old days the ARPANET (pre-1985) had a peak-hour average
line utilization of less than 15% and everyone was happy. (Much like
the current NSFnet.) But due to budgeting decisions beyond the
control of techies, the 1987 ARPANET had an average line utilization
greater than 30%, with some lines reaching 80%.  Also, traffic was
growing at a rate of 30% per year and no new bandwidth was scheduled.
It's nuts to run a packet network above 15% utilization unless you
have heavy controls in the form of dynamic routing and congestion
control. We spent a lot of effort coming up with reasonable schemes,
which are now deployed in several large networks.

If you look at Capacity Management in terms of time-scale and
function, you can see that network design, routing, and congestion
control complement each other. Network design plans ahead and matches
expected traffic to available resources. It works on the time scale of
equipment procurement (months to years). Routing is an allocation
policy that maps traffic flows onto available resources. It works on
the time scale of network operations (minutes to days).  Congestion
control regulates user traffic so that resources are not
oversubscribed. It works on the time scale of a few round trip times.

Routing's goal is to make the best allocation of bandwidth given the
current user traffic. When the network is under-subscribed, minhop
routing is fine and all you have to worry about is equipment failure.
For over-subscribed networks, it is tempting to use routing to do some
load balancing.

The analysis of the old delay-based SPF routing in the papers cited
above shows that:

1) Traffic sensitive routing is possible in a network of 220+ nodes with
peak-hour average line utilizations in the 15%-40% range. 

2) Due to shifting traffic patterns and the long lead time for procuring
bandwidth, there WILL BE lines that are > 80% utilized during the entire
peak hour.  Traffic sensitive routing can shed some of this load to other
trunks: the traffic flows that have only slightly longer alternate paths
are shed from the line first, while the traffic with much longer alternate
paths remains on the line. (A sketch of this shedding order follows point
3 below.) The specific algorithm used in the ARPANET/MILNET can handle
individual over-subscribed lines with 100%-300% offered minhop load; the
extra traffic is moved to under-subscribed lines.

3) Routing decisions on a time frame of 10 seconds are good enough
for load-balancing aggregate traffic, but congestion control is still
needed to handle individual fluctuations.
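
To make the shedding order in point 2 concrete, here is a minimal
sketch in Python. The flows, loads, and detour penalties are invented
for illustration; they are not measured ARPANET data.

    # Traffic shed from one congested link as its reported weight
    # rises. Each flow on the link has a "detour penalty": the extra
    # hops its best alternate path would cost. A flow abandons the
    # link once the link's weight exceeds 1 + penalty (relative to a
    # nominal one-hop weight of 1). All numbers are invented.

    flows = [
        # (load in pkt/s, detour penalty in hops)
        (100, 0.0),   # an equal-length alternate path exists
        (150, 0.0),
        (200, 0.5),
        (300, 1.5),
        (250, 3.0),   # the alternate path is much longer
    ]

    def load_remaining(link_weight):
        """Traffic still on the link when it reports link_weight."""
        return sum(load for load, penalty in flows
                   if link_weight <= 1 + penalty)

    for w in (1.0, 1.1, 1.6, 2.6, 4.0, 4.1):
        print(f"weight {w:.1f}: {load_remaining(w):4d} pkt/s stay")

Note the two effects this toy model reproduces: the flows with
equal-length alternates all leave at once the moment the weight rises
above 1, and nothing at all stays on the link once the weight passes
the longest detour. Recommendations A and B below address exactly
these two effects.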


RECOMMENDATIONS

1) If you can afford to run your network with peak resource
utilizations of less than 15%, then forget about traffic sensitive
routing; minhop is good enough.

2) If you plan to use any form of SPF routing on an over-subscribed network:

A) Allow for a "link" weight that has a granularity of at least 1/10
of a hop. In the ARPANET, 25% of the traffic on a link has an
alternate path with the same number of hops. For example, suppose
all the links report a weight of 1 and one link reports 1+epsilon;
then 25% of that link's traffic will be shed at once. It is important
to spread out this abrupt change in traffic allocation by having a
finer granularity in the link metric and by making sure that links
are not all reporting exactly the same value.

B) The range of the link weight should not go above 4 hops. For the ARPANET
topology and traffic pattern, if a line reports a weight of more than 4
hops, most or all of the traffic will have an alternate path that avoids
the line, i.e. no traffic will flow over the link! (A sketch of a weight
function with properties A and B follows this list.)

C) If you cannot change the link weight automatically, at least let it be a
configurable parameter. But be prepared: tuning these link weights will be
a full-time job.

D) With some effort, you can also use the link weights to handle
heterogeneous link characteristics.
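
As promised above, here is a minimal sketch in Python of a weight
function with properties A and B. The utilization-to-weight curve and
all constants are invented for illustration; this is not the actual
ARPANET/MILNET metric (see the Khanna and Zinky papers for that).

    GRANULARITY = 0.1   # tenths of a hop (recommendation A)
    MAX_WEIGHT  = 4.0   # cap in hop-equivalents (recommendation B)
    MIN_WEIGHT  = 1.0   # an idle link still costs one hop

    def link_weight(utilization):
        """Map measured utilization (0.0-1.0) to a reported weight."""
        # Grow slowly at first, steeply as the line saturates.
        # This particular curve is an arbitrary illustrative choice.
        raw = MIN_WEIGHT + (MAX_WEIGHT - MIN_WEIGHT) * utilization ** 3
        # Quantize to 1/10 hop and clamp to the allowed range.
        quantized = round(raw / GRANULARITY) * GRANULARITY
        return min(MAX_WEIGHT, max(MIN_WEIGHT, quantized))

    for u in (0.0, 0.15, 0.40, 0.80, 1.00):
        print(f"{u:4.0%} utilized -> weight {link_weight(u):.1f}")

In practice you also want to limit how far the reported weight can
move in one update interval; otherwise the whole network can oscillate
between two allocations. The papers cited above discuss this tuning
in detail.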



zinky
"Be careful, it's a real world out there"

tds@cbnewsh.ATT.COM (antonio.desimone) (12/01/89)

From article <8911302038.AA19457@ucbvax.Berkeley.EDU>, by jzinky@BBN.COM ("John A. Zinky"):
> I would like to give my opinion on traffic sensitive routing from the
> experience of running large over-subscribed networks, such as the 1987-88
> ARPANET.

Or the phone network?  There is in fact a lot of experience in running
large networks and using traffic-sensitive routing.  When I started
this thread (I think we're on the same thread) I was really interested
in how dynamic routing algorithms in datagram networks compare to the
algorithms used in circuit-switched networks, to see if my intuition
is completely useless for datagram networks.  An apparent difference
is that routing in datagram networks can (in principle) react to
congestion on the timescales on which queues build up.

> equipment procurement (months to years). Routing is an allocation
> policy that maps traffic flows onto available resources. It works on
> the time scale of network operations (minutes to days).  Congestion
> control regulates user traffic so that resources are not
> oversubscribed. It works on the time scale of a few round trip times.

This discussion gives me a nice warm feeling since it says that routing
in datagram networks, in practice, behaves much like routing in
circuit-switched networks in the sense of mapping traffic flows, *not*
individual packets.

> "Be careful, it's a real world out there"

I love this guy!
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
att!tds386e!tds

brnstnd@stealth.acf.nyu.edu (Dan Bernstein) (12/03/89)

In article <6203@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
> An apparent difference
> is that routing in datagram networks can (in principle) react to
> congestion on the timescales on which queues build up.

Since when? I don't know of any methods for controlling flapping and
instability even in principle. When the Internet is highly loaded, it
shows that dynamic routing fails in practice as well.

There's absolutely no reason not to use a Bayesian or maximum-entropy
calculation of the minimal-cost distribution of paths for a given
distribution of load. Maximum-entropy routing yields the efficiency of
dynamic routing without the instability of dynamic routing. If you're
paranoid, recalculate paths every week instead of every month.

---Dan

tds@cbnewsh.ATT.COM (antonio.desimone) (12/05/89)

From article <4118@sbcs.sunysb.edu>, by brnstnd@stealth.acf.nyu.edu (Dan Bernstein):
> In article <6203@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
>> An apparent difference
>> is that routing in datagram networks can (in principle) react to
>> congestion on the timescales on which queues build up.
> 
> Since when? I don't know of any methods of controlling flapping and
> instability in principle. When the Internet is highly loaded it shows
> that dynamic routing fails in practice as well.

I'm with you Dan!  You're making exactly the same point I tried to
make later in my posting (did you get there? :-).

> There's absolutely no reason not to use a Bayesian or maximum-entropy
                     ^^^^^^^^^ including being able to calculate routes
                               in a human lifetime?
> calculation of the minimal-cost distribution of paths for a given
> distribution of load. Maximum-entropy routing yields the efficiency of
> dynamic routing without the instability of dynamic routing. If you're
> paranoid, recalculate paths every week instead of every month.
> 
> ---Dan
This is a little too heavy on the jargon for me.  Are you suggesting
a single optimal route?  Or some kind of table-driven routing with
tables calculated in advance?  That's a fine idea (in principle:-) and
has been used in the phone system (*that* again) quite successfully.
It is by no means obvious or trivial to build routing tables, however!
And if "Maximum-entropy routing" means some kind of global optimization
over the space of possible tables, the method does not scale to a large
network.  If it means something more specific, maybe you could send me
(or post) a reference? 
-- 
Tony DeSimone
AT&T Bell Laboratories
Holmdel, NJ 07733
att!tds386e!tds

brnstnd@stealth.acf.nyu.edu (Dan Bernstein) (12/06/89)

In article <6251@cbnewsh.ATT.COM> tds@cbnewsh.ATT.COM (antonio.desimone) writes:
> From article <4118@sbcs.sunysb.edu>, by brnstnd@stealth.acf.nyu.edu (Dan Bernstein):
> > There's absolutely no reason not to use a Bayesian or maximum-entropy
>                      ^^^^^^^^^ including being able to calculate routes
>                                in a human lifetime?

Yup.

> > calculation of the minimal-cost distribution of paths for a given
> > distribution of load. Maximum-entropy routing yields the efficiency of
> > dynamic routing without the instability of dynamic routing. If you're
> > paranoid, recalculate paths every week instead of every month.
> This is a little too heavy on the jargon for me.  Are you suggesting
> a single optimal route?  Or some kind of table-driven routing with
> tables calculated in advance?

You want as little calculation time as possible? Okay: See where the
heaviest part of your traffic is going. Figure out two routes for it.
Figure out the delay time on each route; use maximum entropy to find the
proper distribution between the routes. Remove that destination from your
calculation---and make sure to figure a higher delay on the two
calculated routes. Repeat until you've built a table covering 99% of
your destinations. All of this can be done with a minimal amount of
effort every once in a while. (None of this was my idea.)
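
For concreteness, here is one possible rendering of that recipe in
Python. The exponential split below is the maximum-entropy
distribution over routes subject to a mean-delay constraint; the
constants, topology, and loads are all made up for illustration, and
none of this is a published algorithm.

    import math

    LAMBDA = 0.5        # 1/ms; larger favors the faster route more
    LOAD_FACTOR = 0.01  # ms of added delay per pkt/s of assigned load

    # Destinations sorted by offered load, heaviest first, and the
    # two candidate routes for each (all invented example data).
    demands = [("net-A", 900), ("net-B", 500), ("net-C", 100)]
    routes = {"net-A": ["r1", "r2"],
              "net-B": ["r1", "r3"],
              "net-C": ["r2", "r3"]}
    delay = {"r1": 10.0, "r2": 14.0, "r3": 11.0}  # current estimates, ms

    table = {}
    for dest, load in demands:            # heaviest traffic first
        names = routes[dest]
        # Maximum-entropy split: p_i proportional to exp(-LAMBDA*d_i).
        weights = [math.exp(-LAMBDA * delay[n]) for n in names]
        total = sum(weights)
        split = {n: w / total for n, w in zip(names, weights)}
        table[dest] = split
        # Charge the assigned load back, so these routes look slower
        # to every destination considered after this one.
        for n, p in split.items():
            delay[n] += LOAD_FACTOR * load * p

    for dest, split in table.items():
        print(dest, {n: f"{p:.0%}" for n, p in split.items()})

The splits depend only on the long-term load matrix, not on
instantaneous queue lengths, so the decision stays put between
recalculations.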

The point is not to do a gigantic calculation of the best routes; the
point is to make routing a reasonably stable decision of ``90% of our
traffic here, 10% of our traffic there, all the time,'' rather than an
unstable flip-flop between ``128.122 loves me... 128.122 loves me not...
128.122 just crashed... 128.122 loves me...'' Maximum entropy is just a
statistically sound way of keeping any desired variable---say, delay
times, where the current Internet fails so miserably---completely under
control.

---Dan

RLR@JHUVMS.BITNET (RLR) (12/08/89)

What's a reference for the maximum-entropy routing algorithm? Ron Ray