heker@JVNCA.CSC.ORG.UUCP (11/05/87)
We are experiencing a large number of routing changes in the kernel of one of our VAX8600 "gateways". The number of changes has increased dramatically due to some route instabilities (which are not the topic of this message). The question is how the number of changes can affect the performance of our system. We see about 1000 route changes in the kernel in a 10-minute period. This is, as you can see, *extremely* high. But does it degrade the system performance at all? I also want to point out that these route changes are then propagated to other systems (VAX750s), and they all dance to the same rhythm. Any comments about this will be greatly appreciated.
-- Sergio
-----------------------------------------------------------------------------
Sergio Heker tel: (609) 520-2000
Internet: "heker@jvnca.csc.org" Bitnet: "heker@jvnc"
JOHN VON NEUMANN NATIONAL SUPERCOMPUTER CENTER, JVNCnet Network Manager
-----------------------------------------------------------------------------
Mills@UDEL.EDU (11/05/87)
Sergio, I'm not sure what you mean by "routing changes." There certainly are vast quantities of changes involving relatively small changes in delay, and even uncomfortable quantities involving significant (factor of two) changes. Not many of these involve changes in route, however. While the situation is serious and must be fixed, I don't think the routing overhead itself is a significant factor in performance. Hellograms are rate-limited to no more than one every 400 ms in even the worst case. Let's hear it for all those gated's honking strange distances to the fuzzies. Can someone answer the questions I put out about their behavior? Dave
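For a rough sense of what the 400 ms limit implies, the sketch below caps sends at one per 400 ms and counts how many updates would go out for a given series of route changes. It assumes, purely for illustration, that every route change tries to trigger one hello message and that suppressed attempts are dropped; the names `MinIntervalLimiter` and `count_sent` are hypothetical and are not taken from the fuzzball or gated code.

```python
class MinIntervalLimiter:
    """Allow at most one send per min_interval_ms (hypothetical helper)."""

    def __init__(self, min_interval_ms=400):
        self.min_interval_ms = min_interval_ms
        self.last_sent_ms = None  # time of the last permitted send

    def allow(self, now_ms):
        # Permit the send only if the minimum spacing has elapsed.
        if self.last_sent_ms is None or now_ms - self.last_sent_ms >= self.min_interval_ms:
            self.last_sent_ms = now_ms
            return True
        return False


def count_sent(change_times_ms, min_interval_ms=400):
    """Count how many of the attempted sends get past the limiter."""
    limiter = MinIntervalLimiter(min_interval_ms)
    return sum(1 for t in change_times_ms if limiter.allow(t))


# 1000 changes spread evenly over 10 minutes (one every 600 ms): none suppressed.
steady = count_sent(range(0, 600_000, 600))

# 1000 changes crammed into one second (one every 1 ms): only a few get out.
burst = count_sent(range(0, 1000))

# Upper bound for any 10-minute window, however many changes occur.
ceiling = count_sent(range(0, 600_000))
```

Under these assumptions, 1000 changes in 10 minutes average about 1.7 per second, which is below the ceiling of 2.5 messages per second (1500 per 10 minutes) that a 400 ms spacing allows.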
Mills@UDEL.EDU (11/06/87)
Sergio, Ah yes, the infamous 192.31.x nets. These dudes have been bouncing all over the map for some time now. The distance values for these nets are not provided by the fuzzballs, but by gated at some site or other. When they count to infinity they have in fact become unreachable. This is a classic example of what unstable metrics can do to a distributed Bellman-Ford algorithm. I have been working feverishly to harden the algorithm so that even these wild swings won't destabilize it, but when distances change from one sample to the next by over fifty percent, what can any algorithm do? I repeat my statement made at least a dozen times: where is the source of those violent delay excursions and what gated is generating them?

Having said that, note that even these severe transients should not adversely affect the system throughput, at least for the nets not rocking to and fro, since the hello messages are rate-limited. On the other hand, traffic for nets counting to infinity can clearly gobble up dangerous levels of traffic. That's why I have been spending so much time trying to avoid the counting problem. The only way to do that is to latch sudden increases in delay and prevent further decreases until the hold-down timer expires, which is what the present system does. I have had to experiment somewhat in order to gauge the sensitivity of the latch, which is presently set at a factor of two. The latch regularly snares at least some of the surges, but not all, as you can see from your data. I can't make the latch more sensitive without snaring a lot of benign wobbles, such as occasional retransmissions on UIUC - NCAR lines, for example. Nevertheless, I have tuned the algorithm a lot in the past month and, at least in the testing swamps, it seems to be working well.

It has been suggested that JVNC has more trouble than most because that is the only spot running gated on two machines on the same Ether. I thought Maryland was doing that as well.
While they seem to be having trouble of their own, destabilized routes do not seem to be a serious problem there. There are two things I would recommend (again): first, identify all those gated configurations where only a single path is available to the networks being squawked and set the squawked delay to zero, just plain zero. Second, where multiple paths to a net exist, pray to the metric-translation god and really, truly and verily conform to the rules I suggested in my earlier memo. In any case, the clock-offset fields associated with each net in the hello message should be set to zero and the date in the header should be marked invalid. This seems like a pretty simple thing to check. Dave
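The latch described in the message above can be sketched roughly as follows. This is one reading of the description, not the fuzzball implementation: a sample at least twice the current delay arms a hold-down timer, and until the timer expires, lower samples are ignored while higher ones are still accepted. The factor of two comes from the message; the class name `DelayLatch`, the 120-second hold-down, and the choice to accept further increases while latched are assumptions.

```python
class DelayLatch:
    """Hold-down latch on a per-net delay estimate (illustrative sketch)."""

    def __init__(self, factor=2, hold_down_s=120):
        self.factor = factor            # latch sensitivity, per the message
        self.hold_down_s = hold_down_s  # timer length: assumed value
        self.delay = None               # currently believed delay (ms)
        self.latched_until = None       # time at which hold-down ends

    def update(self, sample_ms, now_s):
        """Feed one delay sample; return the delay to advertise."""
        if self.delay is None:
            self.delay = sample_ms
            return self.delay

        if self.latched_until is not None:
            if now_s < self.latched_until:
                # Hold-down in force: refuse decreases, allow increases.
                if sample_ms > self.delay:
                    self.delay = sample_ms
                return self.delay
            self.latched_until = None   # timer expired, release the latch

        # A surge of at least `factor` times the current delay arms the latch.
        if self.delay > 0 and sample_ms >= self.factor * self.delay:
            self.latched_until = now_s + self.hold_down_s
        self.delay = sample_ms
        return self.delay


latch = DelayLatch()
trace = [
    latch.update(100, 0),    # first sample
    latch.update(150, 10),   # benign wobble, below the factor of two
    latch.update(400, 20),   # surge: latch arms until t = 140
    latch.update(120, 30),   # decrease ignored during hold-down
    latch.update(120, 140),  # hold-down over, decrease accepted
]
```

The sensitivity trade-off mentioned in the message shows up in `factor`: lowering it catches more surges but also latches on ordinary wobbles such as the 100 to 150 ms step in the trace.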
Mills@UDEL.EDU (11/06/87)
Folks, My apologies to the tcp-ip list for my recent reply to Sergio's message, which must have seemed rather esoteric to most of you. I overlooked the "tcp-ip" addressee in the return address list of the message. On the other hand, if someone wants to start that game, I would be happy to play. Dave
fedor@NIC.NYSER.NET (Mark Fedor) (11/06/87)
>Date: Thu, 5 Nov 87 14:22:12 EST
>From: Mills@udel.edu
>Subject: Re: routing changes
>
> [ DELETED TEXT - MF ]
>
>Let's hear it for all those gated's honking strange distances to the
>fuzzies. Can someone answer the questions I put out about their behavior?
>
>Dave
>
Dave, I must admit that, due to some traveling and moving, I have not read my mail too carefully. As soon as I catch up and find the questions you put out, I will be glad to answer them. Or you can send me a summary of your questions and I'll see what I can do..... Mark NYSERNet Inc. (this is the last time I specify this! Y'all should know I work there by now.) :^) P.S. Can you elaborate on "strange distances"?
Mills@UDEL.EDU (11/06/87)
Mark, Yeah, I know who you work for now, but if I admitted that you might have an excuse to wiggle off the hook. The hook seems to have already impaled me, as you may have noticed. Strange distances mean anything from 100 ms to somewhere in the middle of Channel 4. Sergio's is a typical example. As for rounding up all the messages I sent on the topic, gimme a break. There must be a hundred of them last month alone. From reports by returning scouts to the INENG meeting, the likely cause may be (a) incompatible gated versions and/or configurations, (b) unstable ripspeakers behind gated or (c) metric conversion violations when more than a single access path is available. Dave