brescia@PARK-STREET.BBN.COM (Mike Brescia) (12/21/88)
People,

Various different-sounding routing complaints have been coming in via the egp-people mailing list, the tcp-ip mailing list, the gated-people mailing list, private mail, and telephone. Some messages are extracted at the end.

Problems reported:

1. My host X cannot get to host Y.
2. My gateway X has no route for net Y.
3. My gateway X cannot run its EGP with core server S (or T, or U).
4. My gateway X runs EGP, and gets no routing info (NR messages) from core.
5. My gateway X runs EGP, and gets partially garbled routes from core.

Some explanations:

1. This is the simplest if the person at host X can report the results of "netstat -r" and point out the default or other gateways used to get to the net where host Y sits; conversely, I hope that X could call Y and ask the same sort of questions for the return path. If X and Y cannot communicate, then we often need to figure out whether the problem is on the path from X to Y or the reverse path back from Y to X. Given the hosts are O.K., the question recurses to one of the gateways involved in problems 2-5.

2. This one has to be broken down: run some EGP trace logs on your gateway X to see whether it is really problem 3, 4, or 5.

3. Growth! Some of the core gateways were oversubscribed, and the demand on the total neighbor space (peer slots?) available, especially on Milnet, was too much. The fix here was to, yet again, squeeze more net and neighbor space into the LSI11 core gateways. Steve Atlas has been working hard to maintain these bears. The 3 EGP core servers on the Arpanet, and 2 out of the 3 on the Milnet, have been upgraded today.

4. Growth! Some versions of EGP have suffered from the growing number of nets reported in the NR messages, when the reassembled packet size crept over 2K bytes. Some were able to recompile with the EGPPACKETSIZE constant set larger, like 4K. Some noted that not all the modules needed were recompiled by the normal 'make' rules, and recommended recompiling the whole EGPUP program.

   Some sites run 4.3bsd unix, and were able to incorporate fixes mentioned on the egp-people list advising how to use the 'setsockopt' system call to assign more buffering to the EGP connection, so that fragments of large packets could be reassembled and delivered to the EGP process.

   Some sites run the 'gated' version of EGP, and get some great support from the people at Cornell (gated-people-request@devvax.tn.cornell.edu).

5. Growth! And a bug in the LSI11 EGP code. The bug was introduced in the version that began handling more than 256 nets. It caused the info in the NR message to be sent with the distances no longer in ascending order. That in turn caused more than 255 distances to be reported for a single neighbor, and the code tried to stuff that number (e.g. 264) into an 8-bit field. This afternoon, Tuesday 12/20, the fix for this has been put in the 5 EGP servers that have been reloaded so far (mentioned in 3 above).

Keep those packet dumps coming.

Mike Brescia
BBNCC Gateway Development Group
800-492-4992 (or 617-873-3662)

------------------- some forwarded messages, excerpted ---------------

Date: Sat, 17 Dec 88 1:04:11 EST
From: Tim Smith (USNA|tcs) <tsmith@BRL.MIL>
To: control@bbn.com, tcp-ip@sri-nic.ARPA, gated-people@devvax.TN.CORNELL.EDU
Subject: core routing capacity exceeded?
Message-Id: <8812170104.aa18644@SEM.BRL.MIL>

Morning all,

I have been experiencing a bit of trouble acquiring routing information from the core gateways over the last few days. We use gated (version 1.3.1.36) to speak EGP to the core gateways and have noticed that gated has not been providing nearly as good routing information as it usually does; we have been losing contact with the core gateways and gated has been mysteriously dying. I turned on tracing and came across the following:

[...]

EGP RECV 26.1.0.65 -> 26.7.0.102 Sat Dec 17 00:10:21 1988 vers 2, type ACQUIRE(3), code REFUSE(2), status INSUFFICIENT RESOURCES(3), AS# 1, id 1
EGP RECV 26.1.0.40 -> 26.7.0.102 Sat Dec 17 00:10:21 1988 vers 2, type ACQUIRE(3), code REFUSE(2), status INSUFFICIENT RESOURCES(3), AS# 2049, id 1
EGP RECV 26.3.0.75 -> 26.7.0.102 Sat Dec 17 00:10:21 1988 vers 2, type ACQUIRE(3), code REFUSE(2), status INSUFFICIENT RESOURCES(3), AS# 1, id 1

Is it possible that the routing tables have grown too large and exceeded the core's capacity? What other reasons are there for the insufficient resources message?

[...]

What does everyone else think?

Tim Smith
-[hp]ostmaster and general network person

------------------- some forwarded messages, excerpted ---------------

Return-Path: <cal@okc-unix.ARPA>
Message-Id: <8812192039.AA15146@okc-unix.ARPA>
Date: Mon Dec 19 14:39:15 1988
From: cal@okc-unix.ARPA (Charles Leach)
Subject: EGP Sick?
To: egp-people@bbn.com

For the past week or so, EGP has been very intermittent in acquiring routes. Is there any reason for this behavior or is it virus/worm fallout that we can come to expect?

charles..

------------------- some forwarded messages, excerpted ---------------

To: cal@okc-unix.ARPA (Charles Leach)
cc: egp-people
Subject: Re: EGP Sick?
In-reply-to: Your message of Mon, 19 Dec 00 19:88:15 +0000. <8812192039.AA15146@okc-unix.ARPA>
Date: Mon, 19 Dec 88 16:50:56 -0500
From: Mike Brescia <brescia@park-street>

    For the past week or so, EGP has been very intermittent in acquiring
    routes. Is there any reason for this behavior ...

Two factors here.

1. The size of the Net Reachability message sent by the core is growing. If your host kernel cannot reassemble and deliver, or your EGP cannot receive, packets much larger than 2K (recommend 4K), you will probably see EGP apparently stop receiving any net reachability information at all. The Acquire cycle works, the Hello cycle works, but when you send a Poll, the reply comes back as 2 or 3 fragments totalling more than 2,000 bytes.

2. A bug has just been exhibited by Walter Prue at ISI, where the LSI11 code sending an EGP message sends the NR information with the distances out of order, creating the apparent need for stuffing more than 255 distance reports through a single 'neighbor'. The result is that your EGP will receive some NR message, but only a few nets show up in your routing table, because the NR message is badly formed.

In the first case, your EGP trace will probably show no NR message at all; in the second case, a trace should show some NR message received, but with some error condition.

We are dragging out the big guns to fix this second problem ASAP. [BANG..:-]

------------------- some forwarded messages, excerpted ---------------

Date: 8 Dec 88 04:11:11 GMT
From: haven!aplcen!aplcomm!trn%aplcomm.jhuapl.edu@mimsy.umd.edu (Tony Nardo)
Organization: Johns Hopkins University/APL (Baltimore, Md.)
Subject: Is someone playing games with the MILNET/ARPANET interface?
Message-Id: <2648@aplcomm.jhuapl.edu>
Sender: tcp-ip-relay@sri-nic.arpa
To: tcp-ip@sri-nic.arpa

I am on a MILNET site. I have noticed three times in the past week (and twice in the past two days) that, while I cannot reach a site directly, I *can* reach it thru BRL.ARPA. For example,

    finger @maryland.arpa

will come back with a "Network is unreachable" response, but

    finger @maryland.arpa@brl.arpa

gives the desired "finger" output. Likewise, while I can't send mail directly to a site without it languishing in a mail queue (the name server can't connect to resolve the address), I *CAN* send the mail thru BRL.ARPA.

This situation did not arise until CNNC's decision to yank the MILNET/ARPANET link for "technical difficulties". The first two times, the problem eventually "cleared itself". This is the third time the problem has arisen. Is someone still playing games with the MILNET/ARPANET interface?

From my rather untutored perspective, it looks as if "routed" is dying or being deliberately killed somewhere. Anyone have any insights?

[...]

[ check routing ? - m ]
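
For the 'setsockopt' fix mentioned in explanation 4 of the first message, a minimal sketch of the idea follows. It is written in Python rather than the C of the 4.3bsd sources, and it is not the patch that circulated on egp-people: the function names, the 16K request, and the UDP stand-in socket are all assumptions made for illustration. A real EGP speaker would open a raw IP socket (EGP is IP protocol 8), which normally needs root.

```python
import socket

EGP_PROTOCOL = 8  # IP protocol number assigned to EGP


def open_socket(raw=False):
    """Open a socket to experiment on (hypothetical helper).

    raw=True asks for a raw IP socket carrying EGP, which usually
    requires root; the default is an ordinary UDP socket used purely
    as a stand-in so the sketch runs unprivileged.
    """
    if raw:
        return socket.socket(socket.AF_INET, socket.SOCK_RAW, EGP_PROTOCOL)
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def enlarge_receive_buffer(sock, wanted_bytes):
    """Request a larger receive buffer and return the size now in effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    sock = open_socket()
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    # 16K is an assumed figure chosen to leave headroom above the 4K
    # packet size recommended in the messages above.
    after = enlarge_receive_buffer(sock, 16 * 1024)
    print("receive buffer before:", before, "after:", after)
    sock.close()
```

The kernel may round, double, or cap the request, so the value read back with getsockopt is the one to trust. If it comes back smaller than the NR messages you expect from the core, the large updates are likely to be dropped before the EGP process ever sees them.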