brian@sdcsvax.UCSD.EDU (Brian Kantor) (01/13/88)
I can't ping SEISMO.CSS.GOV from our host 'SDCSVAX.UCSD.EDU' at 26.5.0.3.  I can ping SEISMO quite nicely from our neighbor on the same IMP (NOSC.MIL at 26.0.0.3).  We are both using the same default gateway at 26.2.0.73, which seems to be the way the packets are going.  I guess that SEISMO could have different routes back to SDCSVAX than they do back to NOSC, but I don't know WHY they'd have them, since we're both on the same IMP.

SEISMO isn't an isolated case - I can't ping any of the following hosts from SDCSVAX, but I can from NOSC:

	uxe.cso.uiuc.edu	128.174.5.54
	mcc.com			10.3.0.62
	ius2.cs.cmu.edu		128.2.254.176
	cs.utah.edu		10.0.0.4

There are others.  And there are hundreds of mail messages waiting in our queue because we can't get to their nameservers to get delivery addresses.  (There ARE hosts on network 10 that I can ping, btw, such as UCBVAX.)

I called the Milnet NOC, and they seemed to think that it's a core gateway problem and told me to call BBN.  BBN assures me that they were already working on the problem - seems that they see a problem with one of the gateways, but I don't know what that problem would be.  I'd like to understand what's going on, as well as get it fixed.

The only thing I can think of is that NOSC is connected to the IMP with 1822, and we're connected with X.25, but I don't quite see how that would make a difference either.  NOSC is running EGP for our 128.54 network, so I don't think that's the problem - if indeed that is even relevant, since we're dealing directly with network 26 source addresses in the pings.

Oh yeah, we've been having this sort of problem since Friday.  I'm not sure it's always the same hosts.  I'm stumped.  Anybody got any ideas?

	Brian Kantor	UCSD Office of Academic Computing
			Academic Network Operations Group
			UCSD B-028, La Jolla, CA 92093 USA
			brian@sdcsvax.ucsd.edu  (619) 534-6865
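[A reachability sweep like the one Brian ran by hand can be scripted.  The sketch below is a modern illustration only: the host list and addresses are taken from the post (SEISMO's address wasn't given, and the 1988 addresses won't route today), the `ping` flags are Linux-style, and `partition` is just a hypothetical helper for summarizing the results.]

```python
import subprocess

# Hosts Brian could not reach from SDCSVAX (addresses as given in the post;
# SEISMO's address was not stated, so we'd fall back to its name).
SUSPECT_HOSTS = {
    "seismo.css.gov": None,
    "uxe.cso.uiuc.edu": "128.174.5.54",
    "mcc.com": "10.3.0.62",
    "ius2.cs.cmu.edu": "128.2.254.176",
    "cs.utah.edu": "10.0.0.4",
}

def reachable(addr: str, timeout_s: int = 3) -> bool:
    """Send one ICMP echo via the system ping(8); True if a reply came back.
    Flags are Linux ping's -c (count) and -W (reply timeout in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), addr],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def partition(results: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Split a {host: reachable?} map into sorted (up, down) lists."""
    up = sorted(h for h, ok in results.items() if ok)
    down = sorted(h for h, ok in results.items() if not ok)
    return up, down

# Usage (not run here, since these addresses are historical):
#   results = {h: reachable(a or h) for h, a in SUSPECT_HOSTS.items()}
#   up, down = partition(results)
```

Running the same sweep from two hosts on the same IMP - as Brian did from SDCSVAX and NOSC - and diffing the two `down` lists is exactly the comparison that localized the fault to his side of the path.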
dag@hub.ucsb.edu (Darren Daggit) (01/14/88)
In article <4481@sdcsvax.UCSD.EDU> brian@sdcsvax.UCSD.EDU (Brian Kantor) writes:
>I can't ping SEISMO.CSS.GOV from our host 'SDCSVAX.UCSD.EDU' at
>26.5.0.3.  I can ping SEISMO quite nicely from our neighbor on the same
>IMP (NOSC.MIL at 26.0.0.3).  We are both using the same default gateway
>at 26.2.0.73, which seems to be the way the packets are going.
>
>SEISMO isn't an isolated case - I can't ping any of the following hosts
>from SDCSVAX, but I can from NOSC:
>
>	uxe.cso.uiuc.edu	128.174.5.54
>	mcc.com			10.3.0.62
>	ius2.cs.cmu.edu		128.2.254.176
>	cs.utah.edu		10.0.0.4

Brian, I can get to all of these hosts, and haven't seen any problems routing to them.  We are sending stuff through SDSC to the NSFNET backbone and on from there, so things probably aren't touching the same piece of ethernet cable that you are on, but the systems are reachable.

There have been reported problems that affect at least one of these systems (cs.utah.edu).  The gateway psc.psc.edu has evidently been in trouble over the past few days, and things haven't been shuttled around the way they should.

>There are others.  And there are hundreds of mail messages waiting in
>our queue because we can't get to their nameservers to get delivery
>addresses.  (There ARE hosts on network 10 that I can ping, btw, such
>as UCBVAX.)

I don't have my map of how the UCSD campus is connected to SDSC at hand, but UCBVAX may not be a good example.  If you are routing things through the least number of hops, then stuff to UCBVAX probably isn't going to your default gateway; the fastest route would be through the SDSC Proteon and out to the BARRNET (there are two R's in BARRNET, aren't there?) backbone.  Is this right?  I have noticed that when the SDSC link into the NSFNET goes down we can still get to the BARRNET, so the SDSC Proteon is routing things correctly in that respect.

>I called the Milnet NOC, and they seemed to think that it's a core
>gateway problem and told me to call BBN.  BBN assures me that they were
>already working on the problem - seems that they see a problem with one
>of the gateways, but I don't know what that problem would be.

That is probably the PSC.PSC.EDU gateway; maybe it is affecting more than one site.

I hope this helps,
  --darren

>	Brian Kantor	UCSD Office of Academic Computing
>			Academic Network Operations Group
>			UCSD B-028, La Jolla, CA 92093 USA
>			brian@sdcsvax.ucsd.edu  (619) 534-6865

-----------------------------------------------------------------------------
 Darren Griffiths                        BITNET - DAG@SBITP
 Systems Manager                         ARPA   - DAG@NOBBS.UCSB.EDU
 Physics Computer Services                        (128.111.8.50)
 University of California                HEPNET - NOBBS::DAG
 Santa Barbara, CA 93106                          13326::DAG
 (805)961-2602
brian@sdcsvax.UCSD.EDU (Brian Kantor) (01/15/88)
Well, we're a little closer to the solution - at least we know what the real problem is, although not the cause.

As I said, we're X.25 connected to our Milnet IMP.  That requires that we have an X.25 virtual circuit for each separate host or gateway on the Milnet that we want to talk to.  If we are the first to send a packet to that address, we open the VC in the range 1 to 32.  If the distant host is the first to send a packet to us, the IMP must open the VC in the range 33 to 64.  If a VC is open already (no matter whether the IMP or our interface opened it), it will be used.  If a VC is idle for a configurable length of time (currently 60 seconds), the circuit will be closed.

It seems that our problem is caused by the IMP not being able to open VCs to us.  We CAN open outgoing VCs, but if the first packet from a host or gateway on the Milnet is originated by them instead of us, then the VC connection won't happen and things don't work.

Our problem with Seismo and other hosts is that they are on the far side of gateways to the Arpanet, so packets from them to us must pass through a minimum of TWO gateways - one from them to Arpa, then one of the Arpa-Milnet mailbridges - in order to get to us.  Since the ICMP echo response (as well as all other traffic from them to us) might well come to our host through a different mailbridge than the one through which we send our packets to them, they might indeed wind up seeing our packets but we won't see theirs, because the X.25 VC from the IMP for that mailbridge couldn't be opened.  Clear as mud, eh?

We think it started after the IMP was reloaded following a power failure last Saturday.  The NOC has somebody looking into it, they tell me.

In the meantime, we're working around the problem by sending a single ping packet to each of the seven mailbridges once every 45 seconds.  That's sufficient to keep the X.25 virtual circuits active so people can get to us.  It doesn't cure the problem, but it's a whole lot more livable this way.

Deep gratitude to Mike Brescia of BBN, who helped figure out what's going on - or not going on, in this case.

	Brian Kantor	UCSD Office of Academic Computing
			Academic Network Operations Group
			UCSD B-028, La Jolla, CA 92093 USA
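[The VC bookkeeping Brian describes - our side opening circuits in the 1-32 range, the IMP opening circuits toward us in 33-64, reuse of any already-open circuit, and a 60-second idle close - can be sketched as a toy simulation.  This is purely illustrative, not BBN's or UCSD's actual code: the ranges and timer come from the post, while the class, its methods, and names like "mailbridge-1" are made up.  It does show why a ping every 45 seconds holds a circuit open while an untouched one is reclaimed.]

```python
IDLE_TIMEOUT_S = 60            # configurable idle close, per the post
OUTGOING_VCS = range(1, 33)    # circuits our interface opens (1-32)
INCOMING_VCS = range(33, 65)   # circuits the IMP opens toward us (33-64)

class VCTable:
    """Toy model of per-destination X.25 virtual circuits on an IMP link."""

    def __init__(self):
        self.circuits = {}     # dest -> (vc_number, time_of_last_activity)

    def send(self, dest, now, incoming=False):
        """Reuse an open VC if one exists; otherwise open the lowest free
        VC number from whichever side originated the first packet."""
        if dest in self.circuits:
            vc, _ = self.circuits[dest]
        else:
            pool = INCOMING_VCS if incoming else OUTGOING_VCS
            in_use = {v for v, _ in self.circuits.values()}
            vc = next(n for n in pool if n not in in_use)
        self.circuits[dest] = (vc, now)   # any traffic resets the idle timer
        return vc

    def sweep(self, now):
        """Close circuits idle longer than the timeout; return closed dests."""
        closed = [d for d, (_, t) in self.circuits.items()
                  if now - t > IDLE_TIMEOUT_S]
        for d in closed:
            del self.circuits[d]
        return closed

# Brian's workaround in miniature: one destination pinged every 45 s stays
# open across every sweep; an untouched one is closed once it passes 60 s idle.
table = VCTable()
table.send("mailbridge-1", now=0)   # refreshed at t = 45, 90, 135 below
table.send("seismo-path", now=0)    # no further traffic after t = 0
for t in (45, 90, 135):
    table.send("mailbridge-1", now=t)
    table.sweep(now=t)               # "seismo-path" is reclaimed at t = 90
```

The 45-second interval matters only in that it is shorter than the 60-second idle timer; the circuit to each mailbridge never goes idle long enough to close, so the IMP's broken ability to open fresh VCs toward SDCSVAX is never exercised for those paths.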
oconnor@SCCGATE.SCC.COM (Michael J. O'Connor) (01/15/88)
Brian,

Your problem is similar to the one we experienced with PSN 7.0 software.  In our case, the incoming VC would be created but only one packet would be received.  If we pinged the origin of the VC, traffic would begin flowing.  The problem was attributed to the new PSN code expecting an X.25 RR after the first packet, which our X.25 didn't provide.  This was a change from PSN 6 behavior.  We also set up a periodic pinging of the mailbridges in order to keep mail flowing (the suggestion came from BBN).

What is dissimilar is the fact that we are an Arpanet node while you are Milnet.  I didn't think PSN 7.0 was to be installed in the Milnet just yet, so unless someone slipped up reloading your PSN, I don't know how you could have the same problem as us.  We are using Sun's X.25 in our gateway and were told that only Sun users experienced the "hung VC" problem with PSN 7.0.

I'd be interested to know whether or not you have the resources to determine whether you get no incoming packets on the bad VCs or just one.

			Mike
pogran@ccq.bbn.COM (Ken Pogran) (01/16/88)
Mike,

I can confirm that PSN 7.0 has NOT been installed in the MILNET yet, and there's no chance that that PSN was accidentally reloaded with PSN 7.0 software (hosts on that node wouldn't be able to communicate AT ALL with the rest of the MILNET if that had happened!).

I'm not plugged in to the trouble-shooting process on the MILNET, so I can't speculate on what Brian's problem might actually be.  I agree with you that the external symptoms do resemble the ARPANET PSN 7 "stuck VC" problem, but I think the explanation has to turn out to be quite different.

Regards,
 Ken

P.S.: With regard to the point that "only Sun users experienced the 'hung VC' problem with PSN 7.0", it's fair to say that Sun's was the only X.25 implementation that we FOUND on the ARPANET that encountered that problem; it's possible that other implementors could have made the same design decisions that Sun made and could have had the problem, too.  But, as I said above, that's a PSN 7 problem and certainly isn't what Brian's seeing on the MILNET right now.