[comp.sys.apollo] TCP/IP 3.1 hangs on rsh

jec@iuvax.cs.indiana.edu (07/19/88)

	I've experienced what I think is a bug with TCP/IP 3.1 and would
like to know if anyone has noticed it and, even better, if anyone has a fix.

	I have a script that I run that tries to do a rsh on each apollo 
from the VAX (Ultrix 2.2) in order to determine if TCP/IP services are
functioning.  The problem is that sometimes (not always), the rsh will
hang forever.  I do a:

	% rsh io.cs.indiana.edu /bin/echo UP

	and it will sit there for tens of hours.  I've noticed that it only
seems to occur on diskless nodes.  I'm running SR9.7, Domain/IX 9.5, TCP 3.1,
and the nodes boot from DSP90s with 3 MB (one of the DSP90s is the gateway,
but the other is a typical node in the network).

Any ideas?

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 335-7729
 U   I   U		U.S. Mail:  Indiana University
 U   I   U			    Dept. of Computer Science
  UUUIUUU			    021-E Lindley Hall
     I				    Bloomington, IN. 47405
    III (Home of Bob Knight and the Indiana Hoosiers)

aad@stpstn.UUCP (Anthony A. Datri) (07/21/88)

I'd be happy if I could get 3.1 tcp to work at all.   I install it,
and get some error about the /lib/streams file being out of date, so
I install it with the /lib/streams off of the 3.1 tcp tape, and get
the same error.
-- 
@disclaimer(Any concepts or opinions above are entirely mine, not those of my
	    employer, my GIGI, or my 11/34)
beak is								  beak is not
Anthony A. Datri,SysAdmin,StepstoneCorporation,stpstn!aad

kwongj@caldwr.caldwr.gov (James Kwong) (07/21/88)

In article <1894@stpstn.UUCP>, aad@stpstn.UUCP (Anthony A. Datri) writes:
> 
> I'd be happy if I could get 3.1 tcp to work at all.   I install it,
> and get some error about the /lib/streams file being out of date, so
> I install it with the /lib/streams off of the 3.1 tcp tape, and get
> the same error.

Did you try shutting down the machine after you installed the new
tcp stuff? I had a similar problem when I installed it over a modem.
I couldn't shut the machine down after the installation, so I tried to
restart tcp by hand, with no luck. It kept complaining about the streams
file being out of date. After I rebooted the machine everything was fine.
Hope this helps.

JK

-- 
James Kwong  Calif. Depart. of H2O Resources, Sacramento, CA 95802
caldwr!kwongj@ucdavis.edu(Internet) ...!ucbvax!ucdavis!caldwr!kwongj (UUCP)
The opinions expressed above are mine, not those of the State of California or the California Department of Water Resources.

weber_w@apollo.uucp (Walt Weber) (07/22/88)

In article <1894@stpstn.UUCP> aad@stpstn.UUCP (Anthony A. Datri) writes:
>
>I'd be happy if I could get 3.1 tcp to work at all.   I install it,
>and get some error about the /lib/streams file being out of date, so
>I install it with the /lib/streams off of the 3.1 tcp tape, and get
>the same error.

Anthony:

As your message does not give any indication as to HOW the software is failing,
I will answer as though it is a misunderstanding of messages in the install
procedure.  (If this is a bad assumption on my part, please follow up with
what operations are being performed when you get the failures, and some of the
error text from the failure.)

The installation procedures for tcp3.1 include checking the release date
of critical files like /lib/streams, and should give you an advisory message
like "/lib/streams appears to be out of date..." or "/lib/streams appears to
be a newer release and may not need to be updated..." and then ask if you
wish to have the file updated.

If you answer YES to have the file updated, it will update the file, but the
update WILL NOT TAKE EFFECT UNTIL THE NEXT REBOOT.  The release notes and
installation procedures call this out clearly, I believe.  You should, therefore,
replace the file (if it is out of date), shut down & reboot the node, and then
use tcp3.1.

Please keep us posted (no pun intended) about your progress.

...walt...

-- 
Walt Weber               PHONE: (617) 256-6600 x7004
Apollo Computer          GENIE: W.WEBER
Chelmsford, People's Republic of Massachusetts

kts@quintro.UUCP (Kenneth T. Smelcer) (07/28/88)

In article <5400029@iuvax> jec@iuvax.cs.indiana.edu writes:
>
>	I have a script that I run that tries to do a rsh on each apollo 
>from the VAX (Ultrix 2.2) in order to determine if TCP/IP services are
>functioning.  The problem is that sometimes (not always), the rsh will
>hang forever.  I do a:
>
>	% rsh io.cs.indiana.edu /bin/echo UP
>
>	and it will sit there for tens of hours.  I've noticed that it only
>seems to occur on diskless nodes.  I'm running SR9.7, Domain/IX 9.5, TCP 3.1,
>and the nodes boot from  DSP90s with 3MBs, (one of the DSP90's is the gateway,
>but the other is a typical node in the network).

We have had the same problem talking to nodes within our Apollo network.
On our system (6 DN3000s and a DSP90 server), both rsh and rlogin have
the same problem.  rlogin will try for a while and then return an "error 0"
message, while rsh just hangs forever.  If you kill the request (^C) and try 
again, the request always goes through.

I talked to Apollo service when we first saw this problem (when we installed
SR9.7 with TCP3.0), and they said it was a problem with the routing tables.  
After some length of time, the routing table would seem to be out of date, 
and therefore the request would fail.  However, that request would update the 
table, so the next rsh or rlogin would work just fine.

I was told the problem was a known bug with SR9.7 and TCP 3.0 and was 
supposed to be fixed in 3.1.  Well, it doesn't happen as often as it used to,
but it is still a problem.

I would also be interested in any ideas on a work-around or fix for this 
problem.
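
One possible stopgap, sketched under the assumption that the behavior is as
described above (a killed request followed by a fresh one goes through): run
the check under a timeout and retry a bounded number of times, instead of
letting rsh sit forever.  This is written in Python rather than the shell of
the original script, and the function name probe, the 30-second timeout, and
the attempt count are illustrative choices, not anything from Apollo.

```python
import subprocess


def probe(cmd, timeout=30.0, attempts=2):
    """Run cmd under a time limit, retrying if it hangs or fails.

    Returns the command's stripped standard output from the first attempt
    that exits with status 0, or None if every attempt timed out or failed.
    """
    for _ in range(attempts):
        try:
            # subprocess.run kills the child itself once the timeout expires,
            # which plays the role of the manual ^C described above.
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            continue  # treat a hang as a failed attempt and try again
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```

For the check in the original post this would be called as
probe(["rsh", "io.cs.indiana.edu", "/bin/echo", "UP"]), counting the node as
up only when the result is "UP".  It works around the hang; it does not
explain or fix it.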

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ken Smelcer     Quintron Corporation - Quincy, Il.
UUCP:           {elroy,lll-winken,laidbak}!spl1!quintro!kts
 or             uunet!wucs1!wuibc!quintro!kts

jec@iuvax.cs.indiana.edu (08/11/88)

	I've noticed that ping also has some problems:  If you ping an
Apollo the first attempt will usually fail, but after that ping seems
to work.  

	For instance the first time I try it:

[root@io:33]ping charybdis
PING charybdis.cs.indiana.edu: 56 data bytes
Timed out (1 second) waiting for echo reply		<--- fails
64 bytes from 98.0.0.38: icmp_seq=1. time=154. ms	<--- passes
64 bytes from 98.0.0.38: icmp_seq=2. time=26. ms
64 bytes from 98.0.0.38: icmp_seq=3. time=13. ms
64 bytes from 98.0.0.38: icmp_seq=4. time=17. ms
64 bytes from 98.0.0.38: icmp_seq=5. time=13. ms

	The second time, however:

[root@io:36]!ping
ping charybdis 
PING charybdis.cs.indiana.edu: 56 data bytes
64 bytes from 98.0.0.38: icmp_seq=0. time=23. ms	<--- passes
64 bytes from 98.0.0.38: icmp_seq=1. time=13. ms
64 bytes from 98.0.0.38: icmp_seq=2. time=14. ms
64 bytes from 98.0.0.38: icmp_seq=3. time=23. ms
64 bytes from 98.0.0.38: icmp_seq=4. time=14. ms
64 bytes from 98.0.0.38: icmp_seq=5. time=13. ms
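
For anyone collecting these transcripts from several nodes, here is a small
sketch that picks out the pattern by machine rather than by eye.  It assumes
the output format shown above (reply lines carrying "icmp_seq=N." and a
"Timed out" line for a lost probe); the names summarize_ping and FIRST_RUN
are made up for this example, and other ping versions print different text.

```python
import re

# Matches the sequence number in reply lines of the form shown above,
# e.g. "64 bytes from 98.0.0.38: icmp_seq=1. time=154. ms".
REPLY = re.compile(r"icmp_seq=(\d+)\.")

# The first transcript from this post, without the hand-added arrows.
FIRST_RUN = """\
PING charybdis.cs.indiana.edu: 56 data bytes
Timed out (1 second) waiting for echo reply
64 bytes from 98.0.0.38: icmp_seq=1. time=154. ms
64 bytes from 98.0.0.38: icmp_seq=2. time=26. ms
64 bytes from 98.0.0.38: icmp_seq=3. time=13. ms
64 bytes from 98.0.0.38: icmp_seq=4. time=17. ms
64 bytes from 98.0.0.38: icmp_seq=5. time=13. ms
"""


def summarize_ping(transcript):
    """Report which echo replies arrived and which sequence numbers are missing."""
    received = []
    timeouts = 0
    for line in transcript.splitlines():
        match = REPLY.search(line)
        if match:
            received.append(int(match.group(1)))
        elif line.startswith("Timed out"):
            timeouts += 1
    # Any sequence number below the highest one seen that never got a reply.
    missing = sorted(set(range(max(received) + 1)) - set(received)) if received else []
    return {
        "received": received,
        "missing": missing,
        "timeouts": timeouts,
        # True only when probe 0 was lost and every later probe was answered.
        "only_first_lost": missing == [0],
    }
```

On the first transcript, summarize_ping(FIRST_RUN) reports missing as [0] and
only_first_lost as True; a run in which probe 0 is answered, like the second
transcript, gives only_first_lost as False.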


feigin@batcomputer.tn.cornell.edu (Adam Feigin) (08/11/88)

In article <5400031@iuvax> jec@iuvax.cs.indiana.edu writes:
>
>	I've noticed that ping also has some problems:  If you ping an
>Apollo the first attempt will usually fail, but after that ping seems
>to work.  
>
>	For instance the first time I try it:
>
>[root@io:33]ping charybdis
>PING charybdis.cs.indiana.edu: 56 data bytes
>Timed out (1 second) waiting for echo reply		<--- fails
>64 bytes from 98.0.0.38: icmp_seq=1. time=154. ms	<--- passes
>  ....
>	The second time, however:
>
>[root@io:36]!ping
>ping charybdis 
>PING charybdis.cs.indiana.edu: 56 data bytes
>64 bytes from 98.0.0.38: icmp_seq=0. time=23. ms	<--- passes

I don't seem to have this problem. Perhaps you need to set some options on your
tcp_server when you start it up. You probably have the timeout option set too
low.

apollo.lap csh[7]: ping gulag.sovcen 56 10
PING gulag.sovcen.upenn.edu: 56 data bytes
64 bytes from 128.91.17.137: icmp_seq=0. time=674. ms
64 bytes from 128.91.17.137: icmp_seq=1. time=11. ms
64 bytes from 128.91.17.137: icmp_seq=5. time=11. ms
.....


						Adam
------------------------------------------------------------------------------
Internet: feigin@tcgould.tn.cornell.edu		Adam Feigin
Bitnet: feigin@crnlthry				Workstation Consultant
UUCP: {backbones}!cornell!batcomputer!feigin	Cornell National Supercomputer
MaBell: (607) 255-3985				Facility, Visualization Group

		"Sometimes a little brain damage can help"
------------------------------------------------------------------------------

dennis@PEANUTS.NOSC.MIL (Dennis Cottel) (08/12/88)

> From: jec@iuvax.cs.indiana.edu
> 
> If you ping an
> Apollo the first attempt will usually fail, but after that ping seems
> to work.  

I've noticed that here as well.  It doesn't happen every time, and seems
more likely on the older nodes (DN320, DN550), so I attribute it to
timing out before the appropriate part of the TCP server can be swapped
in to answer.

	Dennis Cottel  Naval Ocean Systems Center, San Diego, CA  92152
	(619) 553-1645      dennis@nosc.MIL      sdcsvax!noscvax!dennis

krowitz@RICHTER.MIT.EDU (David Krowitz) (08/12/88)

Hmm, this is odd. I can use the BSD4.3 ping from my Alliant FX/40
to ping several different Apollos (1: the gateway between the
ethernet and the ringnet, 2: a node on the ringnet, 3: another
apollo which is the gateway from an ethernet at the University
of Washington to the ringnet there, 4: a node on the U. of W.
ringnet) with no problems. Is the problem occurring the first
time you ping the Apollo after it has been booted, or the
first time you try after waiting for some amount of time?
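
One way to answer that question empirically, as a rough sketch: contact the
node once so it is in a "recently contacted" state, stay silent for a chosen
interval, then record whether the very next probe succeeds.  The function
name first_probe_after_idle is invented here, and the probe argument is an
assumption: any zero-argument callable that returns true on success, for
example a wrapper around a single ping or the rsh check from the original
post.

```python
import time


def first_probe_after_idle(probe, idle_seconds, sleep=time.sleep):
    """Map each idle interval to whether the first probe after it succeeded.

    probe        -- zero-argument callable, truthy on success (caller-supplied)
    idle_seconds -- iterable of idle intervals to test, in seconds
    sleep        -- injectable for testing; defaults to time.sleep
    """
    results = {}
    for gap in idle_seconds:
        probe()        # warm-up contact; its result is deliberately ignored
        sleep(gap)     # leave the node alone for the interval under test
        results[gap] = bool(probe())
    return results
```

If the first probe fails only after long gaps, the problem is tied to idle
time rather than to booting; if it fails even after short gaps, something
else is going on.  The intervals worth trying are a guess; the thread does
not say how long the idle period has to be.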


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

P.S. Our nodes are at SR9.7 running TCP 3.1