[comp.sys.apollo] Apollo TCP/IP problems

jec@iuvax.cs.indiana.edu (11/11/87)

	We are having terrible problems with rlogin/rsh/rcp on our Apollos
to and from our EtherNET UNIX environment.  The usual problem is that you
cannot make a new connection (we get either "connection timed out" or
"network unreachable").  I've heard that there is a way to do static routing
with the route command, so I've added the route information manually in
startup.spm on our DSP-90 server.  Unfortunately, this makes the server work,
but now none of the other nodes on the ring can reach the EtherNET.

	We are running on the server and other ring nodes:

	AEGIS 9.6, DOMAIN/IX 9.5, TCP/IP 3.0

	Our other machines are:

	VAX 11/780 BSD4.2
	VAX 8800 Ultrix 2.0
	Alliant FX/8 Concentrix

	Since our EtherNET has been completely reliable between all of the
machines except the Apollos, I strongly suspect that Apollo blew it with their
TCP/IP package.


	I've also tried adding the "/usr/bin/sleep 15" command before
the /etc/route commands.  This seems to work for the DSP-90 gateway,
but not for any of the other nodes.

	HELP!

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 335-7729
 U   I   U		U.S. Mail:  Indiana University
 U   I   U			    Dept. of Computer Science
  UUUIUUU			    021-E Lindley Hall
     I				    Bloomington, IN. 47405
    III (Home of the Indiana Hoosiers-- 1987 NCAA Basketball Champions)

giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (11/18/87)

In article <5400004@iuvax>, jec@iuvax.cs.indiana.edu writes:
> 
> 	We are having terrible problems with rlogin/rsh/rcp on our Apollos
> to and from our EtherNET UNIX environment.  The usual problem is that you
> cannot make a new connection (either we get connection timed-out or a network
> unreachable).  I've heard that there is a way to do static routing using the
> route command and I've added the route information manually in startup.spm
> on our DSP-90 server.

There is indeed a problem with TCP: /etc/routed occasionally dies.
If you call 1-800-2AP-OLLO and ask for a patch, you should have it within
a week (assuming you have a service contract).  Other than routed,
the Apollo TCP seems to work pretty well for me.
-- 
---------------------------------
UUCP: {uunet, ihnp4!umn-cs}!hi-csc!giebelhaus
ARPA: hi-csc!giebelhaus@umn-cs.arpa
Nobody I know admits to sharing my opinions.  I don't even have a pet.

te07+@ANDREW.CMU.EDU.UUCP (11/21/87)

Flame on!

I reported problems with this to 1-800-2APOLLO more than a month ago (yes, we
have a service contract).  I am kind of ticked that I found out about a patch
on this bboard rather than from a phone call or, better yet, from the patch
arriving without any request at all.

They should notify users/owners about bugs of this importance.  Basically, our
Apollos have only been able to talk to a very small subset of the campus,
which is very limiting.  In general, I have not been pleased with the TCP.
Every other workstation seems to fit right into the campus network and can
talk to everyone.  Why can't the same be true of the Apollo?

I have had similar problems with sendmail suddenly going crazy.  How does
this relate to routed?  Does routed cause it or is it another bug?  We no
longer run either daemon because of these problems.

Tom Epperly
****************************************************************
te07@andrew.cmu.edu (ARPANET)
te07%tb.cc.cmu.edu@cmuccvma.bitnet (BITNET)
te07@tb.cc.cmu.edu (CCNET)
te07%tb.cc.cmu.edu@csnet-relay
te07#%andrew.cmu.edu@seismo.UUCP (UUCP) so I have been told
****************************************************************

jec@iuvax.UUCP (11/21/87)

	Well, the patch tape from Apollo seems to have solved our routing
problems.  I strongly suggest that you all get this patch if you run
DOMAIN/IX and TCP/IP.  I also found that a number of our nodes had some
problems with their software.  Here are some of the changes that I made
that helped TCP/IP:

1.	Check your /sys/tcp/hostmap/local.txt file(s).  You should be
        especially careful with GATEWAY nodes.  Do this:

	NET : 98.0.0.0 : APOLLO-RING :
	NET : 192.12.206.0 : IU-NET :
	GATEWAY : 98.0.0.1, 192.12.206.194 : DAPHNE.CS.INDIANA.EDU : DN560 : \
	 DOMAIN : IP/GW, GW/DUMB :
	HOST : 192.12.206.1, 98.0.0.1 : GATENODE.CS.INDIANA.EDU : VAX780 : \
	 UNIX : TCP/TELNET, TCP/FTP :

	and not:

	NET : 98.0.0.0 : APOLLO-RING :
	NET : 192.12.206.0 : IU-NET :
	GATEWAY : 98.0.0.1, 192.12.206.194 : DAPHNE.CS.INDIANA.EDU : DN560 : \
	 DOMAIN : IP/GW, GW/DUMB :
	HOST : 192.12.206.1 : GATENODE.CS.INDIANA.EDU : VAX780 : UNIX : \
	 TCP/TELNET, TCP/FTP :
	HOST : 98.0.0.1 : GATENODE.CS.INDIANA.EDU : VAX780 : UNIX : \
	 TCP/TELNET, TCP/FTP :

	I'm not sure how much of our trouble this caused, since a lot of
	other things were messed up too, but this works for us.

	Make sure to run makehost.sh after you make changes.

	If this file is messed up you usually get "Network Unreachable"
	types of errors.

2.	Make sure that you have a /sys/node_data[.*]/etc.inetd.conf file
	and that the right entries are uncommented.  It turned out we were
	running a few nodes with either no etc.inetd.conf file or one
	that had EVERYTHING COMMENTED OUT.

	You usually get "Connection Refused" types of errors if this is
	wrong.

3.	Make sure that /etc/run_rc and /sys/node_data[.*]/etc.rc are
	BOTH mode 4755 (setuid root).  The etc.rc part is a bit of lore
	that I got from Apollo school.  I'm not sure it is strictly
	necessary, but I do it anyway.

	The idea is that lots of daemons must be run as root.

4.	Make sure that you have PTYs in /sys/node_data[.*]/dev.  The
	best thing to do is:
			% cd /sys/node_data/dev	# or /sys/node_data.NNNN/dev
			% rm *ty*
			% /etc/crpty 16
	This will delete the old crusty PTYs and create fresh new ones.  

5.	If you rlogin or rsh to an Apollo, make sure that the dot
	files (.rhosts, .login, .cshrc in particular) are owned by the
	correct user.  If they aren't, you usually end up with either
	"permission denied" or an AEGIS shell.

6.	Make sure every entry in /etc/passwd has a shell field.

7.	Make sure that the GATEWAY runs /etc/routed via /sys/tcp/tcp_server -r
	and does not invoke it as /etc/routed anywhere.  Make sure that the
	non-gateways do NOT invoke /etc/routed in any way. [[THIS ONLY
	APPLIES IF YOU HAVE THE PATCH, SINCE /sys/tcp/tcp_server -r IS PART
	OF THE PATCH]].

8.	If you have the patch, make sure all of your nodes are running
	the patch version of tcp_server and that the gateway has the patch
	versions of tcp_server, routed, rwho, and rwhod.  Make sure that all
	gateways are running this software too!  One bad gateway can ruin all
	of your routing tables.
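A quick way to catch the "and not" layout from item 1 before running makehost.sh is to look for a host name that shows up on more than one HOST or GATEWAY line. This is a hypothetical Python helper, not an Apollo tool; it assumes local.txt follows the layout in the examples above (colon-separated fields, a trailing backslash continuing an entry), treats commas in the name field as separating nicknames, and does not try to handle any comment syntax the real file may allow.

```python
from collections import Counter


def logical_lines(text):
    """Join backslash-continued lines and yield each non-blank entry."""
    buf = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            buf += line[:-1] + " "   # drop the backslash, keep reading
            continue
        entry = (buf + line).strip()
        buf = ""
        if entry:
            yield entry
    if buf.strip():                  # file ended on a continuation line
        yield buf.strip()


def parse_hostmap(text):
    """Return (keyword, addresses, names) tuples for hostmap-style text."""
    entries = []
    for entry in logical_lines(text):
        fields = [f.strip() for f in entry.split(":")]
        keyword = fields[0].upper()
        addresses = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        names = [n.strip().upper() for n in fields[2].split(",")] if len(fields) > 2 else []
        entries.append((keyword, addresses, names))
    return entries


def duplicate_host_names(text):
    """Return names that appear on more than one HOST or GATEWAY entry."""
    counts = Counter()
    for keyword, _addresses, names in parse_hostmap(text):
        if keyword in ("HOST", "GATEWAY"):
            counts.update(set(names))   # count each entry once per name
    return sorted(name for name, n in counts.items() if n > 1)
```

Fed the recommended example above, duplicate_host_names() returns an empty list; fed the "and not" example, it returns ['GATENODE.CS.INDIANA.EDU'], the machine that was split across two HOST lines.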
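Items 2, 3 and 6 can also be checked mechanically. These are hypothetical Python helpers, not part of DOMAIN/IX; they take file contents or an st_mode value, so you can feed them whatever you read off a node. The inetd check treats any line starting with "#" as a comment, and the passwd check assumes the usual seven-field BSD layout with the shell last.

```python
import stat


def active_inetd_entries(text):
    """Return the non-blank, non-comment lines of inetd.conf-style text.

    An empty list is the "EVERYTHING COMMENTED OUT" case from item 2.
    """
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_mode_4755(st_mode):
    """True if the permission bits are exactly 4755 (setuid, rwxr-xr-x)."""
    # S_IMODE keeps the setuid/setgid/sticky bits plus the rwx bits
    # and drops the file-type bits.
    return stat.S_IMODE(st_mode) == 0o4755


def accounts_without_shell(text):
    """Return login names whose seventh (shell) field is missing or empty."""
    missing = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(":")
        if len(fields) < 7 or not fields[6].strip():
            missing.append(fields[0])
    return missing
```

For example, is_mode_4755(os.stat("/etc/run_rc").st_mode) should be True on a node set up as item 3 describes, and an empty list from active_inetd_entries() points to the "Connection Refused" case.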

	If you have questions or I've said something wrong, feel free to
correct me.  Anyway, hope this helps out if you are having TCP/IP problems.


paul@FLEETWOOD.CC.UMICH.EDU ('da Kingfish) (11/21/87)

Tom Epperly mentions "sendmail going crazy."

In what way?  Relative to tcp?  I'm always interested in sendmail
pathology.

(I agree with the complaint about the ad hoc manner in which we seem to
find out about these patches, whatever the application.)

--paul

krowitz@mit-richter.UUCP (David Krowitz) (11/23/87)

I don't know exactly what 'crazy' means to you, but ...
We have a '/com/sh -c /usr/lib/sendmail -bd -q1h' in our
nodes' startup files so that each of them can receive mail
from off-site hosts. Occasionally, a node will wind up
with about 50 or 60 copies of sendmail running on it. This
seems to coincide with TCP_SERVER hanging (i.e., if I try
to TELNET or FTP out from the node, the process
just hangs forever). Whether the problem was caused by TCP
hanging, or whether TCP hung because of all of the processes,
is unknown. It seems to happen less frequently since we
started running TCP 3.0.

Another strange thing ... our /sys/node_data/proc_dump files
seem to grow without bounds. If I use FMPD to list out the
dump file, it appears that there is a process dumping out
once every hour. Usually this is listed as /com/sh, but sometimes
it is spmlogin, or 'object not found'. The only thing that I
can think of which occurs once an hour on each of our nodes
is the '-bd -q1h' switches we give to sendmail.


 -- David Krowitz

mit-erl!mit-kermit!krowitz@eddie.mit.edu
mit-erl!mit-kermit!krowitz@mit-eddie.arpa
krowitz@mit-mc.arpa
(in order of decreasing preference)

paul@FLEETWOOD.CC.UMICH.EDU ('da Kingfish) (11/24/87)

If you have /com/sh dumping and filling up your proc_dump file, your
culprit could be syslog.  sendmail is evidently using it when it fires every
hour.

syslog can fill up your disk in one of two ways:

either it will go nuts and log its own select I/O errors, or a fork will
fail, and that will show up in proc_dump as a /com/sh entry saying
something about process creation/pgm_$invoke or similar.

To see if this is it, write a small program that logs something over
and over, start it (make sure the syslog daemon is going) and then kill
tcp_server.  (killing tcp_server, ah, simulates whatever problem starts
this off in the first place.)

I am running the 4.3 syslog, which I got going by yanking out the
AF_UNIX, /dev/log, /dev/klog, and /dev/console logging, and using a udp
socket instead of the unix socket for syslog to use in communicating
with syslogd.
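Switching to a UDP socket means each log call becomes a single datagram sent to syslogd's UDP port (514, the "syslog" service) instead of a write to the AF_UNIX /dev/log socket. This Python sketch shows only the client side of that idea and is not the modified 4.3 code described above. The bare "<PRI>ident: message" format (PRI = facility * 8 + severity, so user.info is 14) leaves out the timestamp and other header fields a real syslog(3) client typically adds, and it assumes the receiving syslogd accepts such a minimal message.

```python
import socket

SYSLOG_UDP_PORT = 514   # the "syslog" service in /etc/services
LOG_USER = 1            # facility code, before multiplying by 8
LOG_INFO = 6            # severity code


def syslog_packet(message, ident="test", facility=LOG_USER, severity=LOG_INFO):
    """Build a minimal BSD-style syslog datagram: "<PRI>ident: message"."""
    pri = facility * 8 + severity
    return ("<%d>%s: %s" % (pri, ident, message)).encode("ascii", "replace")


def send_syslog_udp(message, host="127.0.0.1", port=SYSLOG_UDP_PORT, **fields):
    """Send one syslog datagram over UDP rather than the AF_UNIX socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_packet(message, **fields), (host, port))
```

Because UDP is connectionless, the client never blocks waiting on syslogd; if the daemon is down, the datagram is simply lost.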

--paul