jec@iuvax.cs.indiana.edu (11/11/87)
We are having terrible problems with rlogin/rsh/rcp on our Apollos, to and from our EtherNET UNIX environment. The usual problem is that you cannot make a new connection (we get either "connection timed out" or "network unreachable"). I've heard that there is a way to do static routing using the route command, and I've added the route information manually in startup.spm on our DSP-90 server. Unfortunately, this makes the server work, but now none of the other nodes on the ring can reach the EtherNET.

We are running on the server and other ring nodes:

    AEGIS 9.6, DOMAIN/IX 9.5, TCP/IP 3.0

Our other machines are:

    VAX 11/780      BSD4.2
    VAX 8800        Ultrix 2.0
    Alliant FX/8    Concentrix

Since our EtherNET has been completely reliable between all of the machines except the Apollos, I strongly suspect that Apollo blew it with their TCP/IP package.

I've also tried adding the "/usr/bin/sleep 15" command before the /etc/route commands. This seems to work for the DSP-90 gateway, but not for any of the other nodes. HELP!

Usenet: iuvax!jec
ARPANet: jec@iuvax.cs.indiana.edu
Phone: (812) 335-7729
U.S. Mail: Indiana University, Dept. of Computer Science, 021-E Lindley Hall, Bloomington, IN. 47405
(Home of the Indiana Hoosiers-- 1987 NCAA Basketball Champions)
giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (11/18/87)
In article <5400004@iuvax>, jec@iuvax.cs.indiana.edu writes:
>
> We are having terrible problems with rlogin/rsh/rcp on our Apollos
> to and from our EtherNET UNIX environment. The usual problem is that you
> cannot make a new connection (either we get connection timed-out or a network
> unreachable). I've heard that there is a way to do static routing using the
> route command and I've added the route information manually in startup.spm
> on our DSP-90 server.

There is indeed a problem with TCP: /etc/routed dies occasionally. If you call 1-800-2AP-OLLO and ask for a patch, you should have it inside of a week (assuming you have a service contract). Other than routed, the Apollo TCP seems to work pretty well for me.

--
---------------------------------
UUCP: {uunet, ihnp4!umn-cs}!hi-csc!giebelhaus
ARPA: hi-csc!giebelhaus@umn-cs.arpa
Nobody I know admits to sharing my opinions. I don't even have a pet.
te07+@ANDREW.CMU.EDU.UUCP (11/21/87)
Flame on!

I reported problems with this to 1-800-2APOLLO more than a month ago (yes, we have a service contract). I am kind of ticked that I found out about a patch on this bboard rather than from a phone call or, better yet, from the patch arriving without any request at all. They should notify users/owners about bugs of this importance. Basically, our Apollos have only been able to talk to a very small subset of the campus, which is very limiting.

In general, I have not been pleased with the TCP. Every other workstation seems to fit right into the campus network and can talk to everyone. Why can't the same be true of the Apollo?

I have had similar problems with sendmail suddenly going crazy. How does this relate to routed? Does routed cause it, or is it another bug? We no longer run either daemon because of these problems.

Tom Epperly
****************************************************************
te07@andrew.cmu.edu (ARPANET)
te07%tb.cc.cmu.edu@cmuccvma.bitnet (BITNET)
te07@tb.cc.cmu.edu (CCNET)
te07%tb.cc.cmu.edu@csnet-relay
te07#%andrew.cmu.edu@seismo.UUCP (UUCP) so I have been told
****************************************************************
jec@iuvax.UUCP (11/21/87)
Well, the patch tape from Apollo seems to have solved our routing problems. I strongly suggest that you all get this patch if you run DOMAIN/IX and TCP/IP. I also found that a number of our nodes had some problems with their software. Here are some of the changes I made that helped out TCP/IP:

1. Check your /sys/tcp/hostmap/local.txt file(s). You should be especially careful with GATEWAY nodes. Do this:

    NET : 98.0.0.0 : APOLLO-RING :
    NET : 192.12.206.0 : IU-NET :
    GATEWAY : 98.0.0.1, 192.12.206.194 : DAPHNE.CS.INDIANA.EDU : DN560 : \
        DOMAIN : IP/GW, GW/DUMB :
    HOST : 192.12.206.1, 98.0.0.1 : GATENODE.CS.INDIANA.EDU : VAX780 : \
        UNIX : TCP/TELNET, TCP/FTP :

   and not:

    NET : 98.0.0.0 : APOLLO-RING :
    NET : 192.12.206.0 : IU-NET :
    GATEWAY : 98.0.0.1, 192.12.206.194 : DAPHNE.CS.INDIANA.EDU : DN560 : \
        DOMAIN : IP/GW, GW/DUMB :
    HOST : 192.12.206.1 : GATENODE.CS.INDIANA.EDU : VAX780 : UNIX : \
        TCP/TELNET, TCP/FTP :
    HOST : 98.0.0.1 : GATENODE.CS.INDIANA.EDU : VAX780 : UNIX : \
        TCP/TELNET, TCP/FTP :

   I'm not sure how much trouble this caused us, since a lot of other things were messed up too, but the first form works for us. Make sure to run makehost.sh after you make changes. If this file is messed up, you usually get "Network Unreachable" types of errors.

2. Make sure that you have a /sys/node_data[.*]/etc.inetd.conf file and that the right fields are uncommented. It turned out we were running a few nodes with either no etc.inetd.conf file or one that had EVERYTHING COMMENTED OUT. You usually get "Connection Refused" types of errors if this is wrong.

3. Make sure that /etc/run_rc and /sys/node_data[.*]/etc.rc are BOTH mode 4755 (set uid to root). The etc.rc part is a bit of lore that I got from Apollo school. I'm not sure if it is strictly necessary, but I do it anyway. The idea is that lots of daemons must be run as root.

4. Make sure that you have PTYs in /sys/node_data[.*]/dev. The best thing to do is:

    % cd /sys/node_data    /* or /sys/node_data.NNNN */
    % rm *ty*
    % /etc/crpty 16

   This will delete the old crusty PTYs and create fresh new ones.

5. Make sure that if you rlogin or rsh to an Apollo, the dot files (.rhosts, .login, .cshrc in particular) are owned by the correct person. If they aren't, you usually end up with either "permission denied" or an AEGIS shell.

6. Make sure there are shell fields in /etc/passwd.

7. Make sure that the GATEWAY runs /etc/routed via /sys/tcp/tcp_server -r and does not invoke it as /etc/routed anywhere. Make sure that the non-gateways do NOT invoke /etc/routed at all. [[THIS ONLY APPLIES IF YOU HAVE THE PATCH SINCE /sys/tcp/tcp_server -r IS PART OF THE PATCH]].

8. If you have the patch, make sure all of your nodes are running the patch version of tcp_server and that the gateway has the patch version of tcp_server, routed, rwho, and rwhod. Make sure that all gateways are running this software too! One bad gateway can ruin all of your routing tables.

If you have questions or I've said something wrong, feel free to correct me. Anyway, I hope this helps out if you are having TCP/IP problems.

Usenet: iuvax!jec
ARPANet: jec@iuvax.cs.indiana.edu
Phone: (812) 335-7729
U.S. Mail: Indiana University, Dept. of Computer Science, 021-E Lindley Hall, Bloomington, IN. 47405
(Home of the Indiana Hoosiers-- 1987 NCAA Basketball Champions)
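
Items 1, 2, and 6 of the checklist above lend themselves to a quick scripted check. What follows is only a minimal sketch in Bourne-style shell, not something from Apollo or from the patch tape. The function names (dup_hosts, active_lines, missing_shell) are made up for illustration, it assumes a POSIX sh, grep, and awk, and it assumes '#' is the comment character in etc.inetd.conf. It has not been run on DOMAIN/IX.

```shell
#!/bin/sh
# Hypothetical helpers for checklist items 1, 2 and 6.
# None of these function names come from Apollo; they are illustrative only.

# Item 1: print each host name that appears in more than one HOST entry
# of a hostmap file such as local.txt. Backslash-continued lines are joined
# before the entry is split on ':'.
dup_hosts() {
    awk '
        { line = buf $0 }
        /\\[ \t]*$/ { sub(/\\[ \t]*$/, "", line); buf = line; next }
        {
            buf = ""
            split(line, f, ":")
            if (f[1] ~ /^[ \t]*HOST[ \t]*$/) {
                name = f[3]
                gsub(/[ \t]/, "", name)
                if (name != "" && ++seen[name] == 2) print name
            }
        }
    ' "$1"
}

# Item 2: count lines that are neither blank nor commented out with '#'
# (assumption: '#' starts a comment, as in BSD inetd.conf).
active_lines() {
    grep -v '^[[:space:]]*#' "$1" | grep -c '[^[:space:]]'
}

# Item 6: print the login name of every passwd entry with an empty
# (or missing) seventh field, i.e. no shell.
missing_shell() {
    awk -F: 'NF > 0 && $7 == "" { print $1 }' "$1"
}

# Possible usage, with the paths named in the checklist:
#   dup_hosts /sys/tcp/hostmap/local.txt
#   active_lines /sys/node_data/etc.inetd.conf   # 0 means nothing is enabled
#   missing_shell /etc/passwd
```

A name printed by dup_hosts corresponds to the "and not" layout in item 1, where one machine is split across two HOST entries instead of listing both addresses in one.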
paul@FLEETWOOD.CC.UMICH.EDU ('da Kingfish) (11/21/87)
Tom Epperly mentions "sendmail going crazy." In what way? Relative to tcp? I'm always interested in sendmail pathology.

(I agree with the complaint about the ad hoc manner in which we seem to find out about these patches -- whatever the application.)

--paul
krowitz@mit-richter.UUCP (David Krowitz) (11/23/87)
I don't know exactly what 'crazy' means to you, but ...

We have a '/com/sh -c /usr/lib/sendmail -bd -q1h' in our nodes' startup files so that each of them can receive mail from off-site hosts. Occasionally, a node will wind up with about 50 or 60 copies of sendmail running on it. This seems to coincide with the TCP_SERVER hanging (i.e., if I try to TELNET or FTP out from the node, the process just hangs forever). Whether the problem was caused by TCP hanging, or whether TCP hung because of all of the processes, is unknown. It seems to happen less frequently since we started running TCP 3.0.

Another strange thing ... our /sys/node_data/proc_dump files seem to grow without bound. If I use FMPD to list out the dump file, it appears that there is a process dumping out once every hour. Usually this is listed as /com/sh, but sometimes it is spmlogin, or 'object not found'. The only thing I can think of that occurs once an hour on each of our nodes is the '-bd -q1h' switches we give to sendmail.

-- David Krowitz

mit-erl!mit-kermit!krowitz@eddie.mit.edu
mit-erl!mit-kermit!krowitz@mit-eddie.arpa
krowitz@mit-mc.arpa
(in order of decreasing preference)
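
The sendmail pile-up described above is easy to watch for from a script. This is a rough sketch only: count_matching is a made-up name, the function just counts matching lines of whatever process listing is piped into it, and the "ps ax" in the usage comment is the BSD form, which may or may not be what DOMAIN/IX wants.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that match the pattern in $1.
# Writing the pattern as '[s]endmail' keeps this pipeline's own grep from
# matching itself in the process listing.
count_matching() {
    grep -c -e "$1" || true   # grep -c still prints 0 when nothing matches
}

# Possible usage (assumes a BSD-style ps):
#   n=$(ps ax | count_matching '[s]endmail')
#   if [ "$n" -gt 10 ]; then echo "warning: $n sendmail processes"; fi
```

The threshold of 10 in the usage comment is arbitrary; the post only says that 50 or 60 copies is clearly too many.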
paul@FLEETWOOD.CC.UMICH.EDU ('da Kingfish) (11/24/87)
If you have /com/sh dumping and filling up your proc_dump file, your culprit could be syslog. sendmail is evidently using it when it fires every hour.

syslog can fill up your disk in one of two ways: it will go nuts and log its own select I/O errors, or a fork will fail, and that will show up in proc_dump as a /com/sh entry, something about process creation/pgm_$invoke or something like that.

To see if this is it, write a small program that logs something over and over, start it (make sure the syslog daemon is going), and then kill tcp_server. (Killing tcp_server, ah, simulates whatever problem starts this off in the first place.)

I am running the 4.3 syslog, which I got going by yanking out the AF_UNIX, /dev/log, /dev/klog, and /dev/console logging, and using a udp socket instead of the unix socket for syslog to use in communicating with syslogd.

--paul
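
The test described above (log something over and over, then kill tcp_server) could be driven from a shell loop instead of a compiled program. This is a sketch under assumptions, not what paul actually ran: log_loop and the LOGGER variable are made-up names, and it assumes the 4.3BSD logger(1) command is installed alongside the 4.3 syslog.

```shell
#!/bin/sh
# Hypothetical test driver: send a numbered message to syslog repeatedly.
# LOGGER can be overridden (e.g. LOGGER=echo) to try the loop without syslogd.
LOGGER=${LOGGER:-logger}

# $1 = number of messages to send, $2 = seconds to sleep between them
log_loop() {
    i=1
    while [ "$i" -le "$1" ]; do
        $LOGGER "syslog loop test message $i"
        i=$((i + 1))
        if [ "$2" -gt 0 ]; then
            sleep "$2"
        fi
    done
}

# Possible usage: run it in the background, kill tcp_server, then watch
# whether proc_dump or the syslog files start growing.
#   log_loop 100000 1 &
```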