dvadura@watdragon.waterloo.edu (Dennis Vadura) (04/18/89)
It's me again. Now it comes to mind that I've been doing nothing but griping about the Apollo for the last while. I want it known that I really like the box, and *most* of the software. But when you read what follows you'll understand that I might get a bit disgruntled at times:

Configuration:
DN3500, 8 MegRam, 350 MegDisk, SR10.1, SYSTYPE=bsd4.3 ENVIRONMENT=bsd, login shell /bin/csh

Introduction:
80 Megs or so of free disk, late Sunday afternoon, compiling GCC, dmake (my version of make), 6 pads open, 2 rlogins to local machines, everybody is happy. Suddenly, GCC compile stops, dmake compile finishes. I clean up files, and free an additional 100 megs by deleting some test files. Node goes catatonic. Can't create more pads, other pads are dead, DM alive but sick. ... Tried to figure out the problem, couldn't, so did 'lo' in the cmd window. Had to 'blow away' the processes. Try to log in, can't. [Couldn't create a DM window] Fine, time for dinner anyway. Go home, log in over net, things look ok, recompile GCC again in background, log off.

Morning After:
Get to node, still can't log in at console. Go to network, can log in, but CAN'T run *any* commands, none! (see below). [Find out later that the make of GCC failed, it could not execute cc!!!, but it got through most of it.]

The Fix:
Turn off power, seemed to be the only way I could get at the thing. Reboots, console works fine. Check the network access... OH BOY!

The Problem:
Whenever *anyone* logs into the node from the net using rlogin, they *CANNOT* execute *ANY* command, other than builtin shell commands. PATH is correctly set up. echo * in the /bin dir shows commands; let's pick on ls as an example. //calypso/bin/ls fails with message "Command not Found". Fine, cd to //calypso/bin, type ./ls, ... same result.

The only thing that seems to work is to type: exec /bin/ls args..., which works fine, but of course is good for only a single command per rlogin, somewhat silly :-)

What's even more peculiar is that I can't rlogin without being prompted for my password, EVEN THOUGH I DO HAVE A .rhosts entry for the machine I am performing the access from. rsh and friends also fail with permission denied.

NOTE: The only thing I did is to delete a large personal subtree, and reboot the node. All of this stuff worked fine Sunday morning.

So, does anyone have any suggestions as to where I might look for a fix to this funny (I'm really laughing) little problem?

[BTW: A failure mode where everything goes catatonic when the node runs out of resources seems somewhat ungraceful. This is what I am assuming has caused the current woes.]

-thanks
-dennis
--
--------------------------------------------------------------------------------
90% of all the scientists that ever |Dennis  UUCP,BITNET:   dvadura@water
lived are alive today! Surprised?   |Vadura  EDU,CDN,CSNET: dvadura@waterloo
================================================================================
dbfunk@ICAEN.UIOWA.EDU (David B. Funk) (04/18/89)
WRT posting <13353@watdragon.waterloo.edu>

It sounds like your "SYSTYPE" environment variable for inetd got messed up. All the "standard" Unix executable directories (/bin /usr/bin /usr/ucb) resolve thru variant links:

    bin -> $(SYSTYPE)/bin

If SYSTYPE were not set to one of "bsd4.3" or "bsd4.2" then none of these directories would be resolvable by the shell, either locally or remotely.

To test this out, try invoking something that lives in /usr/apollo/bin, like bldt. That directory doesn't depend upon the SYSTYPE. Try explicitly setting the variable once you've rlogged in. If you could run it, printenv would show you, but it lives in /usr/ucb. From a DM window on that node, you could do a "ps -agew" to look at the environment vars for the processes on that node. Note that you can't read the environment from a remote node; "ps -agewN //name" won't work. Check the files in /sys/node_data/etc on that node.

Another possibility: maybe the permissions on those directories got messed up. A directory with "r" but not "x" permissions produces some bizarre results.

One last possibility: maybe the directories themselves got messed up. I've not seen it happen under sr10 yet, but I've seen it under sr9.x. When a directory gets wounded, things look like they're there but can't be accessed. It usually only happens to a directory that gets a lot of abuse, like /sys/print/queue, or a directory that is being changed when a node crashes. The tool "/com/sald" is used to salvage wounded directories.

Dave Funk

PS: In general, if you ever have to "blast" processes, such as at logout, it's a good idea to reboot the node, there and then if you can. At least, consider the state of the machine as suspect until you do get it rebooted.
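The variant-link explanation in the reply above can be illustrated with a small model. This is only a sketch in Python, not how Domain/OS actually implements link resolution: the function names, the INSTALLED_TREES set, and the treatment of an unset variable as an empty string are all assumptions made for illustration. The only facts taken from the post are the link text "$(SYSTYPE)/bin" and the two values "bsd4.3" and "bsd4.2".

```python
import re

# Assumption: the only trees present are the two SYSTYPE values named in the post.
INSTALLED_TREES = {"bsd4.3/bin", "bsd4.2/bin"}


def expand_variant_link(link_text, env):
    """Replace each $(NAME) in the link text with env[NAME].

    An unset variable expands to the empty string (an assumption of this
    model, not documented Domain/OS behaviour).
    """
    return re.sub(r"\$\((\w+)\)", lambda m: env.get(m.group(1), ""), link_text)


def resolves(link_text, env, installed=INSTALLED_TREES):
    """Return True if the expanded link names a tree that exists."""
    return expand_variant_link(link_text, env) in installed


if __name__ == "__main__":
    link = "$(SYSTYPE)/bin"  # link text as quoted in the post
    for env in ({"SYSTYPE": "bsd4.3"}, {"SYSTYPE": "garbage"}, {}):
        print(env, "->", expand_variant_link(link, env), resolves(link, env))
```

In this model, a process whose environment carries a bad or missing SYSTYPE cannot reach anything under /bin, even though PATH is set correctly and a directory listing from a healthy process shows the files. That matches the symptom reported for rlogin sessions, under the assumption that they inherit their environment from inetd.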