[comp.sys.apollo] why me?

dvadura@watdragon.waterloo.edu (Dennis Vadura) (04/18/89)

It's me again.  Now it comes to mind that I've been doing nothing but
griping about the Apollo for the last while.  I want it known that I
really like the box, and *most* of the software.  But when you read
what follows you'll understand that I might get a bit disgruntled at
times:

Configuration:  DN3500, 8 MegRam, 350 MegDisk, SR10.1, SYSTYPE=bsd4.3
		ENVIRONMENT=bsd, login shell  /bin/csh

Introduction:  80 Megs or so of free disk, late Sunday afternoon, compiling
	       GCC, dmake (my version of make), 6 pads open, 2 rlogins to
	       local machines, everybody is happy.  Suddenly, GCC compile stops,
	       dmake compile finishes.  I clean up files, and free an
	       additional 100 megs, by deleting some test files.  Node goes
	       catatonic.  Can't create more pads, other pads are dead,
	       DM alive but sick.
	       ...
	       Tried to figure out the problem, couldn't, so did 'lo' in the cmd
	       window.  Had to 'blow away' the processes.

	       Try to log in, can't.  [Couldn't create a DM window]
	       Fine, time for dinner anyway.  Go home, log in over net,
	       things look ok, recompile GCC again in background, log off.

Morning After: Get to node, still can't log in at console.  Go to network,
	       can log in, but CAN'T run *any* commands, none! (see below).
	       [find out later, that make of GCC failed, it could not execute
		cc!!!, but it got through most of it.]

The Fix:  Turn off power, seemed to be the only way I could get at the thing.
	  Reboots, console works fine.  Check the network access... OH BOY!

The Problem:  Whenever *anyone* logs into node from the net using rlogin,
	      they *CANNOT* execute *ANY* command, other than builtin shell
	      commands.  PATH is correctly set up.  echo * in /bin dir shows
	      commands; let's pick on ls as an example.  //calypso/bin/ls
	      fails with message "Command not Found".  Fine, cd to
	      //calypso/bin, type ./ls, ... same result.  The only thing
	      that seems to work is to type:  exec /bin/ls args..., which
	      works fine, but of course is good for only a single command
	      per rlogin, somewhat silly :-)

	      What's even more peculiar is that I can't rlogin without being
	      prompted for my password, EVEN THOUGH I DO HAVE A .rhosts entry
	      for the machine I am performing the access from.  rsh, and
	      friends also fail with permission denied.

	      NOTE:  The only thing I did is to delete a large personal
	      subtree, and reboot the node.  All of this stuff worked fine
	      Sunday morning.

So, does anyone have any suggestions as to where I might look for a fix to
this funny (I'm really laughing) little problem?

[BTW:  A failure mode where everything goes catatonic when the node runs out
       of resources seems somewhat ungraceful.  This is what I am assuming
       has caused the current woes.]

-thanks
-dennis
-- 
--------------------------------------------------------------------------------
90% of all the scientists that ever    |Dennis  UUCP,BITNET:    dvadura@water
lived are alive today!   Surprised?    |Vadura  EDU,CDN,CSNET:  dvadura@waterloo
================================================================================

dbfunk@ICAEN.UIOWA.EDU (David B. Funk) (04/18/89)

WRT posting <13353@watdragon.waterloo.edu>

It sounds like your "SYSTYPE" environment variable for inetd
got messed up. All the "standard" Unix executable directories
(/bin /usr/bin /usr/ucb) resolve thru variant links:

     bin -> $(SYSTYPE)/bin

If SYSTYPE were not set to one of "bsd4.3" or "bsd4.2" then
none of these directories would be resolvable by the shell,
either locally or remotely. To test this out, try invoking
something that lives in /usr/apollo/bin, like bldt. That directory
doesn't depend upon the SYSTYPE. Try explicitly setting the
variable once you've rlogged in. If you could run it, printenv
would show you, but it lives in /usr/ucb. From a DM window on
that node, you could do a "ps -agew" to look at the environment
vars for the processes on that node. Note that you can't read
the environment from a remote node; "ps -agewN //name" won't work.
Check the files in /sys/node_data/etc on that node.
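
The variant-link behaviour described above can be modelled in a few lines
of portable sh. This is only a sketch of the text substitution, not the
Domain/OS resolver itself: resolve_variant is a made-up helper name, and
treating an empty SYSTYPE as a failure is an assumption made for
illustration.

```shell
#!/bin/sh
# Sketch only: models expansion of a variant link target such as
# "$(SYSTYPE)/bin". resolve_variant is a hypothetical helper, not an
# Apollo command.

resolve_variant() {
    # $1 = link target text, e.g. '$(SYSTYPE)/bin'
    # $2 = SYSTYPE value seen by the resolving process (may be empty)
    target=$1
    systype=$2
    tok='$(SYSTYPE)'
    case $target in
        *"$tok"*)
            # Assumption: with no value to substitute, resolution fails.
            [ -n "$systype" ] || return 1
            prefix=${target%%"$tok"*}
            suffix=${target#*"$tok"}
            printf '%s%s%s\n' "$prefix" "$systype" "$suffix"
            ;;
        *)
            # Ordinary target, nothing to expand.
            printf '%s\n' "$target"
            ;;
    esac
}

resolve_variant '$(SYSTYPE)/bin' bsd4.3    # prints bsd4.3/bin
resolve_variant '$(SYSTYPE)/bin' '' || echo "bin: cannot resolve"
```

In this model a process that inherits no SYSTYPE cannot reach anything
under bin, while a target with no variable in it is unaffected, which is
consistent with the suggestion to try something in /usr/apollo/bin.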

Another possibility, maybe the permissions on those directories
got messed up. A directory with "r" but not "x" permissions
produces some bizarre results.
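
A quick way to look for that "r" without "x" case is to read the mode
string that ls -ld prints for each directory. The following is a rough
sketch in portable sh; mode_problem is a made-up helper name, and it
looks at the owner bits only, so group and other permissions would need
the same treatment.

```shell
#!/bin/sh
# Sketch: flag directories whose owner bits allow read but not search.
# mode_problem is a hypothetical helper, not a standard command.

mode_problem() {
    # $1 = mode string as printed by "ls -ld", e.g. drwxr-xr-x
    case $1 in
        dr?-*) echo "readable but not searchable" ;;
        d??x*) echo "ok" ;;
        *)     echo "other" ;;
    esac
}

# Directory list taken from the post; ones that do not exist are skipped.
for d in /bin /usr/bin /usr/ucb; do
    [ -d "$d" ] || continue
    # "$d/." makes ls describe the directory itself rather than a link.
    mode=$(ls -ld "$d/." | cut -c1-10)
    echo "$d $mode $(mode_problem "$mode")"
done
```

A directory reported as "readable but not searchable" would list its
entries under echo * yet refuse to run or open any of them.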

One last possibility, maybe the directories themselves got messed
up. I've not seen it happen under sr10 yet, but I've seen it under
sr9.x. When a directory gets wounded, things look like they're there
but can't be accessed. It usually only happens to a directory that gets
a lot of abuse, like /sys/print/queue, or a directory that is being
changed when a node crashes. The tool "/com/sald" is used to salvage
wounded directories.

Dave Funk

PS: In general, if you ever have to "blast" processes, such as at logout,
    it's a good idea to reboot the node, there and then if you can.
    At least, consider the state of the machine as suspect, until you
    do get it rebooted.