[comp.unix.xenix] init's untimely death.

larry@tapa.uucp (Larry Pajakowski) (06/21/89)

Perhaps someone can shed some light on a perplexing problem we are having.
We have a Compaq 386/20 running Xenix-386 2.3.1 with Excelan TCP/IP V3.5
and Xenix-Net 1.2.

About once a week, more or less, init dies.  After that, of course, the machine
slowly grinds to a halt and must be powered off.  I now have a script that runs
periodically, checks for init, and if there is no init, does a ps and reboots.
OK, that keeps the machine alive, but why does init die?
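
For anyone wanting to build a similar watchdog, here is a minimal sketch of the
"is init still there?" check.  It is written in Python purely for illustration
and is not the script described above.  It assumes `ps -e`-style output with a
header line containing a PID column; the sample listings and the function name
`init_alive` are made up for the example.

```python
def init_alive(ps_output):
    """Return True if a process with PID 1 appears in `ps -e`-style output.

    Assumes the first line is a header that names a PID column.
    Raises ValueError if the output is empty or has no PID column, so that
    a failed `ps` is never mistaken for a dead init.
    """
    lines = ps_output.strip().splitlines()
    if not lines or "PID" not in lines[0].split():
        raise ValueError("unrecognized ps output")
    pid_col = lines[0].split().index("PID")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > pid_col and fields[pid_col] == "1":
            return True
    return False


# Hypothetical listings, invented for the example.
SAMPLE_OK = """\
  PID TTY      TIME COMMAND
    1 ?        0:03 init
   42 01       0:01 sh
"""

SAMPLE_DEAD = """\
  PID TTY      TIME COMMAND
   42 01       0:01 sh
  117 ?        0:00 cron
"""

print(init_alive(SAMPLE_OK))    # True
print(init_alive(SAMPLE_DEAD))  # False
```

A real watchdog would feed this the live output of ps, save that listing
somewhere for later inspection, and only then reboot.  Refusing to decide on
unparseable output is deliberate: rebooting because ps itself failed would be
worse than doing nothing.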

I've talked both to SCO and Excelan.  Neither has been able to help much.
It may be slightly worse under heavy TCP/IP load, but then I've had it happen
on an idle machine.  We have had power-line glitch monitors on the supply,
and we have run diagnostics over a weekend with no indication of any problem.
The only other clue is two kernel panics over the last 3 months: "Free inode
isn't".

I would appreciate hearing from anyone with some ideas either by email or
phone.  Many Thanks.

Larry Pajakowski
Abbott Labs.
...!ddsw1!abtcser!larry
1-312-937-1153

edhew@egvideo.UUCP (Ed Hew) (06/24/89)

I had originally replied to this via email, however it occurs to me
that perhaps someone else may have similar problems, or better yet,
have resolved them.

In article <1989Jun21.114506.1378@tapa.uucp> larry@tapa.uucp (Larry Pajakowski) writes:

> Perhaps someone can shed some light on a perplexing problem we are having.
> We have a Compaq 386/20 running Xenix-386 2.3.1 with Excelan TCP/IP V3.5
> and Xenix-Net 1.2.

I was in a similar (lack of) light several months ago, shortly after our
conversion to 2.3.1.  The major difference was that it wasn't TCP/IP;
it was uucp (kind of) causing me headaches.
> 
> About once a week more or less init dies.  After that of course the machine
> slowly grinds to a halt and must be powered off.  I now have a script running
> periodically which checks for init and reboots after doing a ps if there is no
> init.  Ok that keeps it alive but why?

My scenario was as follows:

I'd leave this system just humming along and go to work.  I'd return
late that night and find my system ground to a halt.  Nobody was cleaning
up defunct processes and my process table was full.  Hence the system was
effectively dead.  init had somehow been assassinated.

Of course, I'd discover this after I logged on with my *non*root account,
so I couldn't even do a proper shutdown.  With no init, I have no getty,
and can get no login.  I'd log on on tty01; the original getty would at least
let me do that and replace itself with my login shell, but, then.... log off
to log on as root, and...  well......  [arghhh!  where's that switch?
  ....sure like fsck, ummhmmmmm].

RTFM says something like:  "shutdown can only be run in the foreground by
root".

After a couple of weeks of fruitless testing and surmization (sp?), I turned
the process accounting on.  Well, let's be honest, I always had the proc
accounting on, I just decided to look at it.  1/2 :-)

Sure enough, init was exiting for some reason right when a cron task disabled
and re-enabled a tty while a uuxqt attached to it was busy processing news.

Some background is required here.  The disable/enable was a workaround to
a problem whereby DTR wasn't (for some still unresolved reason) being raised
after polling our host for news.  So we simply cron'd a script to disable and
re-enable the TBit tty every 15 minutes if nobody was on it at the time.
That solved the (no DTR) problem, but then the above occurred.  The
disable/enable was assassinating init.  Process accounting says so.

Now we check to make sure uuxqt isn't running at that time as well.
Haven't had a problem since.
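
The guard described above (nobody on the line, and no uuxqt running) can be
sketched roughly as follows.  This is an illustration in Python, not the cron
script in use here.  It assumes `ps -e`-style output with PID, TTY, TIME and
command columns, and it assumes an idle line shows only a getty; the tty name
"1a" and the sample listings are hypothetical.

```python
def parse_ps(ps_output):
    """Parse `ps -e`-style text into a list of (pid, tty, command) tuples."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) == 4:
            pid, tty, _time, command = fields
            rows.append((pid, tty, command))
    return rows


def safe_to_cycle(ps_output, tty, blockers=("uuxqt",)):
    """Return True only if `tty` is idle and no blocker command is running."""
    for _pid, proc_tty, command in parse_ps(ps_output):
        # Reduce "/usr/lib/uucp/uuxqt -x" style entries to the bare name.
        name = command.split()[0].rsplit("/", 1)[-1]
        if proc_tty == tty and name != "getty":
            return False  # somebody (or something) is using the line
        if name in blockers:
            return False  # uuxqt is busy; leave the port alone
    return True


HEADER = "  PID TTY      TIME COMMAND\n"
IDLE = HEADER + "    1 ?  0:03 init\n  201 1a 0:00 getty\n"
NEWS = IDLE + "  305 ?  0:12 uuxqt\n"
LOGIN = HEADER + "    1 ?  0:03 init\n  210 1a 0:01 sh\n"

print(safe_to_cycle(IDLE, "1a"))   # True
print(safe_to_cycle(NEWS, "1a"))   # False
print(safe_to_cycle(LOGIN, "1a"))  # False
```

Only when the check returns True would the script go on to disable and
re-enable the port.  Keeping the blocker list as a parameter makes it easy to
add other commands if they turn out to be part of the missing factor.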

I can also tell you that the above results have been manually recreated
on this site.  Sometimes.  It's not consistent.  Arghhhh!
There is a missing factor here.  I don't know what it is.

> I've talked both to SCO and Excelan.  Neither has been able to help much.
> It may be slightly worse under heavy TCP/IP load but then I've had it happen
> on an idle machine.  There have been power line glitch monitors on the power
> and we have run diagnostics over a weekend with no indications of any problem.
> The only other clue is 2 kernel panics over the last 3 months "Free inode
> isn't".

A thought, in my case:  I wonder if this could be related to the old
problem in pre-2.2.x releases, where the docs warned us that running
disable and then enable without at least a one-minute interval between
the two was asking for trouble.

All I can suggest is that you check out the above info; check out what
your process accounting tells you.  Find out what's happening when init
dies, and prevent it from happening.
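
As a rough illustration of that kind of digging, the sketch below lists which
commands were active around the moment init exited.  It is in Python for
illustration only.  The record layout (command, start, end, all in seconds) is
an assumption; real process accounting output would first have to be parsed
into that form, and the sample records are invented.

```python
def active_at(records, when, window=60):
    """Return commands whose lifetime overlaps [when - window, when + window]."""
    hits = []
    for command, start, end in records:
        if start <= when + window and end >= when - window:
            hits.append(command)
    return hits


def suspects_for_init_exit(records, window=60):
    """Map each init exit time to the other commands active around it."""
    init_exits = [end for command, _start, end in records if command == "init"]
    return {
        t: sorted(set(active_at(records, t, window)) - {"init"})
        for t in init_exits
    }


# Hypothetical accounting records: (command, start_seconds, end_seconds).
RECORDS = [
    ("uuxqt", 1000, 1900),
    ("disable", 1795, 1797),
    ("enable", 1798, 1800),
    ("init", 0, 1799),
    ("sh", 100, 200),
]

print(suspects_for_init_exit(RECORDS, window=30))
# {1799: ['disable', 'enable', 'uuxqt']}
```

If the same few commands turn up next to every init exit, they are the
things to keep apart, which is how the disable/enable and uuxqt combination
was spotted here.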

If you ever find out *why* this happens, please email me.  Right now I am
still using a workaround.  I'd rather find a *fix*.

> I would appreciate hearing from anyone with some ideas either by email or
> phone.  Many Thanks.

Hope this helps.
> 
> Larry Pajakowski
> Abbott Labs.
> ...!ddsw1!abtcser!larry
> 1-312-937-1153

		--ed		{edhew@egvideo.uucp}

  Ed. A. Hew     Authorized SCO Technical Trainer      Xeni/Con Corporation
  work:  edhew@xenicon.uucp	 -or-	 ..!{uunet!}utai!lsuc!xenicon!edhew
  home:	 edhew@egvideo.uucp	 -or-	   ..!{uunet!}watmath!egvideo!edhew
  # I haven't lost my mind, it's backed up on floppy around here somewhere!

wht@tridom.uucp (Warren Tucker) (06/26/89)

In article <2045@egvideo.UUCP>, edhew@egvideo.UUCP (Ed Hew) writes:
> 
> RTFM says something like:  "shutdown can only be run in the foreground by
> root".

haltsys and reboot are SCO's way of telling shutdown where to stuff it!
They might not be nice for servers or off-hook comm lines, but they WILL
shut the system down RIGHT AWAY.
-- 
-------------------------------------------------------------------
Warren Tucker, Tridom Corporation       ...!gatech!emory!tridom!wht 
Sforzando (It., sfohr-tsahn'-doh).  A direction to perform the tone
or chord with special stress, or marked and sudden emphasis.

garyb@gallium.UUCP (Gary Blumenstein) (07/02/89)

In article <2045@egvideo.UUCP> edhew@egvideo.UUCP (Ed Hew) writes:
>In article <1989Jun21.114506.1378@tapa.uucp> larry@tapa.uucp (Larry Pajakowski) writes:
>> Perhaps someone can shed some light on a perplexing problem we are having.

Me three!  I had a similar experience with init dying.  Listen to this one.

Some months back I hooked up a serial line from one of our ports to a port
on our VAX so I could log into our VMS (ugh!) system.  Well one night I was
showing the Operator in the data center what Usenet was all about and I
said, "wait a minute, I'll give you an account on our XENIX system so you
can log in when you're bored silly at 11pm and read a little comp.os.vms
or whatever".  "Really!?", he said.  "No problem, we have a direct line
set up ready to go.", I said as I dim-wittedly enabled the port on the
XENIX side.

Enter the dreaded battle of the logins.  In this case it was VMS LOGIN versus
XENIX getty.  Invisibly, the poor computers were duking it out "behind the
scenes", with each program interpreting the other's login message as INVALID
login IDs.  In this case XENIX was the loser, with init getting trashed
every so often.  What made matters so frustrating for me was A)  I had
completely forgotten about enabling that darn VAX line and B) init
would die inconsistently, at random intervals.  Sometimes the system wouldn't
go down all day; at other times I'd be rebooting 3 or 4 times per day.

The symptoms were all classic, just as the others had described.  init would
die, leaving every terminated process defunct and without a parent, and gettys
could not spawn after a user exited their shell.
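
Those symptoms can be watched for directly.  The sketch below, in Python for
illustration only, counts defunct entries in a process listing.  It assumes ps
marks such processes with `<defunct>` in the command column; the sample
listing and the threshold of 10 are made up.

```python
def count_defunct(ps_output):
    """Count lines in `ps -e`-style output that are marked <defunct>."""
    return sum(
        1 for line in ps_output.splitlines()[1:]  # skip the header line
        if "<defunct>" in line
    )


def zombies_piling_up(ps_output, threshold=10):
    """Return True if the number of defunct processes reaches `threshold`."""
    return count_defunct(ps_output) >= threshold


# Hypothetical listing, invented for the example.
SAMPLE = """\
  PID TTY      TIME COMMAND
  301 ?        0:00 <defunct>
  302 ?        0:00 <defunct>
   42 01       0:01 sh
"""

print(count_defunct(SAMPLE))                   # 2
print(zombies_piling_up(SAMPLE, threshold=2))  # True
```

A steadily growing count is a hint that nobody is reaping dead processes any
more, which is worth checking before the process table fills up.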

Not realizing the stupid mistake I made by enabling that line, I was at a 
real loss trying to figure out what went wrong.  I had become convinced that
somehow the kernel had been corrupted and I had just begun the process of
reinstalling the link kit and drivers.  As I was disabling the serial ports so
I could boot off a backup kernel, that's when I noticed the enabled line.
Apparently, the Operator never had an opportunity to try the line anyway.

I'm almost embarrassed to tell that one.  Have mercy!

- gb

-- 
Gary Blumenstein, UNIX Systems Administrator // CIBA-GEIGY CORPORATION, USA
===========================================================================
Voice: (914) 347-4700                  7 Skyline Drive, Hawthorne, NY 10502
FAX  : (914) 347-5687     uucp: ...{philabs, gaboon}!crpmks!{sysadm, garyb}