larry@tapa.uucp (Larry Pajakowski) (06/21/89)
Perhaps someone can shed some light on a perplexing problem we are having.
We have a Compaq 386/20 running Xenix-386 2.3.1 with Excelan TCP/IP V3.5
and Xenix-Net 1.2.

About once a week, more or less, init dies. After that, of course, the machine
slowly grinds to a halt and must be powered off. I now have a script running
periodically which checks for init and, if there is no init, does a ps and
then reboots. OK, that keeps it alive, but why does init die?

I've talked to both SCO and Excelan. Neither has been able to help much.
It may be slightly worse under heavy TCP/IP load, but then I've had it happen
on an idle machine. There have been power line glitch monitors on the power,
and we have run diagnostics over a weekend with no indications of any problem.
The only other clue is 2 kernel panics over the last 3 months: "Free inode
isn't".

I would appreciate hearing from anyone with some ideas, either by email or
phone. Many thanks.

Larry Pajakowski
Abbott Labs.
...!ddsw1!abtcser!larry 1-312-937-1153
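A minimal sketch of the kind of watchdog described above: check for init, and if it is gone, capture a ps listing and reboot. The original script is not shown in the post and was presumably a shell script; this version is in Python purely for illustration. The names `init_is_running`, `check_once`, and `run_ps` are hypothetical, and the sketch assumes that the first line of the ps output is a header containing a `PID` column and that init is always process 1.

```python
import subprocess


def init_is_running(ps_output: str) -> bool:
    """Return True if the ps listing contains a process whose PID is 1.

    Assumes the first line is a header with a "PID" column and that no
    column to the left of PID contains embedded spaces.
    """
    lines = ps_output.splitlines()
    if not lines:
        return False
    header = lines[0].split()
    if "PID" not in header:
        return False
    pid_col = header.index("PID")
    for line in lines[1:]:
        fields = line.split()
        # Exact string match so that PIDs such as 11 or 101 do not count.
        if len(fields) > pid_col and fields[pid_col] == "1":
            return True
    return False


def run_ps() -> str:
    """Collect a listing of all processes (assumes a ps that accepts -e)."""
    result = subprocess.run(
        ["ps", "-e"], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_once(get_ps, reboot, log) -> bool:
    """Run one watchdog pass; return True if a reboot was requested.

    get_ps, reboot, and log are callables supplied by the caller, so the
    actual reboot command (site-specific) stays outside this sketch.
    """
    listing = get_ps()
    if init_is_running(listing):
        return False
    log(listing)   # keep the ps listing as evidence before going down
    reboot()
    return True
```

Run from cron, a caller would pass `run_ps` as `get_ps`, a function that appends to a log file as `log`, and a function that invokes the site's reboot command as `reboot`. Keeping the saved listing is the useful part: it records what else was running at the moment init was found missing.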
edhew@egvideo.UUCP (Ed Hew) (06/24/89)
I had originally replied to this via email; however, it occurs to me that
perhaps someone else may have similar problems or, better yet, has resolved
them.

In article <1989Jun21.114506.1378@tapa.uucp> larry@tapa.uucp (Larry Pajakowski) writes:
> Perhaps someone can shed some light on a perplexing problem we are having.
> We have a Compaq 386/20 running Xenix-386 2.3.1 with Excelan TCP/IP V3.5
> and Xenix-Net 1.2.

I was in a similar (lack of) light several months ago (shortly after our
conversion to 2.3.1). The major difference was that it wasn't TCP/IP; it
was uucp (kind of) causing me headaches.

>
> About once a week more or less init dies. After that of course the machine
> slowly grinds to a halt and must be powered off. I now have a script running
> periodically which checks for init and reboots after doing a ps if there is no
> init. Ok that keeps it alive but why?

My scenario was as follows:

I'd leave this system just humming along and go to work. I'd return late
that night and find my system ground to a halt. Nobody was cleaning up
defunct processes and my process table was full. Hence the system was
effectively dead. init had somehow been assassinated.

Of course, I'd discover this after I logged on with my *non*root account,
so I couldn't even do a proper shutdown. With no init, I have no getty, and
can get no login. I'd log on on tty01; the original getty would at least
let me do that and replace itself with my login shell, but then.... log off
to log on as root, and... well...... [arghhh! where's that switch? ....sure
like fsck, ummhmmmmm]. RTFM says something like: "shutdown can only be run
in the foreground by root".

After a couple of weeks of fruitless testing and surmising, I turned the
process accounting on. Well, let's be honest, I always had the proc
accounting on, I just decided to look at it. 1/2 :-)

Sure enough, init was exiting for some reason right when I had a cron task
disable and enable the tty that had an attached uuxqt running, processing
news.

Some background is required here. The disable/enable was a workaround for a
problem whereby DTR wasn't (for some still unresolved reason) being raised
after polling our host for news. So, we simply cron'd a script to disable/
enable the TBit tty every 15 minutes if nobody was on it at the time.

That solved the (no DTR) problem, but then the above occurred. The disable/
enable was assassinating init. Process accounting says so. Now we check to
make sure uuxqt isn't running at that time as well. Haven't had a problem
since.

I can also tell you that the above results have been manually recreated on
this site. Sometimes. It's not consistent. Arghhhh! There is a missing
factor here. I don't know what it is.

> I've talked both to SCO and Excelan. Neither has been able to help much.
> It may be slightly worse under heavy TCP/IP load but then I've had it happen
> on an idle machine. There have been power line glitch monitors on the power
> and we have run diagnostics over a weekend with no indications of any problem.
> The only other clue is 2 kernel panics over the last 3 months "Free inode
> isn't".

In my case, a thought: I wonder if this could be related to the old problem
in pre-2.2.x releases, where the docs warned us that using a disable/enable
sequence without separating the two by at least a 1 minute interval was
asking for trouble.

All I can suggest is that you check out the above info and see what your
process accounting tells you. Find out what's happening when init dies, and
prevent it from happening.

If you ever find out *why* this happens, please email me. Right now I am
still using a workaround. I'd rather find a *fix*.

> I would appreciate hearing from anyone with some ideas either by email or
> phone. Many Thanks.

Hope this helps.

>
> Larry Pajakowski
> Abbott Labs.
> ...!ddsw1!abtcser!larry 1-312-937-1153

--ed {edhew@egvideo.uucp}

Ed. A. Hew Authorized SCO Technical Trainer Xeni/Con Corporation
work: edhew@xenicon.uucp -or- ..!{uunet!}utai!lsuc!xenicon!edhew
home: edhew@egvideo.uucp -or- ..!{uunet!}watmath!egvideo!edhew
# I haven't lost my mind, it's backed up on floppy around here somewhere!
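The guard Ed describes (only cycle the port when nobody is on it and uuxqt is not running) can be sketched as follows. This is an illustration in Python, not Ed's actual cron script, which is not shown. The function names and the port name `tty2a` in the usage note are hypothetical, and the sketch assumes `who` output with the tty in the second column and `ps` output in which the command name appears as a whitespace-separated field.

```python
def tty_in_use(who_output: str, tty: str) -> bool:
    """Return True if any who line shows a login on the given tty.

    Assumes the usual "user tty date time" layout, tty in column two.
    """
    for line in who_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == tty:
            return True
    return False


def uuxqt_running(ps_output: str) -> bool:
    """Return True if any process in the ps listing is uuxqt.

    Compares the basename of every field, so both "uuxqt" and a full
    path ending in "/uuxqt" match; longer names such as "uuxqtd" do not.
    """
    for line in ps_output.splitlines():
        for field in line.split():
            if field.rsplit("/", 1)[-1] == "uuxqt":
                return True
    return False


def safe_to_cycle(who_output: str, ps_output: str, tty: str) -> bool:
    """Apply both of the checks from the post before a disable/enable."""
    return not tty_in_use(who_output, tty) and not uuxqt_running(ps_output)
```

A cron job would feed the output of `who` and `ps` into `safe_to_cycle(..., "tty2a")` and run the disable/enable pair only when it returns True. The one-minute separation between disable and enable mentioned for pre-2.2.x releases is not modeled here.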
wht@tridom.uucp (Warren Tucker) (06/26/89)
In article <2045@egvideo.UUCP>, edhew@egvideo.UUCP (Ed Hew) writes:
>
> RTFM says something like: "shutdown can only be run in the foreground by
> root".

haltsys or reboot is SCO's way of telling shutdown where to stuff it!
It might not be nice for servers or off-hook comm lines, but it WILL
shut the system down RIGHT AWAY.

--
-------------------------------------------------------------------
Warren Tucker, Tridom Corporation ...!gatech!emory!tridom!wht
Sforzando (It., sfohr-tsahn'-doh). A direction to perform the tone or
chord with special stress, or marked and sudden emphasis.
garyb@gallium.UUCP (Gary Blumenstein) (07/02/89)
In article <2045@egvideo.UUCP> edhew@egvideo.UUCP (Ed Hew) writes:
>In article <1989Jun21.114506.1378@tapa.uucp> larry@tapa.uucp (Larry Pajakowski) writes:
>> Perhaps someone can shed some light on a perplexing problem we are having.

Me three! I had a similar experience with init dying. Listen to this one.

Some months back I hooked up a serial line from one of our ports to a port
on our VAX so I could log into our VMS (ugh!) system. Well, one night I was
showing the Operator in the data center what Usenet was all about and I said,
"Wait a minute, I'll give you an account on our XENIX system so you can log
in when you're bored silly at 11pm and read a little comp.os.vms or whatever".

"Really!?", he said.

"No problem, we have a direct line set up ready to go", I said as I
dim-wittedly enabled the port on the XENIX side.

Enter the dreaded battle of the logins. In this case it was VMS LOGIN versus
XENIX getty. Invisibly, the poor computers were duking it out "behind the
scenes", with each program interpreting the other's login message as INVALID
login IDs. In this case XENIX was the loser, with init getting trashed every
so often.

What made matters so frustrating for me was that A) I had completely
forgotten about enabling that darn VAX line and B) init would die
inconsistently, at random intervals. Sometimes the system wouldn't go down
all day; at other times I'd be rebooting 3 or 4 times per day.

The symptoms were all classic, just as the others had described. init would
die, leaving every terminated process defunct and without a parent, and
gettys could not spawn after a user exited their shell.

Not realizing the stupid mistake I had made by enabling that line, I was at a
real loss trying to figure out what went wrong. I had become convinced that
somehow the kernel had been corrupted, and I had just begun the process of
reinstalling the link kit and drivers. As I was disabling the serial ports so
I could boot off a backup kernel, that's when I noticed the enabled line.

Apparently, the Operator never had an opportunity to try the line anyway.

I'm almost embarrassed to tell that one. Have mercy!

- gb
--
Gary Blumenstein, UNIX Systems Administrator // CIBA-GEIGY CORPORATION, USA
===========================================================================
Voice: (914) 347-4700      7 Skyline Drive, Hawthorne, NY 10502
FAX  : (914) 347-5687      uucp: ...{philabs, gaboon}!crpmks!{sysadm, garyb}