[net.unix-wizards] Hanging VAX 11/780 - any solutions?

yoonkim@kaist.UUCP (Yoon Kim) (09/20/84)

Our VAX 11/780 running 4.2BSD hangs once in a while.  It barely
stays up over 2 days.  The symptom is that terminals begin to hang
one by one.  And then the whole system is hung somewhere.  Has anyone
suffered similar/same problems?  Any solutions for me?  Any ways of
diagnosing it?

yOOn
hplabs!kaist!yoonkim

P.S.  We have an SI 9800 disk controller with two CDC 9766 drives
	attached to it.

irwin@uiucdcs.UUCP (09/26/84)

When the machine is hung, you need to type Control-P followed by an H
(for halt). Once you have the >>> prompt, dump all of the registers to
the console to get a printout. You can have Tom Greeneisen from SI
read these dumps to see what the machine was doing. Tom was a level
three engineer at DEC before going to SI and is very informed on the
Vax-780. If you don't know how to dump the registers, get to Tom
through your SI rep. Tom will tell you what to do.

Tom is usually at the Cincinnati, Ohio office, but is currently on
the west coast for about 5 weeks.

If you need help with the dump process, call me at 217-333-4801; I can
walk you through it over the phone.

thomson@uthub.UUCP (Brian Thomson) (09/28/84)

If your 4.nBSD VAX hangs, you may force a dump by halting it and ...

	>>H
	>>E/P/L 4
		P	00000004	number
	>>C number
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson

sdo@u1100a.UUCP (Scott Orshan) (09/30/84)

It's probably caused by too many files in your net/general
directory with too many links in other directories (such as
net/unix-wizards).  Kind of like <950@kaist.UUCP>.
-- 

			Scott Orshan
			Bell Communications Research
			201-981-3064
			{ihnp4,allegra,pyuxww}!u1100a!sdo

jnelson@trwrba.UUCP (John T. Nelson) (10/02/84)

	Our VAX 11/780 running 4.2BSD hangs once in a while.  It barely
	stays up over 2 days.  The symptom is that terminals begin to
	hang one by one.  And then the whole system is hung somewhere.
	Has anyone suffered similar/same problems?  Any solutions for
	me?  Any ways of diagnosing it?

We have a similar problem, although we get dmf32 silo overflow
errors preceding the terminal hangs.  The system then goes off into
nowhere's land.  The kernel is running; it just isn't doing anything.

I'm not particularly concerned, however.  It looks like some sort
of odd uda50/dmf32 interaction.... or a dmf32 driver problem.

guy@rlgvax.UUCP (Guy Harris) (10/03/84)

> If your 4.nBSD VAX hangs, you may force a dump by halting it and ...
> 
> 	>>H
> 	>>E/P/L 4
> 		P	00000004	number
> 	>>C number

You can also use the "CRASH" command script that comes on the Berkeley
11/780 boot floppy (I seem to remember it being part of the build script
for the floppy, anyway; I know it's on our floppy); its only "bug" is
that the comments it prints out say something about crashing a VMS system,
but UNIX dies just as nicely as VMS when you jam a nasty value into the PC...

Just type a ^P and type "@CRASH" when you get the prompt.

Don't know whether any such script exists for 730s or 750s, or whether
the Berkeley cassettes include them.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

chris@umcp-cs.UUCP (Chris Torek) (10/06/84)

There DOES seem to be a bug in various versions of the 4.2BSD UDA50
driver.  In particular, ``df'' will appear to hang frequently when
the disk is busy.  (If you are careful and lucky, you can unhang it
with another ``df''.)

The reason is that the UDA50 driver has code in udopen() that checks
to make sure the drive is really there, by doing an on line command.
Unfortunately it doesn't block interrupts while doing it.  The command
takes long enough that this isn't usually a problem, but if the disk
is busy, blammo!  (Other random interrupt load could do it too.)

The funny thing is that it doesn't even need to do the on-line in most
cases.

There may be DMF driver bugs, but we haven't run into any yet.

[The fix to the UDA driver is simply to move the splx(s)'s around so
that the test on ui->ui_flags==0 followed by an M_OP_ONLIN is atomic;
it's also a good idea not to bother with the on line if the drive is
on line already.]
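
[Editor's sketch, not the driver source: the shape of the fix described
above can be modeled in user space.  Only ui_flags, splx() and the idea
of an M_OP_ONLIN command come from the post.  The simulated priority
level, the choice of spl5(), and the names interrupt_arrives(),
issue_online(), open_racy() and open_fixed() are assumptions made up
for illustration.  The model only shows why the test and the command
need to sit inside one raised-priority region.]

```c
#include <assert.h>

/* User-space model only: every definition below is a stand-in,
 * not 4.2BSD kernel code. */

int ipl;          /* simulated processor priority; 0 = interrupts allowed */
int online_sent;  /* number of ON LINE commands issued */
int window_hits;  /* interrupts that landed between the test and the command */

/* Raise the simulated priority and return the previous level.
 * Level 5 is an assumption for this sketch. */
int spl5(void) { int old = ipl; ipl = 5; return old; }

/* Restore a previously saved priority level. */
void splx(int s) { ipl = s; }

struct unit { int ui_flags; };  /* 0 = drive not yet brought on line */

/* Model a controller interrupt arriving at the worst moment.
 * It is delivered only while the priority is below 5. */
void interrupt_arrives(void) { if (ipl < 5) window_hits++; }

/* Stands in for queuing an M_OP_ONLIN command to the controller. */
void issue_online(struct unit *ui) { online_sent++; ui->ui_flags = 1; }

/* Racy shape: nothing stops an interrupt between test and command. */
void open_racy(struct unit *ui)
{
    if (ui->ui_flags == 0) {
        interrupt_arrives();
        issue_online(ui);
    }
}

/* Fixed shape: the test and the command form one atomic region,
 * and a drive that is already on line is left alone. */
void open_fixed(struct unit *ui)
{
    int s = spl5();
    assert(ipl >= 5);            /* interrupts must be blocked here */
    if (ui->ui_flags == 0) {
        interrupt_arrives();     /* blocked by the raised priority */
        issue_online(ui);
    }
    splx(s);
}
```

[In this model, calling open_racy() on a fresh unit records one
interrupt inside the window, while open_fixed() records none and
restores the saved priority afterwards.  A second open_fixed() on the
same unit issues no further ON LINE command.]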
-- 
(This mind accidently left blank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland