[net.unix-wizards] Help needed with SC750/eagle and panic when used for swapping.

dbr@cybvax0.UUCP (Douglas Robinson) (10/10/84)

We are currently running the following configuration:

	VAX 11/750
	hp0 = RM80 with RH750 controller (system disk)
	hp1 = Fujitsu Eagle with Emulex SC750 controller
	(and other unibus devices)

I have set up the kernel to be able to swap to both the RM80 and the Eagle.
From the documentation, I note that I have to manually turn on the swapping
to the eagle, either through a line in the /etc/rc (or /etc/rc.local) file,
or by hand (both using the 'swapon' command).

Whenever I turn the swapping on for the eagle, within a day the system
panics with the following:

	hp1: not ready
	hp1X: hard error snYYYY mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
		er1=4<RMR> er2=100000<BSE> mr=10 mr2=11777

The X and YYYY above seem to vary, with actual examples being:

	hp1: not ready
	hp1a: hard error sn17 mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
		er1=4<RMR> er2=100000<BSE> mr=10 mr2=11777

	hp1: not ready
	hp1b: hard error sn22009 mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
		er1=4<RMR> er2=100000<BSE> mr=10 mr2=11777

	hp1: not ready
	hp1g: hard error sn375945 mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
		er1=4<RMR> er2=0 mr=10 mr2=11777

I formatted the disk the first time with the most extensive pass which
took about 14 Hours.  Before I do it again I'd like to know that
the problem IS a BAD SECTOR, not some other piece of code.

Has anyone had a similar problem and found a fix?  HELP!!!!

Note that if I do NOT turn on the swapping to the eagle by hand (I DON'T
DO IT FROM /etc/rc or /etc/rc.local) our system stays up for weeks!
I need the extra swapping arm as we are currently supporting (don't
gag please) ~25 users on this machine, and when they all start working
the disk accesses are drowning out the CPU (~25 RESIDUAL processor)!

Any help MUCH appreciated.

	Doug Robinson
	Cybermation, Inc.
	617/492-8810
	...!mit-eddie!cybvax0!dbr
	...!harvard!cybvax0!dbr

Jobs don't kill programmers... programmers kill jobs!

chris@umcp-cs.UUCP (Chris Torek) (10/14/84)

*	From: dbr@cybvax0.UUCP (Douglas Robinson)

	... the system panics with the following:

	hp1: not ready
	hp1X: hard error snYYYY mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
		er1=4<RMR> er2=100000<BSE> mr=10 mr2=11777

I don't know of a fix but the problem is most likely not the disk
drive.  The ``not ready'' message comes from /sys/vaxmba/mba.c whenever
the ready bit is off unexpectedly.  Probably you have a hardware
problem in the SC750, or something like that.  (Might just be a loose
card....  Considering the bits, it looks like the SC750 is executing
some commands that were never sent to it.)

[The bits above decode to: ATTN => drive attention bit on, DTCMP =>
data transfer complete, DTABT => data transfer aborted, MXF => missed
transfer, RMR => register modification refused, BSE => bad sector
error.  The above can be found, after a bit of searching, in the VAX
hardware books, or you can look in /sys/vaxmba/mbareg.h and
/sys/vaxmba/hpreg.h, if you have kernel sources.]
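
For anyone without the sources handy, here is a rough sketch of that
decoding in C.  The bit values are assumptions: I have taken mbsr as
printed in hex and er1/er2 as printed in octal, which is what I believe
the driver does, and picked masks that make the message come out right.
The table and function names are made up for illustration; check the
real definitions in mbareg.h and hpreg.h before trusting any of it.

```c
#include <stdio.h>
#include <string.h>

/* One named bit in a status register. */
struct bitname {
    unsigned long mask;
    const char *name;
};

/*
 * ASSUMED masks (not copied from mbareg.h/hpreg.h): chosen so that
 * mbsr=13100 (hex), er1=4 (octal) and er2=100000 (octal) decode to
 * the names shown in the console message.
 */
static const struct bitname mbsr_names[] = {
    { 0x10000UL, "ATTN"  },   /* drive attention */
    { 0x02000UL, "DTCMP" },   /* data transfer complete */
    { 0x01000UL, "DTABT" },   /* data transfer aborted */
    { 0x00100UL, "MXF"   },   /* missed transfer */
};
static const struct bitname er1_names[] = {
    { 04UL, "RMR" },          /* register modification refused */
};
static const struct bitname er2_names[] = {
    { 0100000UL, "BSE" },     /* bad sector error */
};

/*
 * Write "<A,B,C>" into out for every table entry whose bit is set in
 * value, in table order.  Leaves out empty if no listed bit is set.
 * Names that do not fit in outlen bytes are dropped.  Returns out.
 */
static char *decode_bits(unsigned long value, const struct bitname *tab,
                         size_t n, char *out, size_t outlen)
{
    size_t used = 0, i;
    int any = 0, w;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if ((value & tab[i].mask) == 0)
            continue;
        w = snprintf(out + used, outlen - used, "%s%s",
                     any ? "," : "<", tab[i].name);
        if (w < 0 || (size_t)w >= outlen - used) {
            out[used] = '\0';   /* discard the partial name */
            break;
        }
        used += (size_t)w;
        any = 1;
    }
    if (any && used + 1 < outlen) {
        out[used++] = '>';
        out[used] = '\0';
    }
    return out;
}
```

Calling decode_bits(0x13100UL, mbsr_names, 4, buf, sizeof buf) fills buf
with the same bracketed list the console shows for mbsr; a zero register
(like the er2=0 in the hp1g example) yields an empty string.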
-- 
(This mind accidently left blank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

thomas@utah-gr.UUCP (Spencer W. Thomas) (10/15/84)

>*	From: dbr@cybvax0.UUCP (Douglas Robinson)
>
>	... the system panics with the following:
>
>	hp1: not ready
>	hp1X: hard error snYYYY mbsr=13100<ATTN,DTCMP,DTABT,MXF> \
>		er1=4<RMR> er2=100000<BSE> mr=10 mr2=11777

I think this one is one we fixed a while ago.  Here is our comment:

revision 1.3
date: 84/08/23 03:17:09;  author: lepreau;  state: Exp;  lines added/del: 87/6
Actual mod date 1/16/84:
--by swt & fjl: in mbintr, don't assume that if the mba was active that the
interrupt is for us: make sure it's ours.  This was causing gr to panic,
presumably due to long spiral reads (as in swap) which took longer than
parallel seeks.  Now why hasn't anyone else seen this?
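
To illustrate the idea only (this is NOT our actual change to mbintr,
and every name and bit below is hypothetical), here is a toy model in C.
The careless handler treats any interrupt that arrives while a transfer
is active as the end of that transfer; the careful one first checks the
status for a transfer-complete indication.

```c
/* Hypothetical status bits for a toy controller. */
#define ST_XFER_DONE 0x1u   /* the data transfer really finished */
#define ST_ATTENTION 0x2u   /* some drive wants attention (e.g. seek done) */

struct ctlr {
    int active;          /* a data transfer has been started */
    int transfers_done;  /* transfers the handler believes have finished */
    int attentions;      /* attention events serviced */
};

/* Careless: if a transfer is active, assume the interrupt is for it. */
static void intr_careless(struct ctlr *c, unsigned status)
{
    if (c->active) {
        c->active = 0;
        c->transfers_done++;   /* wrong if only ST_ATTENTION was set */
        return;
    }
    if (status & ST_ATTENTION)
        c->attentions++;
}

/* Careful: complete the transfer only if the status says it is done. */
static void intr_careful(struct ctlr *c, unsigned status)
{
    if (c->active && (status & ST_XFER_DONE)) {
        c->active = 0;
        c->transfers_done++;
    }
    if (status & ST_ATTENTION)
        c->attentions++;
}
```

With a long transfer still in progress and an attention-only interrupt
coming in, intr_careless marks the transfer finished early, while
intr_careful leaves it active and just services the attention.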

=S

tggsu@resonex.UUCP (Tom Gulvin Root) (10/15/84)

From a recent 'Popular Parallel Universe'...

(in italics, of course)
Say Smokey:
	I've got an '84 Digital VAX 750 with the standard RH750/RM80 
powertrain. Seeking extra horse power, I added an aftermarket Fuji Eagle
turbocharger with an Emulex SC750 intercooler. I must say that this new
combination works well - I really like to drag conventional 750's at lights
(0-50 users is 4 seconds faster than stock!) and the unix handling package
has it all over flabby VMS on those twisty development curves (but bumpy
management roads bounce me around a lot). Now for the problem:
the turbo cuts out on really hard left turns! The boost ready light stays
on, but the boost gauge (and the seat of my pants) says 'no boost' and
the car just dies. Sometimes it's hard to restart and I have to turn the
ignition key on and off several times. I have been to several different
dealers and they say either "they all do that after a while", "you aren't
using it right - it must be in your mind" or "don't make left turns". I
really would like to fix this problem.
			Doug "it's not my fault!" Robinson
			Cybermation, Cambridge, Taxichoozits.

Doug:
	I have had several VAXs in the shop for the same problem recently
and went round and round on this trying to figure it out. I tried everything:
diagnostics, turbo charger swaps, intercooler swaps, checking every cable for
continuity - I even tried duct tape and bubble gum - still no luck! All
the dealer mechanics said that some of the Eagles do this, but they
didn't have the slightest idea why ("We just hang the little number card from
the rear view mirror, make sure we move it to a different spot on the
lot and charge the customer $472.16, isn't that right Clem?") Sooooo,
I gave my friends at the Emulex factory a call and they say they never heard
of a problem like this; as far as they knew, the SC750 intercooler would work
with almost any turbo. Upon trying my Fuji contacts, I discovered that they
have had several service bulletins about this very problem! The one I've used
in the past is ECO#22 on the drive's RFJAU board; your local service tech
should be able to install this for a nominal fee (free).

Boy, I wish they'd check all the engineering out before they ship a
product! I still say that the only one for me is a 1965 IBM 360/40 with
2714 drives. Now there's a real system, with real
cabinets, real lights, real buttons, real hydraulic fluid in the disk drives!

	Tom Gulvin, Resonex Inc, Sunnyvale CA, 408-720-8600