[comp.unix.ultrix] Hanging Ultrix XMI/BI systems

saus@media-lab.media.mit.edu (Mark Sausville) (09/29/89)

Just the facts, lady, just the facts.

Hardware: VAX 6320 with network, kdb50, ci (hsc), and 4 dmb32 (16xrs232)

We brought this machine up back in May and started really using it
just before July 1.  The OS at that time was straight Ultrix 3.0 (with
a bit of local stuff, but no hacked drivers, or special i/o).

The system hung (no response of any kind to anything except ctrl-p)
about once a day, on average, until about July 20, when we installed
a fix provided to us by Digital, which patched some kernel code
related to the dmb32s.  They claimed that the problem had to do with
modems (which we have) on the dmb32s.  This fix was supposed to be for
all XMI/BI machines.

This problem seemed to be entirely repaired by the new driver code.
To make a long story short, the problem came back after about 6 weeks,
and we have been seeing a hang every 24 hours or so.  After going
through the story again with DEC, they recommended that we upgrade to
3.1, to ensure that we had good code.

We upgraded last night and (you guessed it) we had our first hang today.
It was followed by a panic (progress?).

We're proceeding with analysis of the problem, but turnaround on crash
dumps with DEC is less than instant.  The DEC hardware people are going
to get involved again, but they're a lot better on VMS.  I'm not
asking for a diagnosis here.  I would like to hear from people who
have had similar problems.

If you had the dmb/modem problem and it was fixed for good by the patch
or by 3.1, I'd like to know about that too.

					Mark.

Mark Sausville                           MIT Media Laboratory
Computer Systems Administrator           Room E15-354
617-253-0325                             20 Ames Street
saus@media-lab.media.mit.edu             Cambridge, MA 02139

grr@cbmvax.UUCP (George Robbins) (09/30/89)

In article <SAUS.89Sep28234906@media-lab.media.mit.edu> saus@media-lab.media.mit.edu (Mark Sausville) writes:
> 
> Hardware: VAX 6320 with network, kdb50, ci (hsc), and 4 dmb32 (16xrs232)
> 
> The system hung...
> 
> This problem seemed to be entirely repaired by the new driver code.
> To make a long story short, the problem came back after about 6 weeks,
> and we have been seeing a hang every 24 hours or so.  After going
> through the story again with DEC, they recommended that we upgrade to
> 3.1, to ensure that we had good code.
> 
> We upgraded last night and (you guessed it) we had our first hang today.
> It was followed by a panic (progress?).

Please, when relating tales of woe, include enough "hard" information so
that others may correlate it with their own experience.  Which panic?  If
the system hangs, is it no echo, echo but no action, etc.?  For a hang it
may help to know what module the PC was in, etc.  Not to get carried away,
but just enough to "label" the problem.

> We're proceeding with analysis of the problem, but turnaround on crash
> dumps with DEC is less than instant.  The DEC hardware people are going
> to get involved again, but they're a lot better on VMS.  I'm not
> asking for a diagnosis here.  I would like to hear from people who
> have had similar problems.

Sigh... I'm still working on/with DEC on the LAT cooked/cbreak/raw vs.
server passthru problem.  It took months to get them to *do* anything,
the first try didn't cut it and now they want to look at my machine
that worked just fine with 1.x and 2.x up till the intromission of
Ultrix 3.1. 

Sometimes I think chapter 1 of the Ultrix installation manual should
be devoted to describing how to effectively work the software "support"
mechanism.  Getting from the "fixes to known problems" to the "define
and correct new bugs" is a much bigger step than you'd want to imagine.

-- 
George Robbins - now working for,	uucp: {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing	arpa: cbmvax!grr@uunet.uu.net
Commodore, Engineering Department	fone: 215-431-9255 (only by moonlite)