[comp.sys.sun] zs3: silo overflow crashes

dennett@kodak.com (Charlie Dennett) (05/17/89)

Once again I turn to the net because the Hotline is stone cold.

My system is a 3/260 under 4.0.1.  On the average of once a week, my
system crashes with a zs3: silo overflow.  Device zs3 is not mentioned in
my kernel configuration file - not even in a comment line.  The only zs
devices there are zs0 and zs1.  These crashes seem to be associated with
mouse activity - usually, but not always rapid mouse activity.  Listed
below are the relevant messages I found in /var/adm/messages from two
separate crashes.  Any insight into the problem is appreciated.

----Most recent crash-------
May  9 15:13:51 cygnus vmunix: ypserv: 
May  9 15:13:51 cygnus vmunix: trap address 0x8, pid 65, pc = f086282, sr = 2510, stkfmt b, context 1
May  9 15:13:51 cygnus vmunix: Bus Error Reg 80<INVALID>
May  9 15:13:51 cygnus vmunix: data fault address f357d36 faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
May  9 15:13:51 cygnus vmunix: KERNEL MODE
May  9 15:13:51 cygnus vmunix: page map 0 pmgrp ce
May  9 15:13:51 cygnus vmunix: D0-D7  ffbf7ffe 40 1 0 0 0 2100 2d8e
May  9 15:13:51 cygnus vmunix: A0-A7  f357d32 f08bf48 ffff9632 0 0 f0ce510 f08bf50 f08bf34
May  9 15:13:51 cygnus vmunix: Begin traceback...fp = f08bf50, sp = f08bf34
May  9 15:13:51 cygnus vmunix: Called from f088ece, fp=f08bf78, args=f0b9b24 1 1 0
May  9 15:13:51 cygnus vmunix: Called from f01b38a, fp=f08bf9c, args=0 0 0 1
May  9 15:13:51 cygnus vmunix: Called from f01b26e, fp=f08bfc0, args=2000 0 2504 effffdc
May  9 15:13:51 cygnus vmunix: Called from f004704, fp=ffff9660, args=f0648b2 2000 0 e3000001
May  9 15:13:51 cygnus vmunix: End traceback...
May  9 15:13:51 cygnus vmunix: panic: Bus error
May  9 15:13:51 cygnus vmunix: zs3: silo overflow
May  9 15:13:51 cygnus vmunix: syncing file systems... [11] [11] [9] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] done
May  9 15:13:51 cygnus vmunix: 
May  9 15:13:51 cygnus vmunix: dumping to vp f0e38fc, offset 17248

----Second most recent crash-----

May  3 11:47:08 cygnus vmunix: etherfind: 
May  3 11:47:08 cygnus vmunix: trap address 0x8, pid 1379, pc = f086282, sr = 2500, stkfmt b, context 6
May  3 11:47:08 cygnus vmunix: Bus Error Reg 80<INVALID>
May  3 11:47:08 cygnus vmunix: data fault address f33dd36 faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
May  3 11:47:08 cygnus vmunix: KERNEL MODE
May  3 11:47:08 cygnus vmunix: page map 0 pmgrp 82
May  3 11:47:08 cygnus vmunix: D0-D7  ffbf7ffe 40 0 25 5 78 2100 54a
May  3 11:47:08 cygnus vmunix: A0-A7  f33dd32 f08bf48 ffff97e2 0 0 f0cd730 f08bf50 f08bf34
May  3 11:47:08 cygnus vmunix: Begin traceback...fp = f08bf50, sp = f08bf34
May  3 11:47:08 cygnus vmunix: Called from f088ece, fp=f08bf78, args=f0b9b24 1 1 0
May  3 11:47:08 cygnus vmunix: Called from f01b38a, fp=f08bf9c, args=0 0 2 3
May  3 11:47:08 cygnus vmunix: Called from f01b26e, fp=f08bfc0, args=9 2 0 291e8
May  3 11:47:08 cygnus vmunix: Called from f004704, fp=effefd4, args=ece7730 9 0 e3000001
May  3 11:47:08 cygnus vmunix: End traceback...
May  3 11:47:08 cygnus vmunix: panic: Bus error
May  3 11:47:08 cygnus vmunix: zs3: silo overflow
May  3 11:47:08 cygnus vmunix: syncing file systems... [13] 1 [13] [9] [3] done
May  3 11:47:08 cygnus vmunix: 
May  3 11:47:08 cygnus vmunix: dumping to vp f0e38fc, offset 17248


Charlie Dennett          | UUCP: ...!rutgers!rochester!kodak!cygnus!dennett
Information Services     | Internet:  dennett@cygnus.Kodak.COM
Eastman Kodak Company    |
Rochester, NY 14653-5219 |

smb@arpa.att.com (05/17/89)

Your machine isn't crashing because of a silo overflow; it's getting some
sort of bus error or other internal trap.  While it's printing the panic
and traceback stuff, the input silo -- from the mouse? dunno -- has filled
up and overflowed because interrupts are disabled.  When it finally has
the leisure to look at the i/o ports, while trying to sync the disks, it
notices the async i/o error and reports it.  That's pure cascade, and the
message itself is not relevant.

I don't know, though, why it's talking about zs3.

guy@uunet.uu.net (Guy Harris) (05/17/89)

>My system is a 3/260 under 4.0.1.  On the average of once a week, my
>system crashes with a zs3: silo overflow.  Device zs3 is not mentioned in
>my kernel configuration file - not even in a comment line.  The only zs
>devices there are zs0 and zs1.

Each "device zsN" line that appears in a "config" file refers to *two*
serial ports; a Zilog Z8530 SCC chip has two channels.  You probably have
a line like

	device		zs0 at obio ? csr 0x20000 flags 3 priority 3

which refers to devices "zs0" and "zs1" (the fact that it says only "zs0"
on the line notwithstanding; the nomenclature for devices in the sense of
a line in a config file, and in the sense of something with a minor device
number all its own, are separate), and a line like

	device		zs1 at obio ? csr 0x00000 flags 0x103 priority 3

which refers to devices "zs2" and "zs3" (again, the "zs0" and "zs1" in the
"device" lines are not the same as the "zs0", "zs1", "zs2", and "zs3" in
the error messages; "device" line "zsN" refers to devices that would
appear as "zs(2N)" and "zs(2N+1)" in error messages).
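
To make the numbering concrete, here is a minimal sketch of the mapping
described above, written in Python purely for illustration.  The function
names are hypothetical and are not taken from any Sun source; the only rule
encoded is the one stated above, that "device" line "zsN" covers the ports
reported as "zs(2N)" and "zs(2N+1)".

```python
def ports_for_device_line(n):
    """Return the two port names that config "device zsN" accounts for.

    Hypothetical helper: it encodes only the zs(2N) / zs(2N+1) rule
    described in the text, nothing more.
    """
    return ("zs%d" % (2 * n), "zs%d" % (2 * n + 1))


def device_line_for_port(port):
    """Return (device-line index, port index on that chip) for port zs<port>."""
    return (port // 2, port % 2)


if __name__ == "__main__":
    for n in (0, 1):
        print("device zs%d ->" % n, ports_for_device_line(n))
    print("zs3 comes from device line zs%d, port %d" % device_line_for_port(3))
```

Under that rule, the "zs3" in the panic output belongs to the second port of
"device zs1", which is why it never shows up by that name in the config file.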

>These crashes seem to be associated with mouse activity

Not at all surprising.  Three guesses how the mouse is attached to the
host....

The two ports on "device zs1", which are referred to as "zs2" and "zs3"
in error messages, are for the keyboard and mouse, respectively.

>- usually, but not always rapid mouse activity.

Especially mouse activity while the system is crashing for some other
reason.  If the system is crashing, it is probably printing a lot of stuff
with the kernel's "printf" routine, and doing so at a very high interrupt
priority, so that the "zs" devices can't interrupt the CPU.  As the mouse
moves, it generates a stream of 5-byte (as I remember) motion events;
that's not one 5-byte event per movement of the mouse, that's several of
them - one for each little increment the mouse moves.  The Z8530 chip has
an on-chip silo, but it only holds two bytes; if the CPU doesn't respond
pretty quickly to an interrupt request, the silo will overflow.  Once the
interrupt priority is lowered to the point where the Z8530 can interrupt
the CPU again, the interrupt comes in, the device driver notes that the
silo overflowed, and prints a message to that effect.
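
The sequence above can be sketched as a toy simulation (Python, for
illustration only; this is not the zs driver, and the tick-based timing, the
function name, and the parameters are all assumptions made for the example).
Bytes arrive at a steady rate, the silo holds "depth" bytes, and the CPU
empties it only on ticks where the interrupt is not masked.

```python
from collections import deque


def simulate_silo(arrivals, masked_ticks, depth=2):
    """Count bytes lost to silo overflow in a toy model.

    arrivals     -- bytes arriving on each tick (list of ints)
    masked_ticks -- set of ticks during which the CPU cannot be
                    interrupted (e.g. while the panic text is printed)
    depth        -- silo capacity in bytes; 2 follows the figure given
                    in the text and is otherwise an assumption
    """
    silo = deque()
    lost = 0
    for tick, count in enumerate(arrivals):
        for _ in range(count):
            if len(silo) < depth:
                silo.append(tick)   # byte fits in the silo
            else:
                lost += 1           # silo full: this byte is dropped
        if tick not in masked_ticks:
            silo.clear()            # interrupt serviced, silo drained
    return lost


if __name__ == "__main__":
    steady_mouse = [1] * 10
    print("lost, never masked:     ", simulate_silo(steady_mouse, set()))
    print("lost, masked ticks 2..7:", simulate_silo(steady_mouse, set(range(2, 8))))
```

With interrupts never masked, nothing is lost; once they are masked for
longer than the silo can absorb, every further byte is dropped, and the
overflow is only noticed when the mask is lifted.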

In other words, the "zs3: silo overflow" messages may be a *consequence*
of the crash, not a *cause* of it.