dennett@kodak.com (Charlie Dennett) (05/17/89)
Once again I turn to the net because the Hotline is stone cold.

My system is a 3/260 under 4.0.1. On average about once a week, my system crashes with a zs3: silo overflow. Device zs3 is not mentioned in my kernel configuration file - not even in a comment line. The only zs devices there are zs0 and zs1.

These crashes seem to be associated with mouse activity - usually, but not always, rapid mouse activity.

Listed below are the relevant messages I found in /var/adm/messages from two separate crashes. Any insight into the problem is appreciated.

---Most recent crash-------
May 9 15:13:51 cygnus vmunix: ypserv:
May 9 15:13:51 cygnus vmunix: trap address 0x8, pid 65, pc = f086282, sr = 2510, stkfmt b, context 1
May 9 15:13:51 cygnus vmunix: Bus Error Reg 80<INVALID>
May 9 15:13:51 cygnus vmunix: data fault address f357d36 faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
May 9 15:13:51 cygnus vmunix: KERNEL MODE
May 9 15:13:51 cygnus vmunix: page map 0 pmgrp ce
May 9 15:13:51 cygnus vmunix: D0-D7 ffbf7ffe 40 1 0 0 0 2100 2d8e
May 9 15:13:51 cygnus vmunix: A0-A7 f357d32 f08bf48 ffff9632 0 0 f0ce510 f08bf50 f08bf34
May 9 15:13:51 cygnus vmunix: Begin traceback...fp = f08bf50, sp = f08bf34
May 9 15:13:51 cygnus vmunix: Called from f088ece, fp=f08bf78, args=f0b9b24 1 1 0
May 9 15:13:51 cygnus vmunix: Called from f01b38a, fp=f08bf9c, args=0 0 0 1
May 9 15:13:51 cygnus vmunix: Called from f01b26e, fp=f08bfc0, args=2000 0 2504 effffdc
May 9 15:13:51 cygnus vmunix: Called from f004704, fp=ffff9660, args=f0648b2 2000 0 e3000001
May 9 15:13:51 cygnus vmunix: End traceback...
May 9 15:13:51 cygnus vmunix: panic: Bus error
May 9 15:13:51 cygnus vmunix: zs3: silo overflow
May 9 15:13:51 cygnus vmunix: syncing file systems... [11] [11] [9] [2] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] done
May 9 15:13:51 cygnus vmunix:
May 9 15:13:51 cygnus vmunix: dumping to vp f0e38fc, offset 17248

---Second most recent crash-----
May 3 11:47:08 cygnus vmunix: etherfind:
May 3 11:47:08 cygnus vmunix: trap address 0x8, pid 1379, pc = f086282, sr = 2500, stkfmt b, context 6
May 3 11:47:08 cygnus vmunix: Bus Error Reg 80<INVALID>
May 3 11:47:08 cygnus vmunix: data fault address f33dd36 faultc 0 faultb 0 dfault 1 rw 1 size 0 fcode 5
May 3 11:47:08 cygnus vmunix: KERNEL MODE
May 3 11:47:08 cygnus vmunix: page map 0 pmgrp 82
May 3 11:47:08 cygnus vmunix: D0-D7 ffbf7ffe 40 0 25 5 78 2100 54a
May 3 11:47:08 cygnus vmunix: A0-A7 f33dd32 f08bf48 ffff97e2 0 0 f0cd730 f08bf50 f08bf34
May 3 11:47:08 cygnus vmunix: Begin traceback...fp = f08bf50, sp = f08bf34
May 3 11:47:08 cygnus vmunix: Called from f088ece, fp=f08bf78, args=f0b9b24 1 1 0
May 3 11:47:08 cygnus vmunix: Called from f01b38a, fp=f08bf9c, args=0 0 2 3
May 3 11:47:08 cygnus vmunix: Called from f01b26e, fp=f08bfc0, args=9 2 0 291e8
May 3 11:47:08 cygnus vmunix: Called from f004704, fp=effefd4, args=ece7730 9 0 e3000001
May 3 11:47:08 cygnus vmunix: End traceback...
May 3 11:47:08 cygnus vmunix: panic: Bus error
May 3 11:47:08 cygnus vmunix: zs3: silo overflow
May 3 11:47:08 cygnus vmunix: syncing file systems... [13] 1 [13] [9] [3] done
May 3 11:47:08 cygnus vmunix:
May 3 11:47:08 cygnus vmunix: dumping to vp f0e38fc, offset 17248

Charlie Dennett          | UUCP: ...!rutgers!rochester!kodak!cygnus!dennett
Information Services     | Internet: dennett@cygnus.Kodak.COM
Eastman Kodak Company    |
Rochester, NY 14653-5219 |
smb@arpa.att.com (05/17/89)
Your machine isn't crashing because of a silo overflow; it's getting some sort of bus error or other internal trap. While it's printing the panic and traceback stuff, the input silo -- from the mouse? dunno -- has filled up and overflowed because interrupts are disabled. When it finally has the leisure to look at the i/o ports, while trying to sync the disks, it notices the async i/o error and reports it. That's pure cascade, and the message itself is not relevant. I don't know, though, why it's talking about zs3.
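
One rough way to check the "cascade" reading against a saved log excerpt is to confirm that the panic line is logged before the silo overflow line. The sketch below uses four lines copied from the May 9 crash in the original post; the function names are made up for illustration, and log order only shows the order in which messages were recorded, not a full causal chain.

```python
import re

# Excerpt of the syslog lines quoted in the original post (May 9 crash).
LOG = """\
May 9 15:13:51 cygnus vmunix: End traceback...
May 9 15:13:51 cygnus vmunix: panic: Bus error
May 9 15:13:51 cygnus vmunix: zs3: silo overflow
May 9 15:13:51 cygnus vmunix: syncing file systems...
"""


def first_index(lines, pattern):
    """Return the index of the first line matching pattern, or None."""
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            return i
    return None


def overflow_follows_panic(log_text):
    """True if the first 'silo overflow' line comes after the first 'panic:' line."""
    lines = log_text.splitlines()
    panic = first_index(lines, r"vmunix: panic:")
    overflow = first_index(lines, r"silo overflow")
    return panic is not None and overflow is not None and panic < overflow


print(overflow_follows_panic(LOG))  # prints True for the excerpt above
```

If the function returned False for a given crash (overflow logged first, or no panic line at all), that crash would deserve a closer look rather than being written off as a side effect.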
guy@uunet.uu.net (Guy Harris) (05/17/89)
>My system is a 3/260 under 4.0.1. On the average of once a week, my
>system crashes with a zs3: silo overflow. Device zs3 is not mentioned in
>my kernel configuration file - not even in a comment line. The only zs
>devices there are zs0 and zs1.

Each "device zsN" line that appears in a "config" file refers to *two* serial ports; a Zilog Z8530 SCC chip has two channels. You probably have a line like

	device zs0 at obio ? csr 0x20000 flags 3 priority 3

which refers to devices "zs0" and "zs1" (the fact that it says only "zs0" on the line notwithstanding; the nomenclature for devices in the sense of a line in a config file, and in the sense of something with a minor device number all its own, are separate), and a line like

	device zs1 at obio ? csr 0x00000 flags 0x103 priority 3

which refers to devices "zs2" and "zs3" (again, the "zs0" and "zs1" in the "device" lines are not the same as the "zs0", "zs1", "zs2", and "zs3" in the error messages; "device" line "zsN" refers to devices that would appear as "zs(2N)" and "zs(2N+1)" in error messages).

>These crashes seem to be associated with mouse activity

Not at all surprising. Three guesses how the mouse is attached to the host.... The two ports on "device zs1", which are referred to as "zs2" and "zs3" in error messages, are for the keyboard and mouse, respectively.

>- usually, but not always rapid mouse activity.

Especially mouse activity while the system is crashing for some other reason. If the system is crashing, it is probably printing a lot of stuff with the kernel's "printf" routine, and doing so at a very high interrupt priority, so that the "zs" devices can't interrupt the CPU. As the mouse moves, it generates a stream of 5-byte (as I remember) motion events; that's not one 5-byte event per movement of the mouse, that's several of them - one for each little increment the mouse moves.
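
As an aside on the numbering rule described above ("device" line zsN covers the ports reported as zs(2N) and zs(2N+1)), the arithmetic can be written out as a small sketch. The function names are hypothetical and this is not code from the SunOS driver; it only restates the rule.

```python
def ports_for_device_line(unit):
    """Ports covered by config line 'device zs<unit>' under the 2N / 2N+1 rule."""
    return (f"zs{2 * unit}", f"zs{2 * unit + 1}")


def device_line_for_port(port):
    """Config unit and channel index (0 or 1) for a port number seen in a message."""
    return divmod(port, 2)


for unit in (0, 1):
    print(f"device zs{unit} ->", ports_for_device_line(unit))

# zs3 in an error message: second channel of "device zs1"
print(device_line_for_port(3))
```

So a "zs3" message is consistent with a config file that only ever mentions zs0 and zs1.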
The Z8530 chip has an on-chip silo, but it only holds two bytes; if the CPU doesn't respond pretty quickly to an interrupt request, the silo will overflow. Once the interrupt priority is lowered to the point where the Z8530 can interrupt the CPU again, the interrupt comes in, the device driver notes that the silo overflowed, and prints a message to that effect. In other words, the "zs3: silo overflow" messages may be a *consequence* of the crash, not a *cause* of it.
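
The sequence described above (bytes keep arriving while interrupts are blocked, the small silo fills, and the overflow is only reported once the driver gets to run again) can be illustrated with a toy model. Everything here is an assumption for illustration: the class and function names are invented, the depth of 2 is simply the figure given in this post, and "one byte per tick" is an arbitrary simplification rather than real mouse or chip timing.

```python
class Silo:
    """Toy receive silo: fixed depth, latches an overflow flag when full."""

    def __init__(self, depth=2):  # depth taken from the post; an assumption here
        self.depth = depth
        self.data = []
        self.overflowed = False

    def receive(self, byte):
        if len(self.data) >= self.depth:
            self.overflowed = True  # byte is lost; flag stays set until serviced
        else:
            self.data.append(byte)

    def service(self):
        """Roughly what an interrupt handler might do: drain and report."""
        drained, self.data = self.data, []
        had_overflow, self.overflowed = self.overflowed, False
        return drained, had_overflow


def simulate(events, masked_during):
    """Feed one byte per tick; interrupts are blocked for ticks in masked_during."""
    silo = Silo()
    messages = []
    for tick, byte in enumerate(events):
        silo.receive(byte)
        if tick not in masked_during:
            _, overflow = silo.service()
            if overflow:
                messages.append((tick, "silo overflow"))
    return messages


# Ten mouse bytes; interrupts blocked for ticks 2..6 (say, while a panic prints).
print(simulate(bytes(10), range(2, 7)))  # [(7, 'silo overflow')]
```

In this model the overflow message shows up at tick 7, the first tick after the blocked window, even though the bytes were lost earlier. That matches the point that the message appears after, and because of, whatever kept interrupts blocked.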