[comp.sys.sun] vmunix: zs0: parity error ignored

geof@aurora.com (Geoffrey H. Cooper) (07/25/90)

I've been getting a lot of these errors ever since I upgraded to SunOS4.1,
both on the machine that has modems and the one that has only a directly
connected printer.  Sun maintenance reports that lots of people have
complained about this but that no patch exists.  They say that the problem
was always there, and the only difference between this and earlier
releases is that the message is printed. 

I would like to convince my machine to stop printing this message!

Upon perusal of my /vmunix with my trusty adb:

 _zsa_srint+0x70:                sethi   %hi(0xf80f7c00), %o1
 _zsa_srint+0x74:                or      %o1, 0x153, %o1          ! -0x7f082ad
 _zsa_srint+0x78:                and     %o2, 0x7f, %o2
 _zsa_srint+0x7c:                call    _log

 -0x7f082ad?s
 _zsops_async+0x2f:              zs%d: parity error ignored
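
For anyone checking the arithmetic: the sethi/or pair builds a 32-bit address in %o1, and adb prints that address as a signed number, which is why it shows up as -0x7f082ad. A minimal sketch in Python (the function names are made up for illustration; the constants are the ones in the adb listing above):

```python
def sethi_or(hi_operand, lo_imm):
    """Combine a `sethi %hi(x)` operand with the immediate of the following `or`."""
    high = hi_operand & 0xFFFFFC00   # sethi sets the upper 22 bits and clears the low 10
    return (high | lo_imm) & 0xFFFFFFFF

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one, the way adb displays it."""
    return value - (1 << 32) if value & 0x80000000 else value

addr = sethi_or(0xF80F7C00, 0x153)
print(hex(addr))               # 0xf80f7d53
print(hex(as_signed32(addr)))  # -0x7f082ad
```

So the second argument passed to log() is the address 0xf80f7d53, which is where adb finds the format string.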

`strings' confirms that there is only one string of the above form in the
kernel.  I'm new at assembly hacking on a SPARC.  Specifically, I don't
know the calling convention.  Looks to me like I should do:

   # cp /vmunix /vmunix.save
   # adb -w /vmunix
   _zsa_srint+0x7c?W 0x1000000
	...xxxxxx = 01000000...
   _zsa_srint+0x7c?i
   _zsa_srint+0x7c:     nop
   ^D
   # /etc/reboot

Assuming the sparc keeps the same kind of calling convention as other
machines, this should work.  Anyone not agree with me?  Mail responses to
me, I'll summarize to the net next week.
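
On the calling-convention question: SPARC passes the first six arguments in %o0-%o5 and the caller does not pop anything off the stack after a call, so replacing the call with a nop should leave nothing to clean up. The instructions that load %o1 and %o2 simply become dead code. The instruction after the call (the delay slot, not shown in the listing) still executes either way. The value 0x01000000 is a nop because it decodes as `sethi 0, %g0`. A rough sketch of that decoding, with made-up helper names:

```python
def decode_format2(word):
    """Split a 32-bit SPARC format-2 word (sethi, branches) into its fields."""
    return {
        "op": (word >> 30) & 0x3,     # 0 selects format 2
        "rd": (word >> 25) & 0x1F,    # destination register number
        "op2": (word >> 22) & 0x7,    # 4 selects sethi
        "imm22": word & 0x3FFFFF,     # upper 22 bits of the constant
    }

def is_nop(word):
    """nop is sethi with destination %g0 (rd=0) and a zero immediate."""
    return decode_format2(word) == {"op": 0, "rd": 0, "op2": 4, "imm22": 0}

print(is_nop(0x01000000))            # True
print(decode_format2(0x133E03DF))    # sethi %hi(0xf80f7c00), %o1 -> rd=9
```

The second word is what I would expect the sethi at +0x70 to assemble to; I have not dumped the raw bytes to confirm it.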

- Geof

geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com

geof@aurora.com (Geoffrey H. Cooper) (08/01/90)

OK, not many responses on this one, but I got tired and tried my fix.  16
hours later, everything seems fine, and I don't get these messages
anymore.  Obviously, I am not giving a warranty!

The problem is a plethora of random messages from one of the serial ports.
Sun customer support reports that this condition was always happening, but
that the message wasn't getting logged until 4.1.  The message looks like:

 Jul 28 05:00:14 aurora vmunix: zs0: parity error ignored
 Jul 28 05:00:43 aurora vmunix: zs0: parity error ignored
 Jul 28 05:00:43 aurora last message repeated 2 times
 Jul 28 06:05:37 aurora vmunix: zs0: parity error ignored

and so on.  Zs0 means port A and Zs1 means port B (I get it only on zs0,
but others report zs1).  Well, if the only difference is that the message
is being logged now, this is something I can fix!
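
To get a feel for how often each port complains before deciding what to do, something like the following could tally the messages. It is only a sketch: the regular expressions assume the exact message layout shown above, and SAMPLE is just those four lines, not a real log file.

```python
import re
from collections import Counter

PARITY = re.compile(r"vmunix: (zs\d+): parity error ignored")
REPEAT = re.compile(r"last message repeated (\d+) times?")

SAMPLE = [
    "Jul 28 05:00:14 aurora vmunix: zs0: parity error ignored",
    "Jul 28 05:00:43 aurora vmunix: zs0: parity error ignored",
    "Jul 28 05:00:43 aurora last message repeated 2 times",
    "Jul 28 06:05:37 aurora vmunix: zs0: parity error ignored",
]

def count_parity_errors(lines):
    """Count parity messages per port, crediting syslog's 'repeated' lines."""
    counts = Counter()
    last_port = None                  # port named by the previous parity line
    for line in lines:
        hit = PARITY.search(line)
        if hit:
            last_port = hit.group(1)
            counts[last_port] += 1
            continue
        rep = REPEAT.search(line)
        if rep and last_port is not None:
            counts[last_port] += int(rep.group(1))
        else:
            last_port = None          # some other message came in between
    return counts

print(count_parity_errors(SAMPLE))    # Counter({'zs0': 5})
```

In practice the lines would come from /var/adm/messages rather than a list.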

Upon perusal of my /vmunix with my trusty adb:

 _zsa_srint+0x70:                sethi   %hi(0xf80f7c00), %o1
 _zsa_srint+0x74:                or      %o1, 0x153, %o1          ! -0x7f082ad
 _zsa_srint+0x78:                and     %o2, 0x7f, %o2
 _zsa_srint+0x7c:                call    _log

 -0x7f082ad?s
 _zsops_async+0x2f:              zs%d: parity error ignored

`strings' confirms that there is only one string of the above form in the
kernel.

I did this:

   # cp /vmunix /vmunix.save
   # adb -w /vmunix
   _zsa_srint+0x7c?W 0x1000000
	...xxxxxx = 01000000...
   _zsa_srint+0x7c?i
   _zsa_srint+0x7c:     nop
   ^D
   # /etc/reboot

[note: ADB patches right away, so if you blow it, revert to vmunix.save
and repeat the entire procedure.  If the system won't boot, boot sd()-a,
respond with returns to everything except the BOOT: message, where you
type "vmunix.save"]
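
Since adb writes immediately, one way to make the same kind of patch less error-prone is to check the old word before overwriting it. The sketch below is not what I ran; it is a hypothetical helper that assumes you already know the file offset of the instruction (adb does the symbol lookup for you, this does not) and the word you expect to find there.

```python
import shutil
import struct

SPARC_NOP = 0x01000000

def patch_word(path, offset, expected, new=SPARC_NOP, backup_suffix=".save"):
    """Replace one big-endian 32-bit word, but only if the old value matches."""
    with open(path, "rb") as f:
        f.seek(offset)
        (current,) = struct.unpack(">I", f.read(4))
    if current != expected:
        raise ValueError("found 0x%08x at offset %d, expected 0x%08x"
                         % (current, offset, expected))
    # Back up only after the check passes, so a failed or repeated run
    # never overwrites a good backup with an already-patched file.
    shutil.copy2(path, path + backup_suffix)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(struct.pack(">I", new))
    return current
```

Running it a second time with the same arguments raises ValueError instead of touching the file, because the word at that offset is now the nop.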

geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com

X-From: sjk@nil.UUCP (Scott J. Kramer)

We've got the same lossage on our Sun-3/180, SunOS 4.1 system with Telebit
Trailblazer+ and Supra 2400 modems on the a & b serial ports.  What's
puzzling to me is why it also happens on zs1:

    Jul 25 04:42:02 pixar vmunix: zs0: parity error ignored
    Jul 25 05:53:47 pixar vmunix: zs1: parity error ignored
    Jul 25 06:32:22 pixar vmunix: zs1: parity error went up

and that "... went up" message is peculiar, too.

Another problem is that the Supra modem on the b port (currently the
first line in a 4-number dialup rotary (we've got two other Supras on
a CoALM board)) goes into limbo about once a day and requires
power-cycling to clear it; "ATZ"ing it is futile.  This prevents
anyone from dialing in until it's reset since our UUCP incallers only
use that first number.  I've unsuccessfully tried several combinations
of modems and ports; I still can't tell whether it's a hardware and/or
software problem.  A few of the incoming/outgoing calls are at 1200bps
and I suspect that the trouble is related to the modems and/or serial
lines not properly resetting.

At first I thought the "modem in limbo" problem was related to the
parity error stuff, but there doesn't seem to be any correlation.

So I can't tell you what those kernel messages really mean (sure is a
loss not having system source!), tho' I'm anxious to find out!

Scott J. Kramer			inet:	pixar!sjk@ucbvax.Berkeley.EDU
Pixar				uucp:	...!{sun,ucbvax}!pixar!sjk
1001 West Cutting Boulevard	voice:	415-236-4000
Richmond, CA, 94804
USA
-- 
geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com

earle@poseur.jpl.nasa.gov (Greg Earle) (08/02/90)

>Sun maintenance reports that lots of people have complained about this but
>that no patch exists.  They say that the problem was always there, and the
>only difference between this and earlier releases is that the message is
>printed.

No patch exists because `zs0: parity error ignored' is a message that is
logged to `syslog' at priority level `kern.err'.  If
you look at the default /etc/syslog.conf file, you will see that all
kernel messages of level `kern.debug' and higher (which includes
`kern.err') are logged to the console and /var/adm/messages:

*.err;kern.debug;auth.notice;user.none          /dev/console
*.err;kern.debug;daemon,auth.notice;mail.crit;user.none /var/adm/messages

You will also note that `kern.err' messages will be sent to the user
`operator' if such a user exists and is logged on:

*.alert;kern.err;daemon.err;user.none           operator

What I (speaking for myself, and not for Sun Microsystems, Inc., my
employer) might recommend, for those who are too weak of heart to consider
adb'ing the kernel, is to remove the kernel messages from being sent to
the console and to /var/adm/messages, and instead send them to a file of
their own:

*.err;auth.notice;user.none          /dev/console
*.err;daemon,auth.notice;mail.crit;user.none /var/adm/messages
kern.debug	/var/log/syslog.kernel

or something similar.  This has the deleterious effect of merely keeping
them from blatting on your console; they still fill up just as much disk
space as if they were going to /var/adm/messages, and it makes for one
more logging file to remember/worry about.  On the other hand, "Syslog Is
Your Friend!" and you should take advantage of its capabilities.
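
To see why `kern.debug' catches `kern.err', here is a simplified model of how a BSD-style syslogd might evaluate a selector. It is a sketch under assumptions, not the real syslogd code: fields are applied left to right, a later field overrides an earlier one for the same facility, `none' excludes a facility, and a message matches when it is at least as severe as the threshold.

```python
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def selector_matches(selector, facility, level):
    """Return True if a message of facility.level would match the selector."""
    default = None    # threshold for facilities set only through "*"
    explicit = {}     # per-facility thresholds; None means "none"
    for field in selector.split(";"):
        facilities, _, threshold = field.rpartition(".")
        limit = None if threshold == "none" else LEVELS.index(threshold)
        for name in facilities.split(","):
            if name == "*":
                default, explicit = limit, {}
            else:
                explicit[name] = limit
    limit = explicit.get(facility, default)
    return limit is not None and LEVELS.index(level) <= limit

print(selector_matches("*.err;kern.debug;auth.notice;user.none", "kern", "err"))  # True
print(selector_matches("*.err;auth.notice;user.none", "kern", "err"))             # True
print(selector_matches("*.err;kern.none;auth.notice;user.none", "kern", "err"))   # False
```

One thing this model suggests: with `kern.debug' removed, the `*.err' field would still match `kern.err' messages, so the parity errors may keep reaching the console. If your syslogd behaves this way, adding `kern.none' to the console and /var/adm/messages selectors should be what actually keeps them out. Check against your own syslogd before relying on it.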

Then again, if nuking them from the kernel really toots your
whistle ...

>I would like to convince my machine to stop printing this message!
>
>Upon perusal of my /vmunix with my trusty `adb':
>
> _zsa_srint+0x70:                sethi   %hi(0xf80f7c00), %o1
> _zsa_srint+0x74:                or      %o1, 0x153, %o1          ! -0x7f082ad
> _zsa_srint+0x78:                and     %o2, 0x7f, %o2
> _zsa_srint+0x7c:                call    _log
>
> -0x7f082ad?s
> _zsops_async+0x2f:              zs%d: parity error ignored
>
>`strings' confirms that there is only one string of the above form in the
>kernel.

First, note that this is probably a SPARCstation kernel, since a 4/370
GENERIC kernel I just checked had the `call _log' instruction at
`zsa_srint+0x90'.

Secondly, the *second* call to log() in zsa_srint() is the one that emits 

	zs0: parity error went up

which is much less prevalent or commonplace than `parity error ignored',
but one might still wish to suppress that as well.

Thus, the *first* *2* calls to log() in each of the following are what you
want to NOP:

	/sys/`arch -k`/OBJ/zs_asynch.o		zsa_srint()

	/sys/`arch -k`/OBJ/mcp_asynch.o		mcpa_srint()

There are also 2 calls to log() for these 2 messages in

	/sys/`arch -k`/OBJ/mti.o		mtiresponse()

(The latter two files/routines handle the cases for the ALM-2 and ALM-1,
respectively; the first - zs_asynch.o/zsa_srint() - is for the CPU board
serial ports)

but determining which call to log() corresponds to the 2 desired messages
is a lot harder.  On the Sun-3, the first 2 calls to log() in
mtiresponse() are the calls for the `parity error ignored' and `parity
error went up'; alas, the Sun-4 compiler does something different with the
same code and thus the first 2 calls are not the same.  Hopefully not too
many of you are using an ALM-1 board on a Sun-4 these days (^:
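
Picking out "the first 2 calls to log()" by eye gets tedious, so here is a rough sketch of how one might scan a routine's instruction words for calls to a given address. A SPARC call has 01 in its top two bits and a signed 30-bit word displacement relative to its own address. Everything else here is hypothetical: the helper names, the base address, and the sample words are invented for illustration. It would apply to a linked kernel; in an unlinked .o the displacement is probably not filled in until link time.

```python
def call_target(pc, word):
    """Destination of a SPARC `call` located at pc, or None if word is not a call."""
    if (word >> 30) != 0b01:
        return None
    disp30 = word & 0x3FFFFFFF
    if disp30 & 0x20000000:           # sign-extend the 30-bit displacement
        disp30 -= 1 << 30
    return (pc + 4 * disp30) & 0xFFFFFFFF

def first_calls_to(base, words, target, limit=2):
    """Addresses of the first `limit` calls to `target` in a run of words."""
    hits = []
    for i, word in enumerate(words):
        pc = base + 4 * i
        if call_target(pc, word) == target:
            hits.append(pc)
            if len(hits) == limit:
                break
    return hits

# Invented example: a routine at 0x1000 with three calls to 0x2000
# and one call to somewhere else.
NOP = 0x01000000
words = [NOP, 0x400003FF, 0x40000001, 0x400003FD, NOP, 0x400003FB]
print([hex(pc) for pc in first_calls_to(0x1000, words, 0x2000)])  # ['0x1004', '0x100c']
```

As noted above, the first two calls are not always the right two, so the format string loaded before each call still needs to be checked by hand.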

>Looks to me like I should do:
>
>   # cp /vmunix /vmunix.save
>   # adb -w /vmunix
>   _zsa_srint+0x7c?W 0x1000000
>	...xxxxxx = 01000000...
>   _zsa_srint+0x7c?i
>   _zsa_srint+0x7c:     nop
>   ^D
>   # /etc/reboot

I would do instead (for just the simple `zs#: parity error ignored' case;
`?W' patches the /vmunix file, `/W' patches the running kernel via /dev/mem):

	# adb -w -k /vmunix /dev/mem
	_zsa_srint+0x7c?W 0x1000000
	    ...xxxxxx = 01000000...
	_zsa_srint+0x7c/W 0x1000000
	    ...xxxxxx = 01000000...
	^D
	# (no need to reboot)

To perma-fix this, you would need to modify the 3 kernel modules that I
mentioned above (along with the 2nd call to log() in each of the
aforementioned functions to kill the `xxx#: parity error went up' case as
well) for future 4.1 kernels.

	- Greg Earle	
	  Sun Microsystems, Inc. - JPL on-site Software Support
	  earle@poseur.JPL.NASA.GOV	(Direct)
	  earle@Sun.COM			(Indirect)