geof@aurora.com (Geoffrey H. Cooper) (07/25/90)
I've been getting a lot of these errors ever since I upgraded to SunOS 4.1,
both on the machine that has modems and the one that has only a directly
connected printer.

Sun maintenance reports that lots of people have complained about this but
that no patch exists. They say that the problem was always there, and the
only difference between this and earlier releases is that the message is
printed.

I would like to convince my machine to stop printing this message!

Upon perusal of my /vmunix with my trusty adb:

    _zsa_srint+0x70:        sethi   %hi(0xf80f7c00), %o1
    _zsa_srint+0x74:        or      %o1, 0x153, %o1         ! -0x7f082ad
    _zsa_srint+0x78:        and     %o2, 0x7f, %o2
    _zsa_srint+0x7c:        call    _log

    -0x7f082ad?s
    _zsops_async+0x2f:      zs%d: parity error ignored

strings confirms that there is only one string of the above form in the
kernel.

I'm new at assembly hacking on a SPARC. Specifically, I don't know the
calling convention. Looks to me like I should do:

    # cp /vmunix /vmunix.save
    # adb -w /vmunix
    _zsa_srint+0x7c?W 0x1000000
    ...xxxxxx = 01000000...
    _zsa_srint+0x7c?i
    _zsa_srint+0x7c:        nop
    ^D
    # /etc/reboot

Assuming the SPARC keeps the same kind of calling convention as other
machines, this should work. Anyone not agree with me?

Mail responses to me, I'll summarize to the net next week.

- Geof
geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com
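[A minimal sketch, not part of the original post, of the arithmetic behind
the adb listing above. It assumes the standard SPARC V8 encodings: sethi
loads the upper 22 bits of a register, the following or fills in the low
bits, and nop is encoded as "sethi 0, %g0". The function names are made up
for illustration.]

```python
def sethi_or(hi_value, lo_imm):
    """Address built by 'sethi %hi(hi_value)' followed by 'or ..., lo_imm'."""
    return (hi_value & 0xFFFFFC00) | lo_imm


def as_signed32(value):
    """Interpret a 32-bit word as a signed number, the way adb annotates it."""
    return value - (1 << 32) if value & 0x80000000 else value


def encode_sethi(imm22, rd):
    """SPARC format 2: op=00, rd in bits 29-25, op2=100, imm22 in bits 21-0."""
    return (rd << 25) | (0b100 << 22) | (imm22 & 0x3FFFFF)


def is_call(word):
    """SPARC format 1: a call instruction has 01 in its top two bits."""
    return (word >> 30) == 0b01


string_addr = sethi_or(0xF80F7C00, 0x153)
nop_word = encode_sethi(0, 0)  # sethi 0, %g0

print(hex(string_addr))               # 0xf80f7d53
print(hex(as_signed32(string_addr)))  # -0x7f082ad, matching the adb comment
print(f"{nop_word:08x}")              # 01000000, the word written with ?W
print(is_call(nop_word))              # False: the patched word is no longer a call
```

If these assumptions hold, the second argument to log() is the address of
the format string, and the word 0x01000000 written over the call is a nop.
Arguments travel in the %o registers, so there should be nothing for the
caller to clean up once the call is gone.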
geof@aurora.com (Geoffrey H. Cooper) (08/01/90)
OK, not many responses on this one, but I got tired and tried my fix.
16 hours later, everything seems fine, and I don't get these messages
anymore. Obviously, I am not giving a warranty!

The problem is a plethora of random messages from one of the serial ports.
Sun customer support reports that this condition was always happening, but
that the message wasn't getting logged until 4.1. The message looks like:

    Jul 28 05:00:14 aurora vmunix: zs0: parity error ignored
    Jul 28 05:00:43 aurora vmunix: zs0: parity error ignored
    Jul 28 05:00:43 aurora last message repeated 2 times
    Jul 28 06:05:37 aurora vmunix: zs0: parity error ignored

and so on. zs0 means port A and zs1 means port B (I get it only on zs0,
but others report zs1).

Well, if the only difference is that the message is being logged now, this
is something I can fix!

Upon perusal of my /vmunix with my trusty adb:

    _zsa_srint+0x70:        sethi   %hi(0xf80f7c00), %o1
    _zsa_srint+0x74:        or      %o1, 0x153, %o1         ! -0x7f082ad
    _zsa_srint+0x78:        and     %o2, 0x7f, %o2
    _zsa_srint+0x7c:        call    _log

    -0x7f082ad?s
    _zsops_async+0x2f:      zs%d: parity error ignored

strings confirms that there is only one string of the above form in the
kernel.

I did this:

    # cp /vmunix /vmunix.save
    # adb -w /vmunix
    _zsa_srint+0x7c?W 0x1000000
    ...xxxxxx = 01000000...
    _zsa_srint+0x7c?i
    _zsa_srint+0x7c:        nop
    ^D
    # /etc/reboot

[note: adb patches right away, so if you blow it, revert to vmunix.save and
repeat the entire procedure. If the system won't boot, boot sd()-a, respond
with returns to everything except the BOOT: message, where you type
"vmunix.save"]

geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com

X-From: sjk@nil.UUCP (Scott J. Kramer)

We've got the same lossage on our Sun-3/180, SunOS 4.1 system with Telebit
Trailblazer+ and Supra 2400 modems on the a & b serial ports. What's
puzzling to me is why it also happens on zs1:

    Jul 25 04:42:02 pixar vmunix: zs0: parity error ignored
    Jul 25 05:53:47 pixar vmunix: zs1: parity error ignored
    Jul 25 06:32:22 pixar vmunix: zs1: parity error went up

and that "... went up" message is peculiar, too.

Another problem is that the Supra modem on the b port (currently the first
line in a 4-number dialup rotary (we've got two other Supras on a CoALM
board)) goes into limbo about once a day and requires power-cycling to
clear it; "ATZ"ing it is futile. This prevents anyone from dialing in
until it's reset, since our UUCP incallers only use that first number.

I've unsuccessfully tried several combinations of modems and ports; I still
can't tell whether it's a hardware and/or software problem. A few of the
incoming/outgoing calls are at 1200bps and I suspect that the trouble is
related to the modems and/or serial lines not properly resetting. At first
I thought the "modem in limbo" problem was related to the parity error
stuff, but there doesn't seem to be any correlation.

So I can't tell you what those kernel messages really mean (sure is a loss
not having system source!), tho' I'm anxious to find out!

Scott J. Kramer                 inet:  pixar!sjk@ucbvax.Berkeley.EDU
Pixar                           uucp:  ...!{sun,ucbvax}!pixar!sjk
1001 West Cutting Boulevard     voice: 415-236-4000
Richmond, CA, 94804  USA

--
geof@aurora.com / aurora!geof@decwrl.dec.com / geof%aurora.com@decwrl.dec.com
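[A rough sketch, not part of the original posts, of one way to tally these
messages per port when looking for a correlation with other events. It
assumes the line format shown in the excerpts above and that a "last
message repeated N times" line refers to the message logged just before
it. The function and variable names are hypothetical.]

```python
import re
from collections import Counter

PARITY = re.compile(r"vmunix: (zs\d+): parity error (ignored|went up)")
REPEAT = re.compile(r"last message repeated (\d+) times")


def count_parity(lines):
    """Count parity messages per (port, kind), folding in syslog repeats."""
    counts = Counter()
    last_key = None  # most recent parity message, if it was the previous line
    for line in lines:
        hit = PARITY.search(line)
        if hit:
            last_key = (hit.group(1), hit.group(2))
            counts[last_key] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last_key is not None:
            counts[last_key] += int(repeat.group(1))
        else:
            last_key = None  # some unrelated message intervened
    return counts


# The excerpt from the post above.
SAMPLE = [
    "Jul 28 05:00:14 aurora vmunix: zs0: parity error ignored",
    "Jul 28 05:00:43 aurora vmunix: zs0: parity error ignored",
    "Jul 28 05:00:43 aurora last message repeated 2 times",
    "Jul 28 06:05:37 aurora vmunix: zs0: parity error ignored",
]

for (port, kind), total in sorted(count_parity(SAMPLE).items()):
    print(port, kind, total)
```

To run it against a real log, something like
count_parity(open("/var/adm/messages")) should do, given read permission
on the file.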
earle@poseur.jpl.nasa.gov (Greg Earle) (08/02/90)
>Sun maintenance reports that lots of people have complained about this but
>that no patch exists. They say that the problem was always there, and the
>only difference between this and earlier releases is that the message is
>printed.

The reason that no patch exists is that `zs0: parity error ignored' is a
message which is logged to `syslog' at priority level `kern.err'. If you
look at the default /etc/syslog.conf file, you will see that all kernel
messages of level `kern.debug' and higher (which includes `kern.err') are
logged to the console and /var/adm/messages:

    *.err;kern.debug;auth.notice;user.none                  /dev/console
    *.err;kern.debug;daemon,auth.notice;mail.crit;user.none /var/adm/messages

You will also note that `kern.err' messages will be sent to the user
`operator' if such a user exists and is logged on:

    *.alert;kern.err;daemon.err;user.none                   operator

What I (speaking for myself, and not for Sun Microsystems, Inc., my
employer) might recommend, for those who are too weak of heart to consider
adb'ing the kernel, is to remove the kernel messages from being sent to the
console and to /var/adm/messages, and instead send them to a file of their
own:

    *.err;auth.notice;user.none                             /dev/console
    *.err;daemon,auth.notice;mail.crit;user.none            /var/adm/messages
    kern.debug                                              /var/log/syslog.kernel

or something similar. This has the deleterious effect of merely keeping
them from blatting on your console; they still fill up just as much disk
space as if they were going to /var/adm/messages, and it makes for one more
logging file to remember/worry about. On the other hand, "Syslog Is Your
Friend!" and you should take advantage of its capabilities.

Then again, if nuking them from the kernel really toots your whistle ...

>I would like to convince my machine to stop printing this message!
>
>Upon perusal of my /vmunix with my trusty `adb':
>
>    _zsa_srint+0x70:        sethi   %hi(0xf80f7c00), %o1
>    _zsa_srint+0x74:        or      %o1, 0x153, %o1         ! -0x7f082ad
>    _zsa_srint+0x78:        and     %o2, 0x7f, %o2
>    _zsa_srint+0x7c:        call    _log
>
>    -0x7f082ad?s
>    _zsops_async+0x2f:      zs%d: parity error ignored
>
>`strings' confirms that there is only one string of the above form in the
>kernel.

First, note that this is probably a SPARCstation kernel, since a 4/370
GENERIC kernel I just checked had the `call _log' instruction at
`zsa_srint+0x90'. Secondly, the *second* call to log() in zsa_srint() is
the one that emits

    zs0: parity error went up

which is much less prevalent or commonplace than `parity error ignored',
but one might still wish to suppress that as well. Thus, the *first* *2*
calls to log() in each of the following are what you want to NOP:

    /sys/`arch -k`/OBJ/zs_asynch.o          zsa_srint()
    /sys/`arch -k`/OBJ/mcp_asynch.o         mcpa_srint()

There are also 2 calls to log() for these 2 messages in

    /sys/`arch -k`/OBJ/mti.o                mtiresponse()

(The latter two files/routines handle the cases for the ALM-2 and ALM-1,
respectively; the first - zsa_async.o/zsa_srint() - is for the CPU board
serial ports) but determining which call to log() corresponds to the 2
desired messages is a lot harder. On the Sun-3, the first 2 calls to log()
in mtiresponse() are the calls for the `parity error ignored' and `parity
error went up'; alas, the Sun-4 compiler does something different with the
same code and thus the first 2 calls are not the same. Hopefully not too
many of you are using an ALM-1 board on a Sun-4 these days (^:

>Looks to me like I should do:
>
>    # cp /vmunix /vmunix.save
>    # adb -w /vmunix
>    _zsa_srint+0x7c?W 0x1000000
>    ...xxxxxx = 01000000...
>    _zsa_srint+0x7c?i
>    _zsa_srint+0x7c:        nop
>    ^D
>    # /etc/reboot

I would do instead (for just the simple `zs#: parity error ignored' case)

    # adb -w -k /vmunix /dev/mem
    _zsa_srint+0x7c?W 0x1000000
    ...xxxxxx = 01000000...
    _zsa_srint+0x7c/W 0x1000000
    ...xxxxxx = 01000000...
    ^D
    #                       (no need to reboot)

To perma-fix this, you would need to modify the 3 kernel modules that I
mentioned above (along with the 2nd call to log() in each of the
aforementioned functions to kill the `xxx#: parity error went up' case as
well) for future 4.1 kernels.

- Greg Earle
  Sun Microsystems, Inc. - JPL on-site Software Support
  earle@poseur.JPL.NASA.GOV (Direct)
  earle@Sun.COM (Indirect)
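[A small sketch, not part of the original post, of how the syslog.conf
selectors quoted above could be evaluated. It assumes the usual BSD
semantics: a selector matches its level and anything more severe, "none"
excludes a facility, and a later selector on the same line overrides an
earlier one for the same facility. The facility list is abbreviated and
the function names are invented for illustration.]

```python
# Most severe first, so a lower index means a higher priority.
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
FACILITIES = ["kern", "user", "mail", "daemon", "auth",
              "syslog", "lpr", "news", "uucp", "cron"]


def parse_selectors(field):
    """Map each facility named in a selector field to its threshold index."""
    thresholds = {}
    for selector in field.split(";"):
        facilities, level = selector.rsplit(".", 1)
        limit = None if level == "none" else LEVELS.index(level)
        names = FACILITIES if facilities == "*" else facilities.split(",")
        for name in names:
            thresholds[name] = limit  # later selectors override earlier ones
    return thresholds


def matches(field, facility, level):
    """True if a message at facility.level is selected by the field."""
    limit = parse_selectors(field).get(facility)
    return limit is not None and LEVELS.index(level) <= limit


default_console = "*.err;kern.debug;auth.notice;user.none"
edited_console = "*.err;auth.notice;user.none"
with_kern_none = "*.err;kern.none;auth.notice;user.none"

print(matches(default_console, "kern", "err"))  # True
print(matches("kern.debug", "kern", "err"))     # True
print(matches(edited_console, "kern", "err"))   # True
print(matches(with_kern_none, "kern", "err"))   # False
```

Under these assumed semantics the `*.err' selector still picks up kern.err
even after `kern.debug' is dropped from the line, so adding `kern.none'
to the console and /var/adm/messages lines may be needed to keep the
parity messages away from them. That is worth checking against the
syslogd on the machine in question before relying on it.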