stan@tikal.UUCP (Stan Tazuma) (12/16/85)
Configuration:
	VAX 11/750
	4.1bsd Unix
	1 UDA50 disk controller, 2 RA81's
	TU78 tape controller with 2 slaves

Problem messages:
	uda0: soft error, disk transfer error, unit 1
	uda0: soft error, SDI error, unit 1, event 0353

	uda0: soft error, disk transfer error, unit 0
	uda0: soft error, SDI error, unit 0, event 0353

	The SDI error message is always paired with the disk
	transfer error message.
	Notice no sector numbers. The messages occur sometimes 3
	hours apart, sometimes 24 hours apart, but usually not in
	bunches. Typically, only 2-6 occur per day on a fairly
	active system.

Other:
	Out of 3 identical installations, we get these error
	messages on only one. The system doesn't crash when
	these messages occur, but peculiar things happen (like
	the autoboot sequence pausing (until a key is hit)
	just before /etc/rc is run (no, we don't have rc prompt
	for anything)).

Anybody have experiences with these errors?
We have had DEC field service on this for weeks (months?),
and they don't seem to know what to do. We have no manuals on
the UDA50 (the computers are installed at customer sites),
so if anybody out there does, maybe telling us what
event 0353 is would be of some help(?)

Thanks in advance.
spaf@gatech.CSNET (Gene Spafford) (12/18/85)
In article <296@tikal.UUCP> stan@tikal.UUCP (Stan Tazuma) writes:
>Other:
>	Out of 3 identical installations, we get these error
>	messages on only one. The system doesn't crash when
>	these messages occur, but peculiar things happen (like
>	the autoboot sequence pausing (until a key is hit)
>	just before /etc/rc is run (no, we don't have rc prompt
>	for anything)).
>
>Anybody have experiences with these errors?

I'm afraid I can't help you with your disk errors, but I can tell
you why your system hangs during autoboot.

The printing console you get with the 750 (probably a LA100-type) can't
take characters at the rate the controller is strapped for. When any
significant amount of output is transferred to the console before the
system comes up, the terminal sends a control S to the system to stop
the output and prevent a buffer overflow. So far, so good. The console
catches up and sends a control Q to restart output (the output never
really stops since nothing is paying attention to the console at the
moment). Unfortunately, since Unix isn't back up, it hasn't issued the
command to fetch the control S out of the console input register. The
control Q gets tossed away and the overrun bit is set.

Now the system continues on its merry way and suddenly comes up far
enough to poll the console. Lo, there's a control S (it never checks
the overrun bit)! It processes the control S and "stops" output.
Eventually, the buffer fills up and Unix hangs in the reboot code as it
tries to put more characters into the buffer. Typing any character
causes the driver to start printing again.

Fixes in order of preference:

	Setting the console's draft/letter mode button to the setting
	opposite to what you have sometimes works (set it to draft? I'm
	at home at the moment and can't remember which setting is
	needed).

	Change the kernel to either have a bigger console buffer, or
	check that the character fetched is a control S and the overrun
	bit is set, or that the first character ever fetched is a
	control S.

	Configuring the terminal with the "setup" key to not use
	control S/Q (not good, since this will scramble later output).

Hope that helps!
-- 
Gene "the end is in sight" Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332-0280
CSNet: Spaf @ GATech	ARPA: Spaf%GATech.CSNet @ Relay.CS.NET
uucp: ...!{akgua,decvax,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf
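
[Editor's sketch] The lost control Q and the kernel-side check suggested in the post can be illustrated with a toy simulation. This is not the 4.1bsd console driver: the ConsoleReceiver class, its overrun flag, and the helper functions are hypothetical names, and the model assumes a one-character input register that keeps the oldest unread character and sets an overrun bit when a later one is dropped, as described in the post.

```python
XOFF = 0x13  # control S
XON = 0x11   # control Q


class ConsoleReceiver:
    """Toy one-character input register with an overrun bit (hypothetical)."""

    def __init__(self):
        self.data = None
        self.overrun = False

    def receive(self, ch):
        # If the previous character was never read, the new one is
        # dropped and the overrun bit records that something was lost.
        if self.data is not None:
            self.overrun = True
            return
        self.data = ch

    def read(self):
        # Return the pending character and the overrun bit, then clear both.
        ch, lost = self.data, self.overrun
        self.data, self.overrun = None, False
        return ch, lost


def output_stopped_after_first_read(rx, check_overrun):
    """Return True if the driver would stop output after its first poll."""
    ch, lost = rx.read()
    if ch != XOFF:
        return False
    # A control S that arrives together with an overrun is stale:
    # a later character (most likely the matching control Q) was dropped.
    if check_overrun and lost:
        return False
    return True


def boot_scenario(check_overrun):
    rx = ConsoleReceiver()
    rx.receive(XOFF)  # printer falls behind during boot
    rx.receive(XON)   # printer catches up, but the control Q is dropped
    return output_stopped_after_first_read(rx, check_overrun)
```

With check_overrun=False the model reproduces the hang (output is stopped by the stale control S); with check_overrun=True the stale control S is ignored, while a control S received without an overrun still stops output.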
roy@phri.UUCP (Roy Smith) (12/19/85)
> The printing console you get with the 750 (probably a LA100-type) can't
> take characters at the rate the controller is strapped for.
>
> Fixes in order of preference:
> 	Setting the console to draft/letter mode
> 	Change the kernel to [...] have a bigger console buffer
> 	Configuring the terminal with the "setup" key to not use control S/Q

Why not just change the baud rate from the default 2400 down to 1200?
That's what we did on our 750/LA-120 combo (down with LA-100's!). The
printer can keep up without ever having to send a C-S, and 1200 is plenty
fast enough anyway (in fact, if you make the console 300, that might be
better because it dissuades people from using it as a regular terminal).

The only problem with this is that to change the console baud rate you
have to move jumpers on the CPU backplane. Why DEC did this, I'll never
know. BTW, on the 2 Vax installations I've supervised, the tech doing the
install had the baud rate on the console set wrong and refused to believe
that was the problem (they prefer to take the whole CPU apart).
-- 
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/21/85)
I would prefer the system to flush all terminal port FIFOs when it comes fully up. Nothing in them can be any good anyway.
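
[Editor's sketch] The idea of discarding whatever accumulated on the terminal ports during boot can be shown with a few lines of Python. The port names and the flush_input_fifos helper are hypothetical; each port's input FIFO is modelled as a plain deque, which is an assumption of this sketch and not how any particular kernel stores it.

```python
from collections import deque


def flush_input_fifos(ports):
    """Discard all pending input on every port; return how many characters were dropped."""
    dropped = 0
    for fifo in ports.values():
        dropped += len(fifo)
        fifo.clear()
    return dropped


# Hypothetical state at the end of boot: a stale control S (0x13) on the
# console and some line noise on another port.
ports = {
    "console": deque([0x13]),
    "tty00": deque([0x00, 0x7F]),
}
discarded = flush_input_fifos(ports)
```

After the call every FIFO is empty, so the first character the system ever processes is one that arrived after it was ready to listen.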
neil@man.psy.UUCP (Neil Todd @ UK.AC.MAN.CS.UX) (12/21/85)
In article <296@tikal.UUCP> you write:
>Configuration:
>	VAX 11/750
>	4.1bsd Unix
>	1 UDA50 disk controller, 2 RA81's
>	TU78 tape controller with 2 slaves
>
>Problem messages:
>	uda0: soft error, disk transfer error, unit 1
>	uda0: soft error, SDI error, unit 1, event 0353
>
>	uda0: soft error, disk transfer error, unit 0
>	uda0: soft error, SDI error, unit 0, event 0353
>
>	The SDI error message is always paired with the disk
>	transfer error message.
>	Notice no sector numbers. The messages occur sometimes 3
>	hours apart, sometimes 24 hours apart, but usually not in
>	bunches. Typically, only 2-6 occur per day on a fairly
>	active system.
>
>Other:
>	Out of 3 identical installations, we get these error
>	messages on only one. The system doesn't crash when
>	these messages occur, but peculiar things happen (like
>	the autoboot sequence pausing (until a key is hit)
>	just before /etc/rc is run (no, we don't have rc prompt
>	for anything)).
>
>Anybody have experiences with these errors?
>We have had DEC field service on this for weeks (months?),
>and they don't seem to know what to do. We have no manuals on
>the UDA50 (the computers are installed at customer sites),
>so if anybody out there does, maybe telling us what
>event 0353 is would be of some help(?)
>
>Thanks in advance.

The event 353 stuff is to do with "drive detected errors" to quote the
MSCP Basic disk functions Manual. I think that it turns up because the
disk has got a bad block. In spite of what many people think neither
the standard driver nor the hardware will revector the bad block.

The RIACS uda driver will fix this problem.

Neil Todd

JANET :- neil@uk.ac.man.cs.ux
UUCP  :- ...!mcvax!ukc!man.cs.ux!neil
ARPA  :- neil%uk.ac.man.cs.ux@ucl.cs

P.S. My DEC man didn't have a clue either.
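
[Editor's sketch] For readers without the manual, the event number can at least be split into its parts. The sketch below assumes the usual MSCP layout, in which the low five bits of an event code hold the major status code and the remaining high bits hold a subcode; decode_event is a hypothetical helper, and the layout should be checked against the MSCP manual before relying on it. Note that the driver prints the event in octal.

```python
def decode_event(event):
    """Split an event code into (major status code, subcode).

    Assumes the low five bits are the major code and the rest is the subcode.
    """
    major = event & 0o37   # low five bits
    subcode = event >> 5   # remaining high bits
    return major, subcode


major, subcode = decode_event(0o353)
print(oct(major), subcode)  # prints: 0o13 7
```

Under that assumption, event 0353 is major code 013 (octal) with subcode 7. In the MSCP headers shipped with later BSD releases, 013 is the drive-error class, which would agree with the "drive detected errors" wording quoted above; treat that mapping as something to verify, not as a given.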
eichelbe@nadc.arpa (12/25/85)
>In article <296@tikal.UUCP> you write:
>>Configuration:
>>	VAX 11/750
>>	4.1bsd Unix
>>	1 UDA50 disk controller, 2 RA81's
>>	TU78 tape controller with 2 slaves
>>
>>Problem messages:
>>	uda0: soft error, disk transfer error, unit 1
>>	uda0: soft error, SDI error, unit 1, event 0353

Neil Todd <neil%man.psy.uucp@BRL.ARPA> writes:
>The RIACS uda driver will fix this problem.

Under 4.1 BSD? (NOTE THE 1). Does this driver work under 4.1 BSD?
Not without some work. I've looked at the code. Just to start off,
the "#include" files are different.

If you have some magic for making the RIACS UDA50/RA?? driver work for
4.1 (NOTE THE 1) BSD I'd love to hear about it since I want to use
UDA50/RA81's under 4.1 BSD.

Jon Eichelberger
eichelbe@NADC.ARPA