sls@allegra.UUCP (11/11/83)
We recently brought up 4.2BSD on our 11/780. Our system only has one disk at present, an RP07. Yesterday, we started getting soft errors on our / and /usr filesystems, followed by hard errors (OPI, DVC, ECH, DCK). These errors, of which there were many, occurred over a period of 10 minutes. The errors then ceased, and the system stayed up for about 30 minutes before it crashed. / and /usr were totally scrogged, but our user filesystem escaped unscathed. DEC tested the drive and found absolutely nothing wrong with it. I booted Unix from the distribution tape and restored /usr. We have had no errors of any kind since then. Has anyone running 4.2 had a similar problem?

Susan Shaw
BTL - MH
allegra!sls
serge%ucbcory%berkeley@sri-unix.UUCP (11/15/83)
From: Serge Granik <serge%ucbcory@berkeley>

We are currently running 4.2 BSD UNIX on 11/750. Both systems seem to crash about once a day, often more. The cause is often "panic: Hard I/O error in swap". Sometimes it's "panic: vgetu". Any ideas?

serge@ucbcory
alt@aids-unix@sri-unix.UUCP (11/19/83)
From: Howard Alt <alt@aids-unix>

The people at utexas-780 (ut-sally) had a problem like that when they first got the machine. It was the RP07. DEC had to run diagnostics on it for about 8 hours before the problems surfaced.

Howard.
guy%ucla-locus@sri-unix.UUCP (11/22/83)
From: Richard Guy <guy@ucla-locus>

This is a 4.1 tale, but I suspect it relates well to your 4.2 problem, which I attribute to the swap code being overly sensitive to disk errors.

Here at UCLA we're running some dozen 750's with a variant of 4.1bsd. Each system has a Fujitsu disk (160Mb or 450Mb), plus the inevitable RK07 disk (they were inevitable when we got the systems a year ago). For the most part, we avoid using the RK07's whenever possible, since the controller doesn't buffer data very well while waiting to grab the unibus. To attempt to deal with the problem, we recabled all the systems so that the RK07 is at the physical/logical front of the bus, which means the bus is 20' longer now, sigh. (This helps because devices at the 'front' of the unibus have a slight edge over devices 'farther' away when it comes to bus arbitration.)

We finally ran out of swap space on the Fuji's, so we added two more swap partitions on the RK07's. To save time and effort, we enabled both partitions on each system, all on the same day. Within a week, each of our systems was crashing at least once a day with 'panic: hard i/o error in swap'. It turns out the RK07 just can't seem to deliver the goods when it has multiple swap partitions on the same spindle. We backed off to using only one RK07 partition, and our problems have been gone for 5 months now. A better solution would have been to beef up the code and have it retry at least once to get the data.

A question for those running a lot of RK07's: how have they worked out for you? Our experience has been a minor disaster. The basic problem is described above; the others had to do with pack unreliability. 'DC' packs fall apart after three months, so we replaced most of them with 'EF' (error-free) ones; they fall apart too, but it takes six months. ("Fall apart" means new bad sectors start appearing once a week or more, which is real bad news if you're using it as a boot device!)
On the positive side, DEC has been reasonably responsive about replacing the packs (all under maintenance, of course). In summary, if we don't use the things, they don't break (very often). As soon as they get any significant usage... they die.

richard
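Richard's remark that devices at the 'front' of the unibus have an edge in arbitration comes from the way Unibus bus grants are daisy-chained: the grant for a request level is passed physically from device to device, and the first device along the cable that has a request pending keeps it. The following is a minimal Python model of that one idea. It assumes a single request level and ignores priority levels, timing and everything else about the real bus; `grant_bus` and the device names are hypothetical.

```python
def grant_bus(chain, requesting):
    """Return the device that receives the bus grant, or None.

    `chain` lists the devices in physical cabling order, nearest the
    processor first. The grant is passed from device to device down the
    chain; the first device with a pending request keeps it, so devices
    farther along never see it on that cycle.
    """
    for device in chain:
        if device in requesting:
            return device
    return None


# Hypothetical device names; only the RK07's position changes.
before = ["dev_a", "dev_b", "rk07"]
after = ["rk07", "dev_a", "dev_b"]
pending = {"dev_a", "rk07"}

print(grant_bus(before, pending))  # dev_a: the RK07 has to wait
print(grant_bus(after, pending))   # rk07: recabled to the front, it wins
```

In this model, moving the RK07 to the front means it no longer waits behind other requesters, so its poorly buffered controller is less likely to overrun. That is the benefit Richard describes, paid for with the longer cable.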
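Richard suggests making the swap code retry at least once before it panics. That idea can be sketched as a small Python model. This is not 4.1bsd or 4.2BSD kernel code: `FlakyDisk`, `DiskError` and `swap_read` are hypothetical stand-ins, and the model assumes the errors are transient. A sector that is genuinely bad fails on every attempt and still ends in the panic.

```python
class DiskError(Exception):
    """Raised by the simulated drive when a transfer fails."""


class FlakyDisk:
    """Hypothetical drive model whose reads can fail transiently.

    `fail_plan` maps a block number to the number of consecutive reads
    of that block that should fail before one succeeds.
    """

    def __init__(self, fail_plan):
        self.fail_plan = dict(fail_plan)
        self.attempts = 0

    def read_block(self, blkno):
        self.attempts += 1
        remaining = self.fail_plan.get(blkno, 0)
        if remaining > 0:
            self.fail_plan[blkno] = remaining - 1
            raise DiskError(f"transfer error on block {blkno}")
        return f"data@{blkno}"


def swap_read(disk, blkno, max_retries=1):
    """Read one swap block, retrying before declaring a hard error.

    max_retries=0 models giving up on the first error; the default of 1
    models the "retry at least once" policy from the post.
    """
    for attempt in range(max_retries + 1):
        try:
            return disk.read_block(blkno)
        except DiskError as err:
            if attempt == max_retries:
                raise RuntimeError("panic: hard i/o error in swap") from err
    # Not reached for max_retries >= 0: the last attempt returns or raises.


disk = FlakyDisk({7: 1})   # block 7 fails once, then reads fine
print(swap_read(disk, 7))  # data@7, on the second attempt
print(disk.attempts)       # 2
```

With `max_retries=0` the same transient failure on block 7 produces the panic immediately. That is roughly the "overly sensitive" behavior Richard attributes to the swap code. The retry count here is an illustrative parameter, not a kernel tunable.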