ktk@spam.istc.sri.com (Katy Kislitzin) (06/30/89)
QUICK VERSION: stand copy overwrote the d, e, and f partitions of my scsi disk while trying to copy the 4.0.3 Upgrade miniunix into my swap partition.

goto: DESCRIPTION OF BUG

==================
THE LONG SOB STORY

Last night I brought my 140M scsi disk into work to install 4.0.3 on it so I could use the refurbished 3/50 I just bought for home. The only useful scsi tape I could find was attached to a scsi disk that was in use. That disk is running 3.5 and is attached to a 3/110. I hooked up my disk as sd2, got my 4.0 release tapes and did the installation onto the sd2 disk. Everything went *really* smoothly.

The only problem I had was that I wanted to copy the miniunix into sd2b, the swap partition of the 140M disk. This way I could avoid touching the other disk. Unfortunately, when I followed the 4.0.3 manual and used sd(,8,1) to refer to the second scsi disk, it failed. With hindsight, I realize that the first place that mentions doing that is the 4.0.3 manual, and pre-4.0.3 releases probably don't support it. I didn't try with the 4.0.3 tape because "it didn't work before so it won't work now".

==================
DESCRIPTION OF BUG

After the 4.0 full install I went to do the 4.0.3 upgrade, the first step of which is copying in the miniroot. This is where the bug showed up.

The first step, as usual, is to boot the copy program and copy mini-unix into the swap area (sd0b). So I executed the following sequence:

    b st(,,)
    Boot:st(,,2)
    From:st(,,3)
    To:sd(,,1)

At this point the machine got busy for an inordinately long time. Eventually I aborted it, as I couldn't see *any* possible way for it to be taking this long and it didn't appear that the tape was moving. At that point I needed to reset the sun (L1-a k2) before I could continue, as the scsi bus was OTL.

After that I tried again and the upgrade completed normally on sd2. HOWEVER, when I went to boot, the problems with the sd0 disk partitions showed up.
==================
CONTINUE SOB STORY

When I came up in single user mode, fscked the root partition and rebooted, I couldn't do anything, not even ls! I soon figured out this was because pub wasn't mounted and SunOS root is worthless! All I had was /bin/sh, and etc (thank god it was a 3.X machine). I tried fsck'ing the other partitions and found that d, e, and f were missing their superblocks, although g was fine.

This is where I got VERY unhappy. The machine I was using is not dumped because its owners decided they didn't need to. (ALWAYS DUMP!) I hadn't dumped it before I started because I KNEW I was only going to be using the b (swap) partition. (ALWAYS DUMP!) I didn't really care that I had wiped out pub and user, but I also wiped out one partition's worth of moderately valuable stuff. The machine was a clone of testbed machines, and a clone of the development machine, *except* not quite.

What I believe happened is that the scsi controller got an error, probably from the tape, which it passed on to stand copy. Unfortunately, copy ignored the error and kept writing out to disk. It never reached the end of the tape file, so it just kept writing junk out to disk forever. Of course, the particularly nasty thing is that it also ignored the disk partitions and just kept writing.

Needless to say, my boss does not look kindly towards me or anyone using company resources for personal projects right now ;-) The only good part is that the people whose disk it is have all been very reasonable, and the stuff on the disk

    a) needed to be put back in sync with the field machines anyway
    b) was not *crucial*, although its loss will have some impact

The final blow to the evening was getting home around 2 am and finding that my house had been broken into ;-(

I have reported the bug to my sun sales rep and to the last person I talked to at the sun support number via e-mail. If someone could be kind enough to send me the bugs address...
Also, if this has happened to anyone else, I would like to hear about it.

--ktk@spam.istc.sri.com
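
To make the suspected failure mode concrete (a copy loop that ignores a device error and the partition boundary), here is a minimal Python sketch of a guarded loop. It is an illustration only, not Sun's standalone copy code: `guarded_copy`, `read_block`, `write_block`, `partition_blocks`, and `CopyError` are hypothetical names made up for this example, and the block-at-a-time model is an assumption.

```python
class CopyError(Exception):
    """Raised when the copy must stop rather than keep writing."""


def guarded_copy(read_block, write_block, partition_blocks):
    """Copy blocks until end of file, stopping on any error or overrun.

    read_block() returns bytes, b"" at end of file, or raises OSError.
    write_block(index, data) writes one block at the given block offset
    inside the target partition. Both are hypothetical callables.
    """
    written = 0
    while True:
        try:
            data = read_block()
        except OSError as exc:
            # Surface the device error instead of ignoring it.
            raise CopyError(f"read failed after {written} blocks") from exc
        if not data:
            return written  # clean end of the source file
        if written >= partition_blocks:
            # Never write beyond the partition we were asked to fill.
            raise CopyError("source is larger than the target partition")
        write_block(written, data)
        written += 1
```

The two checks correspond to the two things that seem to have gone wrong: a read error aborts the copy, and the write offset is never allowed past the end of the target partition.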