buck@siswat.UUCP (A. Lester Buck) (08/02/89)
Some time ago I asked the net about a weird problem I was having with
the following setup:

	Microport System V/AT 2.4.0L
	Adaptec 2372 disk controller
	Seagate 277R - disk 0
	Mitsubishi 535 - disk 1

The Mitsubishi 535 had a few bad spots. The Microport surface scan
found them and they showed up in "showbad 1", but they were always
ignored when using a filesystem on the disk, generating irritating
console error messages and randomly losing data. Unit 0 works
perfectly, correctly mapping out its bad sectors.

Only Bill Vajk (learn@igloo) responded, to tell me he had run across
the same problem and was discouraged enough to consider moving back to
2.3, since he also needed two drives to handle news.

I finally figured out how to work around this problem. The "obvious"
solution was to use fsdb to put all the bad sectors in one file. Then
they are off the free list and never heard from again (grep -v badfile
on backups, etc.). Unfortunately, this just wasn't working and I was
still getting the "Bad block" messages on the console on a regular
basis.

Here is the full workaround. The Microport 2.4 disk driver apparently
does full track caching on any access to a track, so the message for
the one bad sector appears if any of that track's sectors are read.
Also, once that track is in the track cache, further reads from that
track might NOT give the console message if they do not access the one
bad sector. So if you write a simple program to find when the console
message shows up, a real test has to do the following for each sector
on the suspect track: seek away from the suspect sector, read anything
to flush the track cache, then seek back. You should find that every
sector on a track with a bad sector gives a console message, even
though most of the time the data is read correctly.
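A minimal sketch of the seek-away test described above, not the
author's program. The names sniff_track, read_sector and SECTOR_SIZE
are hypothetical, and the sketch assumes 512-byte sectors and that
reading a single sector on another track is enough to push the suspect
track out of the driver's cache. Since the post notes that the data is
usually read correctly even when the console message appears, the
function prints each sector number before probing it so the output can
be matched against the console by eye; the returned failure count only
catches reads that actually fail.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512   /* assumed physical sector size */

/* Read one sector by absolute sector number; 0 on success, -1 on failure. */
static int read_sector(int fd, long sector, char *buf)
{
    if (lseek(fd, (off_t)sector * SECTOR_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, SECTOR_SIZE) == SECTOR_SIZE ? 0 : -1;
}

/*
 * Probe every sector of one track.  Before each probe, read away_sector
 * (which must be on a different track) so the suspect track is no longer
 * the one held in the track cache.
 * Returns the number of probes whose read failed, or -1 if the
 * seek-away read itself failed.
 */
int sniff_track(int fd, long first_sector, int sectors_per_track,
                long away_sector)
{
    char buf[SECTOR_SIZE];
    int i, failed = 0;

    assert(sectors_per_track > 0);
    /* The away sector must not lie on the suspect track. */
    assert(away_sector < first_sector ||
           away_sector >= first_sector + sectors_per_track);

    for (i = 0; i < sectors_per_track; i++) {
        /* Seek away and read something to flush the track cache. */
        if (read_sector(fd, away_sector, buf) == -1)
            return -1;
        printf("probing sector %ld\n", first_sector + i);
        fflush(stdout);   /* show the line before the suspect read */
        if (read_sector(fd, first_sector + i, buf) == -1)
            failed++;
    }
    return failed;
}
```

With the raw partition opened read-only, a call such as
sniff_track(fd, track * 26L, 26, 0) would walk one 26-sector track,
where track is counted from the start of the partition and sector 0 is
assumed to be on a known-good track.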
All of this assumes the bad block sniffer program uses the raw device
to bypass the buffer cache and the bad block table (which should be
active for the block device, but of course isn't for Unit 1).

I finally used the program at the end of this posting to make a single
indirect block holding the list of blocks on the tracks with the
defects. My disk is a Mitsubishi 535 RLL with 977 cylinders, 5 heads,
26 sectors/track, and /dev/rdsk/1s2 is a partition starting at
cylinder 504. The full procedure is:

  1. mount a newly mkfs'ed filesystem,
  2. touch .badtracks in the root directory,
  3. umount the filesystem,
  4. start fsdb and find the .badtracks inode with 2i.fd,
  5. change a10=2000 (an arbitrary block, used as the indirect block),
  6. change sz=1024*(number of blocks I am locking out),
  7. run the program below (writebad.c),
  8. run fsck to clean up the free list.

Here, a block is a logical block == 1 K, so there are 13 blocks/track
on an RLL disk. Fsck should complain that (number of blocks locked out
from bad tracks) + (1 block for indirect block 2000) blocks are
duplicated on the free list, and remove them. The next fsck run will
then be clean and the disk is now perfectly normal. The file
.badtracks should have all permissions removed.

I have not gotten a console message about bad blocks in the couple of
months since I fixed my filesystem this way. If you trust your
calculations, you can just adjust this program to map out your own
defects. Otherwise, you can write a simple bad block sniffer program
for the raw device, remembering to seek away after reading every
sector.

I have no idea whether this problem needs an RLL disk or whatever to
show up, but it might explain some of the anomalous behavior that has
been recently reported in this newsgroup with two ST-4096 drives.

A. Lester Buck		...!texbell!moray!siswat!buck

/* write indirect block with list of bad blocks */
/* this is for a partition that starts at cyl 504 */

#include <stdio.h>	/* printf, perror */
#include <stdlib.h>	/* exit */
#include <fcntl.h>	/* open, O_WRONLY */
#include <unistd.h>	/* lseek, write */

#define DIRBLOCK (2000L)   /* arbitrary choice for indirect block */

/*
 * One 1K indirect block.  With a 4-byte long this is 256 entries;
 * entries not listed here stay zero.
 */
unsigned long badblocks[1024/sizeof(long)] = {
	/* 13 blocks per bad track */
	2743, 2744, 2745, 2746, 2747, 2748, 2749,
	2750, 2751, 2752, 2753, 2754, 2755,

	8489, 8490, 8491, 8492, 8493, 8494, 8495,
	8496, 8497, 8498, 8499, 8500, 8501,

	/* for example, the following defect is at cyl 806, head 3 */
	19669, 19670, 19671, 19672, 19673, 19674, 19675,
	19676, 19677, 19678, 19679, 19680, 19681,

#if 0	/* my partition was auto-resized to end before this defect */
	30004, 30005, 30006, 30007, 30008, 30009, 30010,
	30011, 30012, 30013, 30014, 30015, 30016
#endif
};

int main()
{
	int fd, nwrite;

	/* raw device, so the write bypasses the buffer cache */
	if ((fd = open("/dev/rdsk/1s2", O_WRONLY)) == -1) {
		perror("open /dev/rdsk/1s2 failed");
		exit(1);
	}

	/* position at the start of the indirect block */
	if (lseek(fd, DIRBLOCK*1024L, 0) == -1) {
		perror("seek to block failed");
		exit(1);
	}

	if ((nwrite = write(fd, badblocks, sizeof(badblocks))) == -1) {
		perror("write badblocks failed");
		exit(1);
	}

	printf("wrote %d bytes\n", nwrite);
	return 0;
}

-- 
A. Lester Buck		...!texbell!moray!siswat!buck
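The block numbers in writebad.c can be checked against the geometry
given in the post (5 heads, 13 one-K blocks per track, partition
starting at cylinder 504). The sketch below is only an illustration:
the function and macro names are hypothetical, and it assumes that
tracks are numbered cylinder by cylinder, head by head, with block 0 at
head 0 of the partition's first cylinder. That assumption reproduces
the one worked case in the post, cyl 806, head 3, which starts at block
19669.

```c
#include <assert.h>
#include <stdio.h>

/* Geometry taken from the post; adjust for another disk or partition. */
#define HEADS            5
#define BLOCKS_PER_TRACK 13     /* 26 sectors of 512 bytes = 13 K */
#define PART_START_CYL   504

/* First logical (1K) block of the track at (cyl, head), partition-relative. */
long track_first_block(int cyl, int head)
{
    assert(cyl >= PART_START_CYL && head >= 0 && head < HEADS);
    return ((long)(cyl - PART_START_CYL) * HEADS + head) * BLOCKS_PER_TRACK;
}

/* Print all block numbers on one track as a comma-separated list. */
void print_track_blocks(int cyl, int head)
{
    long first = track_first_block(cyl, head);
    int i;

    for (i = 0; i < BLOCKS_PER_TRACK; i++)
        printf("%ld%s", first + i, i + 1 < BLOCKS_PER_TRACK ? ", " : "\n");
}
```

Calling print_track_blocks(806, 3) lists blocks 19669 through 19681,
the third group in the badblocks table. With three tracks locked out,
the size set in fsdb would be 1024 * 39 = 39936, and fsck would be
expected to report 39 + 1 = 40 duplicated blocks.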
debra@alice.UUCP (Paul De Bra) (08/02/89)
In article <433@siswat.UUCP> buck@siswat.UUCP (A. Lester Buck) writes:
}Some time ago I asked the net about a weird problem I was having with
}the following setup:
}
}	Microport System V/AT 2.4.0L
}	Adaptec 2372 disk controller
}	Seagate 277R - disk 0
}	Mitsubishi 535 - disk 1
}
[ most of explanation about bad sectors deleted ]
}
}The Microport 2.4 disk driver apparently does full track caching on any
}access to a track, so the message appears for the one bad sector if
}any of that track's sectors are read.  Also, once that track is in
}the track cache, further reads from that track might NOT give the
}console message if they do not access the one bad sector...

Are you sure about this? This would make uPort the first Unix system
I've come across that would do track buffering. It may be possible (I
don't know the 2372) that it's the Adaptec controller which does the
full track buffering, and it could signal the kernel that there was an
I/O error if any of the sectors on a track is bad. There are several
controllers on the market that do full track buffering.

I believe that with Xenix (and I think also with AT&T SVr3.2) whole
tracks are mapped out if they contain a bad sector. Your problem
indicates that this is indeed a better approach, and well worth losing
a few extra sectors.

Paul.
-- 
------------------------------------------------------
|debra@research.att.com   | uunet!research!debra     |
------------------------------------------------------
plocher%sally@Sun.COM (John Plocher) (08/04/89)
+---- In <9728@alice.UUCP> Paul De Bra writes:
| +---- In <433@siswat.UUCP> A. Lester Buck writes:
| | The Microport 2.4 disk driver apparently does full track caching on any
| | access to a track ...
| +----
| Are you sure about this? This would make uPort the first Unix System I've
| come across that would do track buffering.
+----

Yup, Microport's hard disk driver does do full track caching. It has
done so for several years (>2). The floppy driver also does track
caching and supports 1:1 interleave as the default (AT&T uses 4:1).

  -John Plocher