john@jwt.UUCP (John Temples) (06/07/91)
In article <767@dumbcat.sf.ca.us> marc@dumbcat.sf.ca.us (Marco S Hyman) writes:
>I haven't found this in TFM yet -- perhaps the net can help. Given an error
>message that says something like "SCSI absolute sector 1234 on drive 1 is bad"

Which of the 386 UNIXes do automatic bad sector mapping? I know that ESIX does, and ISC 2.0.2 does not. What about the newer ISCs, AT&T, SCO, and the various SVR4 implementations? This is a *really* nice feature, in my opinion.
--
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)
cpcahil@virtech.uucp (Conor P. Cahill) (06/09/91)
john@jwt.UUCP (John Temples) writes:
>Which of the 386 UNIXes do automatic bad sector mapping? I know that
>ESIX does, and ISC 2.0.2 does not. What about the newer ISCs, AT&T,
>SCO, and the various SVR4 implementations? This is a *really* nice
>feature, in my opinion.

It is not a nice feature at all. At one time we were running Bell Technologies 3.1 or so, and it had automatic bad sector mapping. This caused great headaches when a directory magically appeared with bogus data in the middle, or an executable all of a sudden had a block of zeros in the middle of it.

Our hardware at the time (also Bell Tech) had a problem with static electricity that would cause a series of sectors to appear bad whenever the machine was touched. This torched our system several times, and the only way to fix it was a full restore (trying to patch around it is almost impossible when /usr/bin or /bin gets blown away).

Yes, it should be easier to map blocks, but IMHO automatic mapping is not the answer.
--
Conor P. Cahill (703)430-9247 Virtual Technologies, Inc.
uunet!virtech!cpcahil 46030 Manekin Plaza, Suite 160 Sterling, VA 22170
john@jwt.UUCP (John Temples) (06/10/91)
In article <1991Jun09.133624.2806@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>>Which of the 386 UNIXes do automatic bad sector mapping?
>It is not a nice feature at all. At one time we were running Bell
>Technologies 3.1 or so and it had automatic bad sector mapping.
>This caused great headaches when a directory magically appeared with
>bogus data in the middle, or an executable all of a sudden had a
>block of zeros in the middle of it.

The ESIX implementation catches errors while they're still "soft," i.e., while the error is recoverable. So remapping occurs with no data loss, as long as the first error a sector has isn't a hard error.

>Our hardware at the time (also Bell Tech) had a problem with static
>electricity that would cause a series of sectors to appear bad whenever
>the machine was touched.

You're saying that since the feature didn't handle an unusual hardware problem, it's bad? I'm more concerned with it handling the more likely hardware failures well. My experience with ESIX is that all errors have been caught while still soft -- my system kept right on running without a hitch. With ISC -- boom, I'm told I've got bad sectors, and then the backup/mkpart/fsck/restore headache begins. I've never spent one second of my time handling bad blocks under ESIX; under ISC, hours have been wasted.

>Yes, it should be easier to map blocks, but IMHO automatic mapping is
>not the answer.

What if you had the option of having the driver report problems, and you had to give your OK for it to proceed with remapping? Or better yet, you could select between that mode and fully automatic mode.
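John's proposal (report first and remap only on the operator's OK, or remap fully automatically) can be sketched as a small policy model. This is a hypothetical Python illustration, not ESIX's or any other vendor's driver code; the names `Remapper`, `ReadStatus`, `Mode`, the operator callback, and the spare-sector pool are all assumptions. It remaps only errors that ECC has corrected (John's "soft," recoverable case) and merely reports unrecoverable ones, so it never silently swaps in an empty sector of the kind Conor describes.

```python
from enum import Enum


class ReadStatus(Enum):
    OK = "ok"
    CORRECTED = "corrected"          # read error, but ECC recovered the data
    UNRECOVERABLE = "unrecoverable"  # data could not be recovered


class Mode(Enum):
    AUTO = "auto"        # remap corrected errors without asking
    CONFIRM = "confirm"  # report first; remap only if the operator agrees


class Remapper:
    """Hypothetical model of a driver's bad-sector remapping policy.

    Only the bookkeeping is modeled; a real driver would also rewrite the
    ECC-corrected data into the spare sector before updating its table.
    """

    def __init__(self, mode, spares, confirm=None):
        self.mode = mode
        self.spares = list(spares)  # free replacement sectors, in order
        self.table = {}             # bad sector -> spare sector
        self.log = []               # messages a driver would print
        # Operator callback used in CONFIRM mode; the default answer is "no".
        self.confirm = confirm if confirm is not None else (lambda sector: False)

    def physical(self, sector):
        """Sector actually used for I/O after remapping."""
        return self.table.get(sector, sector)

    def on_read(self, sector, status):
        if status is ReadStatus.OK:
            return "ok"
        if status is ReadStatus.UNRECOVERABLE:
            # The data is already gone: report it rather than silently
            # substituting a fresh (empty) sector.
            self.log.append(f"sector {sector}: unrecoverable error, data lost")
            return "reported"
        self.log.append(f"sector {sector}: error corrected by ECC")
        if self.mode is Mode.CONFIRM and not self.confirm(sector):
            return "reported"
        if not self.spares:
            self.log.append(f"sector {sector}: no spare sectors left")
            return "reported"
        spare = self.spares.pop(0)
        self.table[sector] = spare
        self.log.append(f"sector {sector}: remapped to spare {spare}")
        return "remapped"


# Usage (hypothetical sector numbers):
auto = Remapper(Mode.AUTO, spares=[9000, 9001])
auto.on_read(1234, ReadStatus.CORRECTED)    # -> "remapped"
auto.physical(1234)                         # -> 9000
auto.on_read(77, ReadStatus.UNRECOVERABLE)  # -> "reported"; 77 is not remapped

ask = Remapper(Mode.CONFIRM, spares=[9000], confirm=lambda s: s == 1234)
ask.on_read(500, ReadStatus.CORRECTED)      # -> "reported" (operator declined)
ask.on_read(1234, ReadStatus.CORRECTED)     # -> "remapped"
```

Keeping the log separate from the remap decision means both modes produce the same console report; they differ only in who approves the remap.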
chip@chinacat.unicom.com (Chip Rosenthal) (06/11/91)
In article <1991Jun10.025527.10161@jwt.UUCP> john@jwt.UUCP (John Temples) writes:
>The ESIX implementation catches errors while they're still "soft,"
>i.e., the error is recoverable. So remapping occurs with no data loss,
>as long as the first time a sector has an error it isn't a hard error.

Sorry...I still think Conor is right. On-the-fly bad sector mapping is the machine being too damn smart for its own good.

If you've set up the system from the beginning with the disk manufacturer's flaw map, you should have no call for this `feature'. If you are looking for this `feature' as a way of eking out another meg or two of storage space (i.e., don't map out marginal areas, just wait for them to fail), I think you are being pound foolish. (I wouldn't run non-RLL certified hard disks on RLL controllers either when that was in vogue. My disk data is too valuable to screw around with.)

If you set up your disk correctly right from the start, you shouldn't be seeing bad sectors, and this automatic mapping becomes a rarely used feature. And when a sector goes bad, the *last* thing I want is for the machine to automagically patch around it. I want bells and lights and klaxons screaming, because any time I've had a sector go bad which wasn't on the flaw map, it has meant big, big trouble was on the way. I'd just as soon do my reformat/reload now, rather than wait a couple of weeks for the entire disk to crap out.
--
Chip Rosenthal <chip@chinacat.Unicom.COM> | Don't play that
Unicom Systems Development 512-482-8260   | loud, Mr. Collins.
rcd@ico.isc.com (Dick Dunn) (06/11/91)
john@jwt.UUCP (John Temples) writes about automatic remapping:
> The ESIX implementation catches errors while they're still "soft,"
> i.e., the error is recoverable. So remapping occurs with no data loss,
> as long as the first time a sector has an error it isn't a hard error.

I don't believe this is a common failure characteristic. Assuming that (1) you're running the drive in-spec, and (2) you've mapped out all the bad sectors determined by the drive manufacturer [N.B.: This is *NOT* the same as bad sectors found by running a r/w test], you shouldn't expect soft failures because you're not using any marginal sectors. A drive which is about to Bite the Big One may show a few soft errors before the disaster happens, but that's an omen that Something Bad Is About to Happen, so you want to know about it right away.

For example, a tiny particle can get loose somehow. If it's just the right size to get under the head, it'll take a tiny ding out of the coating on a platter...and there's a good chance it'll be small enough to leave you with a soft error. However, you now have at least *two* tiny particles cruising around, possibly many more (the original and whatever got dug up). You can see how that one degenerates. It's only one hypothetical situation; the point is that if you start out using only the good sectors of a good disk and run in-spec, the sorts of things that can go wrong to produce soft errors are almost always (by that I mean something > 90%) precursors to a disastrous failure.

If you run out-of-spec (e.g., non-RLL drives on an RLL controller), you're much more likely to see soft errors that stay soft.
--
Dick Dunn  rcd@ico.isc.com -or- ico!rcd  Boulder, CO  (303)449-2870
...Simpler is better.
cpcahil@virtech.uucp (Conor P. Cahill) (06/13/91)
john@jwt.UUCP (John Temples) writes:
>What if you had the option of having the driver report problems, and
>you had to give your OK for it to proceed with remapping? Or better
>yet, you could select between that mode and fully automatic mode.

I would say that the vendor is spending too much time working on a feature that is of little use, instead of spending that time on bug fixes or performance enhancements.
bill@bilver.uucp (Bill Vermillion) (06/16/91)
In article <1991Jun10.230223.10316@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>john@jwt.UUCP (John Temples) writes about automatic remapping:
>> The ESIX implementation catches errors while they're still "soft,"
>> i.e., the error is recoverable. So remapping occurs with no data loss,
>> as long as the first time a sector has an error it isn't a hard error.
>I don't believe this is a common failure characteristic. Assuming that
>(1) you're running the drive in-spec, and (2) you've mapped out all the bad
>sectors determined by the drive manufacturer [N.B.: This is *NOT* the same
>as bad sectors found by running a r/w test], you shouldn't expect soft
>failures because you're not using any marginal sectors.

In the ESDI world (John's running ESDI on his ESIX system, as I am), the sectors on the manufacturer's list are mapped automatically: the list is read from the supplied drive's defect list. And on big drives, no one is going to type in a couple of hundred defects willingly, or perhaps accurately.

>For example, a tiny particle can get loose somehow. If it's just the right
>size to get under the head, it'll take a tiny ding out of the coating on a
>platter...and there's a good chance it'll be small enough to leave you with
>a soft error. However, you now have at least *two* tiny particles cruising
>around, possibly many more (the original and whatever got dug up). You can
>see how that one degenerates. It's only one hypothetical situation; the
>point is that if you start out using only the good sectors of a good disk
>and run in-spec, the sorts of things that can go wrong to produce soft
>errors are almost always (by that I mean something > 90%) precursors to
>a disastrous failure.

Your scenario points to a drive that does not have long to live. Any time you have "particles" inside the drive, you are going to lose that drive in a short time. At 3600 rpm it won't take long to trash that drive.
The ESIX system tells you when it has recovered a sector and which sector it was. It uses ECC to recover from the hard error. That's why ECC is used in the first place, whether on hard drives or tape drives. Any time you get an error that has to be corrected with ECC and you DON'T block out the problem area, you are asking for trouble.

I have a 660 meg ESDI that had about 300 bad sectors (I got it for about $1000 off because it was just over the limit for that drive). I have had about 3 instances of ESIX remapping a bad sector in the 10 months I have had this current drive running. They occurred in the first 3 months of use, and I have not had any since. Remember, sectors are only remapped when a hard error occurs and ECC is used for recovery. The system has been running 24 hours per day and usually runs from 20 to 40 Megs a day through the system as a news node. If there were problems with their system, I feel that I should have found them by now.

>If you run out-of-spec (e.g., non-RLL drives on an RLL controller), you're
>much more likely to see soft errors that stay soft.

Anyone who does that gets exactly what they deserve, IMO.
--
Bill Vermillion - UUCP: ...!tarpit!bilver!bill
                      : bill@bilver.UUCP
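Bill's description of ESDI setup, where the manufacturer's defect list is read from the drive instead of being typed in by hand, amounts to building a remap table once, at install time. The following is a minimal sketch under stated assumptions: the defect list is taken to arrive as cylinder/head/sector triples (sectors numbered from 1), and a fixed pool of spare sectors is assumed to have been reserved at format time. The function names, geometry, and defect entries are hypothetical, and real drives may record defect positions in other forms.

```python
def chs_to_sector(cylinder, head, sector, heads, sectors_per_track):
    """Linear sector number for a cylinder/head/sector address.

    Assumes sectors within a track are numbered from 1, as in the usual
    CHS convention.
    """
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def build_remap_table(defects, spares):
    """Assign one spare sector to each entry on the defect list.

    defects: linear sector numbers from the manufacturer's list
    spares:  replacement sectors reserved when the drive was formatted
    """
    bad = sorted(set(defects))  # drop duplicate entries, keep a stable order
    if len(bad) > len(spares):
        raise ValueError(f"{len(bad)} defects but only {len(spares)} spare sectors")
    return dict(zip(bad, spares))


# Hypothetical geometry and defect list, for illustration only.
HEADS, SPT = 15, 53
defect_list = [(0, 1, 7), (312, 4, 20), (0, 1, 7)]  # note the duplicate entry
table = build_remap_table(
    [chs_to_sector(c, h, s, HEADS, SPT) for c, h, s in defect_list],
    spares=range(900_000, 900_010),
)
# table == {59: 900000, 248271: 900001}
```

Raising an error when the defects outnumber the spares mirrors the "just over the limit" situation Bill mentions: a drive with more defects than the reserved pool can absorb needs a decision from a person, not a silent partial mapping.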