jbm@uncle.UUCP (John B. Milton) (07/31/89)
In article <850@flatline.UUCP> erict@flatline.UUCP (J. Eric Townsend) writes:
>In article <1989Jul26.174524.21833@eci386.uucp> clewis@eci386.UUCP (Chris Lewis) writes:
>>Somebody else wrote: [ error ]
>
>According to an AT&T tech who came out and replaced the HD in
>my 3b1 (while it was under warranty), this is something that could
>be fixed from floppy-unix, if AT&T had bothered to ship a program
>that could do the super-low level format needed to test the hard drive.

Hmmm. Far too general a statement to be entirely correct.

>This is where I start to lose understanding of the subject, so
>I only *think* I'm correct.

Well, ok

>There are two levels of formatting: The normal level, what "we"
>use, merely erases the disk, and sets up the base for the file system.

I can tell you've been too close to DOS land.

>There is a lower level format that actually writes the 0 block (or
>wherever the "what am I" information for the drive is stored). This
>"what am I" information is what the 3b1 uses to format the hard drive.
>Currently, there is no way to do a "you are a X" format on a drive.
>(I've done this on IBM PClones, however. :-(

Well, there are several levels to formatting the hard disk drive. What the diag disk does is a "low level format", that is, it sends a format track command to the WD1010 hard disk controller chip.

Oh, yeah, then how does it remember the old bad block table when you format a drive twice? Easy: before formatting, it checks to see if the disk about to be formatted is a UNIXpc disk with a good VHB. If so, it reads the existing BBT, formats the disk, then re-writes the old BBT when the format is complete. The reason is obvious: once a bad spot, always a bad spot. I, like most people, would not like to give a bad spot a second chance. If you want to dump the old BBT, you have to trash the VHB to make the diag disk think it's a new, raw disk. The low level format re-writes the ENTIRE track from index to index: gaps, headers, data, everything.
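The save-and-restore sequence described above can be sketched roughly as follows. This is only an illustration in Python, not the diagnostic disk's actual code: `ToyDisk`, `format_keeping_bbt` and `trash_vhb` are made-up names, the bad-block entries are a hypothetical (cylinder, head, sector) tuple, and the real VHB and BBT layouts are not modeled at all.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical bad-block entry: (cylinder, head, sector).
BadBlock = Tuple[int, int, int]


@dataclass
class ToyDisk:
    """In-memory stand-in for a drive; not the real on-disk layout."""
    has_good_vhb: bool = False
    bbt: List[BadBlock] = field(default_factory=list)

    def low_level_format(self) -> None:
        # A format rewrites every track from index to index, so whatever
        # VHB and BBT were on the disk are gone afterwards.
        self.has_good_vhb = False
        self.bbt = []


def format_keeping_bbt(disk: ToyDisk) -> List[BadBlock]:
    """Format the disk, carrying an existing BBT across the format."""
    # Only trust the old table if the disk carries a good VHB.
    saved = list(disk.bbt) if disk.has_good_vhb else []
    disk.low_level_format()
    # Assumption: the formatter lays down a fresh VHB, then re-writes the table.
    disk.has_good_vhb = True
    disk.bbt = saved
    return saved


def trash_vhb(disk: ToyDisk) -> None:
    """Make the disk look new and raw so the old BBT is not carried over."""
    disk.has_good_vhb = False
```

Calling `trash_vhb` before `format_keeping_bbt` is the toy equivalent of wiping the VHB to dump the old table: the format then starts from an empty list.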

There is a "sort of" lower level format which involves warping the format by changing the gap sizes. This is done to AVOID bad spots, is very time consuming, and not very reliable. Some of the PC "low level" format programs can do this.

What the DOS format command does is something completely different. It just fills the disk with a pattern (FD I think). This is all it can do because DOS has no way to tell just what kind of hard disk controller (chip) is down there.

John
--
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:294-4823, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!
jcm@mtunb.ATT.COM (was-John McMillan) (08/03/89)
In article <580@uncle.UUCP> jbm@uncle.UUCP (John B. Milton) writes:
:
> ... If so, it reads the existing BBT, formats the disk,
>then re-writes the old BBT when the format is complete. The reason is obvious:
>once a bad spot, always a bad spot. I, like most people would not like to give
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>a bad spot a second chance. If you want to dump the old BBT, you have to trash
>the VHB to make the diag disk think it's a new, raw disk. The low level format
>re-writes the ENTIRE track from index to index, gaps, headers, data, everything.

Be more charitable, John: Redemption IS possible, for some... but it takes love and attention. Brother McMillan has spent long nights with several disks, and twice the WORD has driven satan outta that disk!

We've been here before... o yes, children, this HAS BEEN SAID BEFORE!

Formatting a disk puts META information on the disk: it's like spraying the lines in the parking lot. Without these lines, the disk cannot identify where the data is to be placed/grabbed. Each disk sector has a leading-edge gap, mark, identifier, gap/mark, user data, and final gap (sometimes w/ CRC). (OK -- I'm faking it, I haven't looked at the code for years!-)

Reading/Writing requires finding the sector identifier and precisely dribbling/sucking-up the data after hitting the internal mark. (Nice technical terms, eh?)

A bad block, then, is one which has proven itself to be unreadable, or unreliably readable.

Types of BAD BLOCKS:

+ SOME blocks are simply a victim of poor penmanship: If a block is badly written, it may be unreadable. And bad vibrations -- not to mention bad karma -- or power glitches might cause bad writes. (Likewise, some RETRIES may indicate NOT a bad block, but an Anomaly occurring during the READ cycle.)

  ++ If the write error is in the USER DATA field, the block may be recoverable by performing a FULL-BLOCK write: this will overlay the badly written bits without trying a read first. (If you write only 100 bytes, the other bytes have to be READ first.)

  ++ However, if the META information is corrupted -- by overwriting or by a marginal write during formatting -- only a re-format of that sector (track) can reclaim the block.

+ OK: there are also MEDIA defects. And THESE are the BAD BLOCKS which John was referring to, the ones we presume to be beyond salvation.

When my 67 MB drive began having MAJOR problems with bad blocks in the SWAP -- I developed 120+ BAD BLOCKS over two weeks, and some odd messages from the diagnostics/surface check code -- I backed up everything. Feature -- this took only 3 days because CPIO was failing to verify dump after dump.

Then I ZERO'd the Bad block list, and reformatted.

And now, I have NO BAD BLOCKS. And there's been not a single read error in the subsequent 4 weeks.

So I say, John: Bad Block lists aren't holy. There are varying reasons why Blocks are entered. And good reasons for considering a reformat -- 'though the manufacturer's list of defects *MAY* be worth copying in.

john mcmillan -- att!mtunb!jcm -- speaking for hizzelf, only

PS: In a recent power hit in Lincroft, several 3B1 disks expired. Oddly, none of us with Spike/Noise suppression units were hurt. Then there's the fellah who hadn't put his suppression unit in service yet... sad fellah. Don't you be sad: use line conditioning.
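As a footnote to the bad-block taxonomy in the post above, here is a toy Python model of why a full-block write can rescue a badly written data field while a partial write cannot, and why damaged META information needs a re-format. Everything in it is an assumption made for illustration: the `Sector` class, the `ReadError` exception, the 512-byte `SECTOR_SIZE`, and the two boolean flags standing in for the header and data fields. It does not reflect any real driver or controller code.

```python
SECTOR_SIZE = 512  # assumed sector size, for illustration only


class ReadError(Exception):
    """Raised when the toy sector cannot be located or read back."""


class Sector:
    def __init__(self) -> None:
        self.header_ok = True  # META information laid down by formatting
        self.data_ok = True    # whether the user data field reads cleanly
        self.data = bytes(SECTOR_SIZE)

    def read(self) -> bytes:
        if not self.header_ok or not self.data_ok:
            raise ReadError("sector unreadable")
        return self.data

    def write_full(self, payload: bytes) -> None:
        """Overlay the whole data field without reading it first."""
        if len(payload) != SECTOR_SIZE:
            raise ValueError("full-block write needs exactly one sector")
        if not self.header_ok:
            # Without a good identifier the sector cannot even be found.
            raise ReadError("header damaged, sector cannot be located")
        self.data = payload
        self.data_ok = True  # the badly written bits have been overlaid

    def write_partial(self, offset: int, payload: bytes) -> None:
        """Partial write: the untouched bytes must be READ first."""
        current = bytearray(self.read())  # fails if the data field is bad
        current[offset:offset + len(payload)] = payload
        self.write_full(bytes(current))

    def reformat(self) -> None:
        """Re-lay the META information; the old contents are lost."""
        self.header_ok = True
        self.data_ok = True
        self.data = bytes(SECTOR_SIZE)
```

A true media defect is deliberately left out of the model: no amount of rewriting or reformatting would clear it, which is the case the bad block table exists for.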
thad@cup.portal.com (Thad P Floryan) (08/05/89)
John McMillan concludes one of his recent postings with:

"
PS: In a recent power hit in Lincroft, several 3B1 disks expired. Oddly, none of us with Spike/Noise suppression units were hurt. Then there's the fellah who hadn't put his suppression unit in service yet... sad fellah. Don't you be sad: use line conditioning.
"

Sage advice. Prior to installing a UPS _AND_ a line-conditioner on every system here, I could expect several failures a week (on any of Amiga, UNIXPC, several homebrews, etc.). Even turning on a fluorescent room light or turning off a modem while writing to a floppy would trash the disk; now I can operate drill motors, etc. on the same line circuit with impunity ... ZERO errors for over 3-1/2 years now, and ALL my systems are operated 24 hours/day, 7 days/wk.

In my quest for 100% system reliability, I rented a line monitor for 30 days and let it record everything on the AC power. What it saw almost made me poop my pants ... literally. 2000V spikes, hash, RF, etc etc etc, even lossage of a cycle (of the 60 Hz) now and then (and this was NOT during the "normal" power outages for this area).

The types of crap one finds on the AC power line are caused by any/all of: air conditioners, refrigerators, fluorescent lamps, any other inductive or capacitive loads (modems, printers, fans, etc.), thunderstorm activity ANYWHERE near your power grid, hospitals and medical equipment, construction activity (esp. the ol' backhoe digging up power lines), air pollution and acid rain, animals "playing" around power lines/transformers (and this includes your neighbors' kids with their kites), and anything else that plugs into the AC power line.

If your livelihood depends on reliable system operation, you're living on borrowed time (or walking the edge) if you don't have at least a good spike and surge/transient suppressor between the wall outlet and your system(s).

I even have special modem protectors (designed/built by GTE) to work in conjunction with the "Primary Phone Line Protector" (the "box" where the phone service enters your site) installed on every line by one's local TelCo.

Yeah, I sometimes joke about operating my computers under candlelight during a power failure (when the UPS is powering everything), but the peace of mind is definitely worth it.

The only thing I haven't got working yet with the 1200 Watt units is getting the signals from the UPS' DB-9 connector into the UNIXPC to carry on the dialogue between the UPS and the computer, as has been done with the Convergent Tech Miniframes (these UPS systems were DESIGNED for use with the Miniframe under contract to SAFE in Arizona). If you want to contact them for the address of a dealer near you:

    SAFE Power Systems, Inc.
    528 West 21st Street
    Tempe, AZ 85282
    602/894-6864

"PC" magazine had a good article several years ago about surge protectors; some they tested even AMPLIFIED the spikes to yet higher voltages!

At least you know that nothing over approx. 4,000 volts will come into your system ... 4000V is the flashover point between the two prongs (hot and neutral) on your standard USA AC power plug.

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
jbm@uncle.UUCP (John B. Milton) (08/07/89)
In article <1583@mtunb.ATT.COM> jcm@mtunb.UUCP (was-John McMillan) writes:
>In article <580@uncle.UUCP> jbm@uncle.UUCP (John B. Milton) writes:
>:
>> ... If so, it reads the existing BBT, formats the disk,
>>then re-writes the old BBT when the format is complete. The reason is obvious:
>>once a bad spot, always a bad spot. I, like most people would not like to give
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>a bad spot a second chance. If you want to dump the old BBT, you have to trash
>>the VHB to make the diag disk think it's a new, raw disk. The low level format
>>re-writes the ENTIRE track from index to index, gaps, headers, data, everything.
...
> + OK: there are also MEDIA defects. And THESE are the BAD BLOCKS
> which John was referring to, the ones we presume to be beyond
> salvation.

I really did mean what I said. Perhaps I should have been more specific. What I was referring to was places on the disk that are physically not responding correctly. Many, many other things can go wrong that do not mesh with the "bad data read, this must be a bad block" idea.

The format routine on the diagnostics disk was written for the user. It was written to find pre-existing bad spots that are expected to be there. The diagnostics assume that the hardware is functioning correctly. Remember, if it acts weird, you're supposed to call AT&T service, right? My original comment was also made assuming your system is functioning properly (or is now).

So is it time for a HwNote on what's REALLY on the disk and how disks work?

John
--
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:294-4823, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!