[comp.sys.ibm.pc] Occasional Posting: More Disk Drive Technical Details

pete@Octopus.COM (Pete Holzmann) (10/25/89)
Here's an edited copy of an article I wrote in May 1988, describing why
the use of an RLL controller will not cause physical damage to the drive.
Someone who repaired drives for a living had been claiming that RLL 
formatting would cause excessive seek errors...

    NOTE: If you find typos or errors in this article, please email your
	comments to me. I'll keep this file up to date and repost as needed.

>As the number of
>marginal sectors on the disk increases, so does the number of seeks.
>Unfortunately, this heats up the arm assembly until (on some drives,
>at least) it starts to warp.  This causes more seeks, which causes more
>heat, which ...  Naturally, heat is most likely to be a problem with
>low end drives, which are more likely to have problems running RLL.
>If the problem is allowed to progress far enough, even reformatting the
>drive for MFM won't help, as the head positioning system won't work
>reliably any more.

If you understood how seeks work on a disk drive, you wouldn't believe this
for a minute. Here goes with more details about how disk drives work!

First, the simple intuitive argument: Can too many seeks due to bad sectors
	or poor formatting cause the heads and/or positioner to fail due
	to heat problems? NO! A typical qualification test for a drive is
	a MILLION random seeks. This may not be performed on every drive
	in the production line, but drives should be able to seek continuously,
	just about forever, without ANY seek errors. Too many seeks in a row
	should never cause trouble!

Now, let's talk about how things happen on a disk drive:

In our last episode ('RLL Technical Details'), you'll recall that the
beginning of a sector is formatted with a bunch of zero bytes and a thing
called an address mark; this helps the disk-read electronics synchronize
with the data on the disk. Now it is time to draw a more detailed picture
of what the beginning of a sector looks like. I'll draw it top-to-bottom,
with the beginning at the top:

The following info is transcribed from the Maxtor OEM manual. Drive 
formatting is standard across the industry; any variations from the
following description are very minor in nature (# sync bytes may change,
# bits for head may be different, etc). Note that for MFM/RLL (non ESDI/
SCSI) drives, the format is completely up to the controller; the drive
knows nothing about formats!

If you are skimming, just remember the major field delineations (ID FIELD
and DATA FIELD). Those are what is really important in the following
discussion.

Field		 # byte Value		Comments

Sync		   13	0		For syncing the read-electronics

	ID FIELD:

ID Address Mark    2	A1 FE		Clock bit missing from 6th bit of
					the A1 in 1,3 RLL format (MFM). I'm
					not sure what bit is munged in 2,7
					RLL; the key is that the first byte
					does NOT follow the 'rules', the
					second byte flags that this info
					is the ID field.

Cylinder Low	    1	xx		Low 8 bits of current cylinder #

Cyl Hi/Head	    1	0ccchhhh	Hi 3 bits of cylinder, 4 bit head #

Sector #	    1	N		Sector # on this track

CRC/ECC 	  2-n	xxxx		CRC or ECC bytes (depends on format)

ID Trailer	    3	0		So write-head-turnoff-glitch doesn't
					mess up the CRC/ECC

	GAP

Sync for Data Fld  13	0		For syncing the read-electronics

	DATA FIELD

Data Address Mark   2	A1 F8		Same as ID AM, but different marker
					so we know that data is coming.

Sector Data	   N	xxxx		Actual sector data (512 bytes or
						whatever)

CRC/ECC 	  2-n	xxxx		CRC or ECC bytes (depends on fmt)

Data Trailer	   3	0		So write-head-turnoff-glitch doesn't
					mess up the CRC/ECC

	GAP

Extra inter-sector
space		   15	4E		Takes up room so sectors are evenly
					spaced


OK! Now you've got that table.
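
If it helps to see the table as bytes, here is a rough Python sketch that
lays out one sector in the order shown above. Some things are my own
assumptions and are NOT from the Maxtor manual: the length and fill value of
the ID-to-data gap (the table gives neither), the use of a 2-byte CRC-CCITT
for the CRC/ECC fields (real controllers vary), and all of the function and
constant names. These are data bytes before MFM/RLL encoding, so the missing
clock bit in the A1 can't be shown here.

```python
import binascii

SYNC = bytes(13)               # 13 zero bytes for syncing the read electronics
TRAILER = bytes(3)             # 3 zero bytes after each CRC/ECC
INTER_SECTOR = b"\x4e" * 15    # filler so sectors are evenly spaced
ID_AM = b"\xa1\xfe"            # ID address mark
DATA_AM = b"\xa1\xf8"          # data address mark
GAP = b"\x4e" * 3              # ASSUMPTION: gap length and fill are not in the table


def crc16(data: bytes) -> bytes:
    # ASSUMPTION: CRC-CCITT (poly 0x1021, initial value 0xFFFF), 2 bytes
    return binascii.crc_hqx(data, 0xFFFF).to_bytes(2, "big")


def id_field(cylinder: int, head: int, sector: int) -> bytes:
    cyl_low = cylinder & 0xFF                                  # low 8 bits of cylinder
    cyl_hi_head = (((cylinder >> 8) & 0x07) << 4) | (head & 0x0F)  # 0ccchhhh
    body = ID_AM + bytes([cyl_low, cyl_hi_head, sector])
    return body + crc16(body)


def sector_image(cylinder: int, head: int, sector: int, data: bytes) -> bytes:
    data_body = DATA_AM + data
    return (SYNC + id_field(cylinder, head, sector) + TRAILER
            + GAP
            + SYNC + data_body + crc16(data_body) + TRAILER
            + INTER_SECTOR)
```

With these assumed sizes, a 512-byte sector takes 573 bytes on the track,
so roughly 10% of the space goes to overhead.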

Here's how seeking, reading and writing occur at the lowest level:

SEEKING

The controller issues some commands (not discussed in detail here) that
cause the heads to move to the (hopefully) correct track. Once the drive
electronics report that the seek is physically complete, the controller
verifies this by looking for an ID field. If the ID field shows that we
are in the right place, we're done!
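
A small Python sketch of that verification step, assuming the ID field
layout from the table above (cylinder low byte, then 0ccchhhh, then sector
number). The function names are made up for illustration; a real controller
does this in hardware or firmware.

```python
def decode_id(cyl_low: int, cyl_hi_head: int, sector: int):
    """Unpack the three ID bytes that follow the ID address mark."""
    cylinder = (((cyl_hi_head >> 4) & 0x07) << 8) | cyl_low  # 3 high bits + 8 low bits
    head = cyl_hi_head & 0x0F                                # 4-bit head number
    return cylinder, head, sector


def seek_verified(id_bytes: bytes, target_cylinder: int) -> bool:
    """True if the ID field found under the head names the cylinder we wanted."""
    cylinder, _head, _sector = decode_id(*id_bytes)
    return cylinder == target_cylinder
```

For example, the ID bytes 23 15 07 (hex) decode to cylinder 0x123, head 5,
sector 7.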

READING

The controller reads until it finds the ID field for the sector to be
read, or until N revolutions of the disk have gone by, at which point it
times out with an error.

Once the correct ID field is found, the Data field is read. If there is
any problem, including if the data field is not found right away [we don't
want to read the data field for a different sector!], then we go back to
looking for the ID field, retrying N times.

If there are any CRC/ECC errors, we retry.
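
Here is a toy Python model of that read loop. The track is just a list of
sectors in the order they pass under the head. The "bad_reads" counter is my
own invention to simulate a marginal sector that fails a few times before it
reads cleanly; the names and the default of 4 revolutions are assumptions,
not anything from a real controller.

```python
def read_sector(track, wanted_id, max_revolutions=4):
    """Return the data for wanted_id, retrying for up to max_revolutions."""
    for _ in range(max_revolutions):
        for sector in track:                  # one revolution of the disk
            if sector["id"] != wanted_id:     # unreadable or wrong ID: keep looking
                continue
            if sector["bad_reads"] > 0:       # simulated CRC/ECC error on the data field
                sector["bad_reads"] -= 1
                break                         # go back to looking for the ID field
            return sector["data"]
    raise TimeoutError(f"sector {wanted_id} not read in {max_revolutions} revolutions")
```

Note that in this model a retry only waits for the disk to come around
again.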

WRITING

The controller reads until it finds the ID field for the sector to be
written, or until N revolutions of the disk have gone by, at which point it
times out with an error.

Once the correct ID field is found, the entire data field is written. The
controller simply waits long enough for most of the ID-to-Data GAP to
go by, then starts to write, beginning with zeros in the sync field, the
data A.M., the data itself, the CRC/ECC bytes, and the trailing zeros.
THE ID FIELD IS NEVER TOUCHED.
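
The same thing as a Python sketch. Each sector is modelled as a dict with a
separate "id_field" and "data_field", so you can see which one a write
replaces. The dict layout, the function name, the 4-revolution limit and the
2-byte CRC-CCITT are all assumptions made for illustration.

```python
import binascii


def write_sector(track, wanted_id, payload, max_revolutions=4):
    """Rewrite the data field of wanted_id; the ID field is only read."""
    for _ in range(max_revolutions):
        for sector in track:                     # one revolution of the disk
            if sector["id"] != wanted_id:
                continue
            body = b"\xa1\xf8" + payload         # data address mark + sector data
            # ASSUMPTION: 2-byte CRC-CCITT; real controllers may use ECC instead
            crc = binascii.crc_hqx(body, 0xFFFF).to_bytes(2, "big")
            # sync zeros, A.M. + data, CRC, trailing zeros: the whole data field
            sector["data_field"] = bytes(13) + body + crc + bytes(3)
            return
    raise TimeoutError(f"ID field for sector {wanted_id} not found")
```

If the ID field can't be found, nothing gets written at all. That is why a
track with unreadable ID fields needs a low-level format rather than an
ordinary write.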

FORMATTING [By the way, this is 'low level' formatting for MSDOS people.
		The FORMAT command simply changes data in sectors. It does
		not write any ID fields.]

The controller waits until the 'Index' signal is seen, which indicates that
the heads are at a particular spot in the disk rotation (the 'Index', amazingly
enough!!!) Then it simply writes an entire track of data, including all
the gaps, ID fields and data fields for all sectors on the track. It doesn't
read or sync up with anything except to verify that the formatting info was
correctly written.
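
In contrast to a normal write, a format pass lays down every ID field
fresh. A minimal Python sketch, using a dict per sector: the 17 sectors per
track, the 512-byte sectors, numbering sectors from 1, and the zero fill are
all assumptions for the example (the drive doesn't care; the controller
picks them). CRC bytes, sync and gaps are left out to keep it short.

```python
def format_track(cylinder, head, sectors_per_track=17, sector_size=512):
    """Build a whole track from scratch, ignoring whatever was there before."""
    track = []
    for n in range(1, sectors_per_track + 1):    # ASSUMPTION: sectors numbered from 1
        cyl_low = cylinder & 0xFF
        cyl_hi_head = (((cylinder >> 8) & 0x07) << 4) | (head & 0x0F)  # 0ccchhhh
        track.append({
            "id": n,
            "id_field": bytes([0xA1, 0xFE, cyl_low, cyl_hi_head, n]),
            "data_field": bytes(sector_size),    # ASSUMPTION: zero fill
        })
    return track
```

Since nothing on the old track is read first, it doesn't matter how badly
the previous format was garbaged up.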

On some high-end drives, an entire surface is specially formatted with
information that helps the drive perform its physical seeks. This formatting
is performed at the factory and is not *supposed* to be changeable afterwards.
If the heads become physically misaligned to any great extent, this extra
surface needs to be reformatted, usually at the factory. LOW END DRIVES
DON'T HAVE THIS 'FEATURE'. It is used to make seeks faster, by the way.

Now for some other info:

BAD SECTOR CAUSES:

	- Physical defects on the disk surface. These can be avoided by
		simply not using bad areas on the disk.
	- Writing unreadable data (e.g. writing RLL format on a drive that
		isn't accurate enough). This can be fixed by writing good
		data. If a drive has not been physically damaged, and if
		the electronics are ok, then what used to work (physically)
		will still work now, even if you garbaged up the format
		in the meantime.
	- Writing when you shouldn't (e.g. controller firmware gone nutso)
	- Over time, thermal effects may cause the heads to shift slightly,
		which can cause new physical defects to be seen, and/or
		the old ID fields to be unreadable. Re-formatting will always
		correct this problem.
	- When a system is power-cycled, the power to the drive heads is
		'unbalanced' for a moment. A small amount of magnetic
		effect is transmitted to the drive surface. This is generally
		not enough to actually change the data on the drive, but
		it weakens the signal that will be read next time. The
		accumulation of this effect over time causes good data to
		become marginal; marginal data to become bad. Since this
		only happens at each on/off cycle of your system, it isn't
		a major factor in general. You can avoid it completely by
		parking your disks [moves heads to unused cylinder] before
		turning the system off. Rewriting the low-level format and
		data on all cylinders will completely eliminate any accumulated
		problems of this type.
	- Murphy. [No matter how completely we may understand a system,
		someday it will probably do something completely inexplicable!
		Always leave room for the twilight zone... ;-)]

CONCLUSIONS

Now that you know this, you'll understand a few things better:

- Except for drives with a separate seek-synchronizing surface as
	described above, there is no particular place on a disk surface
	that is non-formattable. If you lose the format on a disk for
	some reason, re-doing the low-level format will make the
	disk usable again. If you can't low-level format a drive,
	then either:

    - you have a problem somewhere else (firmware, cables, jumpers, etc)
	    (sure, that's a general statement; sorry!)
    - you have a drive with a special-surface that has been wiped out 
	    (and NOT because you used a controller with a different 
	    data format!) It is possible that operator and/or firmware
	    error caused the controller to write over something that 
	    should not be writeable.
    - you have a controller and/or BIOS that can't handle a 
	    non-formatted drive or incorrectly-formatted drive. I've 
	    heard rumors about this, and have seen examples. A 
	    different BIOS and/or controller may be able to fix the 
	    format for you; the drive would then become usable again. 
	    An example of this: DTK BIOS versions through 1/88
	    won't let you boot up DOS from a floppy if the 
	    drive isn't formatted. More recent versions of
	    the BIOS have this fixed. Someone mentioned
	    that some WD controllers can't talk to a drive that has
	    been previously formatted RLL. The same drive is fine
	    after reformatting with a non-Western-Digital MFM 
	    controller. The WD controller must have some kind of
	    firmware bug.
    - you have a physically damaged drive. If any sectors on
	    a disk surface can be written and read under any
	    circumstances, then that particular head and associated
	    electronics are ok.
    - Murphy again.

Pete

-- 
Peter Holzmann, Octopus Enterprises   |(if you're a techie Christian & are
19611 La Mar Ct., Cupertino, CA 95014 |interested in helping w/ the Great
UUCP: {hpda,pyramid}!octopus!pete     |Commission, email dsa-contact@octopus)
DSA office ans mach=408/996-7746;Work (SLP) voice=408/985-7400,FAX=408/985-0859