[comp.sys.ibm.pc] Occasional Posting of COMPLETE RLL Technical Details

pete@Octopus.COM (Pete Holzmann) (10/25/89)
REPOST from May 1988 discussion. With thanks to Chris Torek for some of the
        language simplifications used. If you see typos or errors, please
	email them to me. I'll keep this file up to date, and repost as
	necessary. Thanks also to a few net-people who culled the originals
	from old archives for me.

NOTE: Please don't send me mail asking if drive xxx or controller yyy is
any good. I don't have time to answer all those requests, especially this
year (I'm temporarily a full-time employee, rather than a consultant!)
A follow-on posting contains the beginnings of a list of good-for-RLL
drives. Perhaps somebody else can take input from net-people and keep a
running survey.

If you read all the way through this, you will (hopefully) understand WHY
RLL works/doesn't work depending on the configuration you set up. You will
also understand WHY many of the horror stories applied to RLL are almost
certainly mis-applied.

I guess I should mention why you should believe me: Besides the fact that
I'm just a nice guy, I've worked on the microcode for disk drive test
equipment. In order to do a good drive tester, you've gotta understand the
low level guts of these things! 

I. How is data stored on a disk drive? What can go wrong at high densities?

Data is stored as magnetic flux reversals (think of it as wiggles, + to - 
and - to +). The polarity of the magnetic flux doesn't mean a thing. It is 
the timing of the flux reversal wiggles (i.e. their presence or absence 
when expected) that is used to encode data.

When you stick flux changes (data and/or clock) onto a disk surface,
they tend to move slightly from their desired position, due to imperfections
in the disk surface, small head position errors, etc. The better the drive
and controller, the more accurately the wiggles land where they were supposed 
to; they *always* wind up a little out of place in any case.

Unfortunately, flux change wiggles (like magnets) can sense each other when
they are placed close to each other.  If you put too many of them in a
row, they interfere with each other, and get shifted right out of where 
they were supposed to be:

    you wanted 1111:
    | wiggle | wiggle | wiggle | wiggle |
	0	 1	  2	   3

    but you got 11?0:
    | wiggle |   wiggl|     wig|le     w|ggle
	0	 1	  2	   3

The darned things are scared of each other!  We have to make sure that
we always leave some space between the wiggles, or they will get scared
and wiggle away :-).  We will represent this minimum wiggle-space
with | marks from now on. (Note that the wiggles can see through the marks
and will get scared if they are too close to each other; | mark or not.)

On the other hand, if you put them too far apart, the controller can't keep
track of when the next wiggle should arrive, as if it were counting sheep
and the sheep stopped jumping:    

    you wanted 11000001:
    | w | w |	|   |	|   |	| w |
      0   1   2   3   4   5   6   7

    but you got 11000.??oops
    | w | w |	|   |	|   |	| w |
      0   1   2   3   4   5  ... now where was I?

So the idea is that we have to put the wiggles close enough together so
that the controller does not lose sync, but far enough apart so that they
do not interfere with each other.

The easiest way to do this is the old IBM 8 inch floppy 'FM' format.
Using 'w' for wiggle, ' ' for none, we simply encode as follows:

        Data bit        Wiggles
           0            'w '
           1            'ww'

Here's some examples:

Message Data:    1    	 0    	 0    	 0       0       0   
Disk Data:     | w | w | w |   | w |   | w |   | w |   | w |   |...

Message Data:	1        1       1       1       1       1           
Disk Data:     | w | w | w | w | w | w | w | w | w | w | w | w |...

Message Data:	1    	 0       0       1       1       0
Disk Data:     | w | w | w |   | w |   | w | w | w | w | w |   |...
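
For anyone who would rather see the FM rule as code, here is a minimal
sketch in Python. The function names (fm_encode, draw) are made up for
this illustration and are not taken from any real controller.

```python
def fm_encode(bits):
    """Encode a string of '0'/'1' data bits as FM wiggle-times."""
    cells = []
    for bit in bits:
        cells.append("w")                          # clock wiggle, always present
        cells.append("w" if bit == "1" else " ")   # data wiggle only for a 1
    return "".join(cells)


def draw(cells):
    """Draw one wiggle-time per '|' cell, like the diagrams above."""
    return "| " + " | ".join(cells) + " |"


print(draw(fm_encode("100110")))
# | w | w | w |   | w |   | w | w | w | w | w |   |
```

The printed line should match the third example above.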


We must use a slow clock (200 ns) in order to allow sequential wiggles. 
Using this scheme, we get all of 128K on a huge floppy disk. Must be 
something better out there...


II. How does MFM work?

Well, MFM is just Modified FM. Rather than having a clock wiggle for every
data bit, we only use the clock wiggle when necessary to keep the controller
sync circuit (the phase-locked-loop) working correctly.

We still have only one data bit every two wiggle-times, but we can make a
wiggle-time half as long, since we now will guarantee a '0' between every
wiggle. Here's how it works:

        'w' means a wiggle encoding a '1' data bit,
        'C' means a wiggle required to keep the clock running because
                we can't go too long without a clock.

The code: data 1 always becomes ' w'
	  data 0 becomes '  ' if preceded by a 1
	  data 0 becomes 'C ' if preceded by a 0

Here's some examples, with a period marking the halfway point for each set
        of two clock periods:

Message Data:	1   0	0   0	0   0	0   0	0   0	0   0
Disk Data:     | .w| . |C. |C. |C. |C. |C. |C. |C. |C. |C. |C. |...

Message Data:	1   1	1   1	1   1	1   1	1   1	1   1
Disk Data:     | .w| .w| .w| .w| .w| .w| .w| .w| .w| .w| .w| .w|...

Message Data:	1   0	0   1	1   0	1   0	0   1	0
Disk Data:     | .w| . |C. | .w| .w| . | .w| . |C. | .w| . |  
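
The three MFM rules above translate almost word for word into code.
This is only a sketch: the names mfm_encode and draw are invented here,
and the 'prev' default (pretending a 0 came before the first bit) is an
assumption that only matters when the message starts with a 0.

```python
def mfm_encode(bits, prev="0"):
    """Encode '0'/'1' data bits as MFM, two clock periods per data bit."""
    cells = []
    for bit in bits:
        if bit == "1":
            cells.append(" w")   # data wiggle in the second half
        elif prev == "1":
            cells.append("  ")   # 0 after a 1: no wiggle at all
        else:
            cells.append("C ")   # 0 after a 0: clock wiggle in the first half
        prev = bit
    return cells


def draw(cells):
    """Draw each data bit as '|x.y', with the period at the halfway point."""
    return "|" + "|".join(first + "." + second for first, second in cells) + "|"


print(draw(mfm_encode("10011010010")))
# | .w| . |C. | .w| .w| . | .w| . |C. | .w| . |
```

The output reproduces the last example above.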


The wiggles are still not too close---always at least one full '|  |' space
from wiggle to wiggle, and sometimes up to two spaces (just as with FM). But
now we are getting one data bit for each '|  |' time! We did this by doubling
the clock speed without increasing the wiggle density. Notice that the wiggles
must be placed more accurately within their '|  |' windows. This is
important. 

It is natural to ask at this point, "what is the difference between a long
string of ones (11111) and a long string of zeros (00000)?" Here's how we
deal with that: there is a special data sequence encoded at the beginning of 
each sector, with special hardware to detect it: First, there is a long 
string of zeros; a hardware 'zero detector' is enabled to look for it. At 
this point, it *could* as easily find a string of ones as a string of zeros, 
since they are identical when taken out of context. Second, a special byte 
is encoded that violates the rules: an 'A1' byte is written, with a clock 
missing in bit 6 (one of the sequential zero bits.) The special byte is 
called the Address Mark. If zeros followed by an Address Mark are found, 
then the PLL (phase locked loop) is synchronized and data can be read.
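
To see what "a clock missing" looks like at the bit level, here is a
small Python sketch. It assumes the byte is encoded most significant
bit first, that the data bit before it was a 0, and that "bit 6" is
counted from 1 starting at the most significant bit. Those are
assumptions made for this illustration, and the helper names are
invented.

```python
def mfm_pairs(byte, prev_bit=0):
    """MFM-encode one byte, MSB first, as a list of (clock, data) pairs."""
    pairs = []
    for i in range(7, -1, -1):
        bit = (byte >> i) & 1
        # A clock wiggle is written only between two 0 data bits.
        clock = 1 if (bit == 0 and prev_bit == 0) else 0
        pairs.append((clock, bit))
        prev_bit = bit
    return pairs


def to_word(pairs):
    """Pack the (clock, data) pairs into one 16-bit pattern."""
    word = 0
    for clock, bit in pairs:
        word = (word << 2) | (clock << 1) | bit
    return word


normal = mfm_pairs(0xA1)
mark = list(normal)
mark[5] = (0, mark[5][1])      # drop the clock in the sixth bit cell

print(f"{to_word(normal):04X}")   # 44A9
print(f"{to_word(mark):04X}")     # 4489
```

Normally encoded data can never produce the second pattern, which is
what lets the hardware pick it out.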


III. What is RLL? What does the '2,7' in '2,7 RLL' mean?

RLL means Run Length Limited. The 'Limits' in disk drive RLL refer to the
minimum and maximum clock periods between flux reversal wiggles. '2,7' means 
a minimum of 2, and a maximum of 7. A minimum of zero would mean that wiggles
can occur in every clock period. Thus, '2,7' means that flux reversals occur 
at least every 8th clock period (7 periods without a wiggle), but no more 
often than every third clock.

MFM is simply 1,3 RLL encoding, with a basic clock period of 100 nsec (i.e.
200 nsec per data bit, or 5 MBit/second.)
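
The run length limits are easy to check mechanically. The sketch below
(Python, with invented function names) takes a string with one
character per clock period, '1' for a wiggle and '0' for none, and
counts the empty clock periods between wiggles.

```python
def run_lengths(clocks):
    """Number of empty clock periods between each pair of wiggles."""
    positions = [i for i, c in enumerate(clocks) if c == "1"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def satisfies(clocks, d, k):
    """True if every gap between wiggles is at least d and at most k."""
    return all(d <= gap <= k for gap in run_lengths(clocks))


# MFM for data 1,0,0,1 is ' w', '  ', 'C ', ' w', i.e. 01 00 10 01
print(run_lengths("01001001"))        # [2, 2]
print(satisfies("01001001", 1, 3))    # True
print(satisfies("0110", 1, 3))        # False: two wiggles in a row
```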

What we talk about when we say 'RLL' is "2,7 RLL". The 2,7 RLL code is 
simply an incremental change from the MFM code. The complete code with
some examples follows:

	Data		2,7 RLL Code ('1' is a wiggle, '0' is none)

	10		0100
	11		1000
	000		100100
	010		000100
	011		001000
	0010		00100100
	0011		00001000

Each data pattern is replaced by a code word exactly twice as long, and no
data pattern is the start of another one, so a stream of data bits can be
split into patterns in only one way.

Since we now time our wiggles to occur from 3 to 8 clocks apart, we can
put *three* clocks in every '|   |' without changing the timing of wiggles
on the disk surface! When setting up the 2,7 RLL spec, the powers-that-be 
decided to get a result of 7.5MBit/sec, so they actually slowed down the
wiggle rate to 225nsec minimum (i.e. each '|   |' is 225ns), with a clock
period one-third of that, or 75nsec. So 2,7 RLL actually stores FEWER wiggles
on a drive than MFM does!

Here's some examples, with periods separating the three clock periods
        inside each '|   |':

Message Data:	1   0	0   0	0   0	0   0	0   0	0   0   0
Disk Data:     | .w. | .w. | .w. | .w. | .w. | .w. | .w. | .w. | . ..|...

Message Data:	1   1	1   1	1   1	1   1	1   1	1   1
Disk Data:     |w. . | .w. | . .w| . . |w. . | .w. | . .w| . . | . . |...

Message Data:	1   0	0   1	1   0	0   0   1   1   0   0   1   1
Disk Data:     | .w. | . . |w. . | .w. | .w. | .w. | . . | . . |w. . | ..
        (this one splits up as 10, 011, 000, 11, 0011)
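
A table-driven 2,7 encoder can be sketched in Python as follows. The
table in the sketch is one commonly published assignment of data
patterns to code words; treat it as an assumption, since other
assignments of the same code words exist. The function names are
invented for the illustration.

```python
RLL_2_7 = {
    "10": "0100",
    "11": "1000",
    "000": "100100",
    "010": "000100",
    "011": "001000",
    "0010": "00100100",
    "0011": "00001000",
}


def rll_encode(bits):
    """Return (code clocks, leftover data bits that fill no pattern)."""
    out = []
    i = 0
    while i < len(bits):
        for length in (2, 3, 4):
            word = bits[i:i + length]
            if word in RLL_2_7:
                out.append(RLL_2_7[word])
                i += length
                break
        else:
            break   # the tail does not complete a pattern yet
    return "".join(out), bits[i:]


def draw(clocks):
    """Draw three clock periods per '|' cell, separated by periods."""
    cells = [clocks[i:i + 3] for i in range(0, len(clocks), 3)]
    body = "|".join(
        ".".join("w" if c == "1" else " " for c in cell) for cell in cells
    )
    return "|" + body + "|"


encoded, leftover = rll_encode("111111")
print(encoded)         # 100010001000
print(draw(encoded))   # |w. . | .w. | . .w| . . |
```

Because no data pattern is the start of another, trying the shortest
pattern first always finds the only possible split.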


IV. Comparing RLL and MFM

Now, let's compare 2,7 RLL with 1,3 RLL. Both codes happen to encode
one data bit into 2 clock periods. With 1,3 RLL (MFM), a flux reversal
can occur every two clock periods. With 2,7 RLL, a flux reversal can occur
every 3 clock periods. If we increase the clock rate while using a 2,7
RLL scheme, we get the same maximum flux reversal rate as for MFM. But, we
get more data out of the drive, at a higher data rate.

The IMPORTANT differences between the schemes are: 

        (1) the maximum clock frequency (a 25% shorter clock period for 
                2,7 RLL than MFM, i.e. a 33% higher clock frequency; 
                100% higher for ARLL [another fancy code])
        (2) the maximum Frequency Ratio (comparing minimum and maximum 
                wiggle intervals).


Attribute	1,3 RLL		2,7 RLL	       Comments
		(MFM)		(normal RLL)  

Data rate	200ns/bit	150ns/bit      RLL Faster
		(5MBit/sec)	(7.5MBit/sec) 

clocks/bit	2		2	      

physical	100ns		75ns	       2,7 RLL data must be clocked
clock period				       within a 25% narrower window.
of data 

min/max clocks	2/4		3/8	      
pulse-to-pulse
on drive

min/max time	200/400ns	225/600ns      Note that RLL actually spaces
between pulses                                 wiggles a little FARTHER apart
on drive                                       than MFM!
(leading edge)
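
The pulse-to-pulse rows of the table follow directly from the run
length limits and the clock period. A small Python sketch, taking the
100ns and 75ns clock periods from the table as given (the function
names are invented here):

```python
def pulse_spacing_ns(clock_ns, d, k):
    """Min and max time between wiggles for a d,k code.

    d empty clocks between wiggles means d+1 clocks from one wiggle
    to the next; likewise k+1 clocks for the maximum.
    """
    return (d + 1) * clock_ns, (k + 1) * clock_ns


def frequency_ratio(d, k):
    """Longest wiggle interval divided by the shortest one."""
    return (k + 1) / (d + 1)


print(pulse_spacing_ns(100, 1, 3))   # (200, 400)   MFM
print(pulse_spacing_ns(75, 2, 7))    # (225, 600)   2,7 RLL
print(frequency_ratio(1, 3))         # 2.0
```

For 2,7 RLL the ratio works out to 8/3, or about 2.67, which is the
wider range the phase-locked loop has to cope with.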


A higher data clock rate requires greater accuracy in timing all along the 
chain from disk surface to final data to be read (and the reverse). The time 
period during which the controller must decide whether a flux reversal is 
present or not is called the 'window'. The variation in flux-reversal detection
(+ or - from the nominal 'perfect detection time') allowed by a given encoding
scheme is called the 'required window margin'. Higher frequency clocks have
narrower windows, and so leave less room for this variation. On a given
drive/controller combination,
the window margin can be measured: simply sync up the electronics to the
pulses on the drive, read a worst-case data pattern, and see what kind of
variation in flux-reversal timing you get. Good drive/controller combinations
will place all flux reversals in a very narrow time window, giving a very
good window margin, and hence will work well with high-frequency encoding
schemes.
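
As a rough illustration of the measurement, the Python sketch below
takes a list of measured wiggle displacements (early or late, in ns)
and checks them against a clock window. The numbers are made up, not
real drive data, and the test assumes an ideal PLL so that the full
half clock period either side of nominal is usable. Real margins are
tighter than that.

```python
def worst_shift_ns(shifts_ns):
    """Largest early or late displacement among the measured wiggles."""
    return max(abs(s) for s in shifts_ns)


def fits_window(shifts_ns, clock_ns):
    """True if every wiggle stays inside its own clock window."""
    return worst_shift_ns(shifts_ns) < clock_ns / 2


measured = [12.0, -5.0, 40.0, -31.0]   # hypothetical shifts in ns

print(worst_shift_ns(measured))        # 40.0
print(fits_window(measured, 100))      # True  (MFM, 100ns clock)
print(fits_window(measured, 75))       # False (2,7 RLL, 75ns clock)
```

The same made-up drive passes with the MFM clock and fails with the
RLL clock, which is the whole story in miniature.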

A big difference between minimum and maximum inter-wiggle times (the
Frequency Ratio) simply requires more complex decoding and
phase-locked-loop circuitry that can handle a wider range of frequencies. 

On top of these differences, the MFM code is simpler than 'RLL', since it 
is not variable-length. Variable length codes make error recovery more 
difficult, and hence make bad-sector marking more important.


V. What does all this mean in terms of real drives, controllers, etc.?

First, let's understand which parts of the whole deal go where. Here are
the pieces needed to read/write disk info on MFM/RLL drives, and where 
they are located:

	Component			Where it is

	Disk surface			Drive
	Head				Drive
	Analog head electronics 	Drive
	   (conditions signal to/from
	    head)
	Cable				Between drive and controller
	Analog data separator		Controller
	  (detects flux reversals)
	Phase Locked Loop		Controller
	  (determines data clocking)
	Digital read/write stuff	Controller
	  (includes bit/byte conversions,
	    etc etc etc)

Note that MOST of the junk is in the controller, not the drive!

On the drive:

    Oxide-surface disks and old head designs on early drives (e.g. ST-225, 
    ST-238) do not place the flux-reversal with enough accuracy to be used 
    in most RLL situations. This is why older ST-225/238 drives have so 
    much trouble. Newer drive designs use plated media and better heads, 
    which allow more accurate wiggle placement.

    The drive head and associated electronics are usually tuned to match
    the expected signals to and from the drive. If the drive was designed
    without 'RLL' (2,7) in mind, the frequency response of the drive
    electronics is 'mushy': it may not provide a crisp/accurate enough
    signal to allow the PLL to correctly sync up. On more recent drives,
    the same exact setup is used for 'MFM' (1,3 RLL) and 'RLL' (2,7 RLL); the
    drives that are certified for RLL are simply tested to verify that
    everything is OK. (The reason I'm so down on old Seagate ST-225/238 is 
    that they didn't redesign anything. They simply tested the same old 
    stuff, and if it happened to pass the RLL test, they sold it as RLL).

On the controller:

    On an 'RLL' controller, everything must be carefully designed to meet the
    tighter timing requirements. Note that a VERY accurate controller can
    make up for a somewhat mushy drive: the overall timing requirements are
    based on the sum total of electronics in the path from disk media to final
    digital output. Spreading the timing error evenly between drive and 
    controller is theoretically cheaper, since neither one need be set up
    for very tight tolerances; however, a very accurate controller is not
    that hard to build today, hence the better success we're all having
    at running 'non-RLL' drives with RLL controllers.

In general:

    There's no such thing as a free lunch. There is no encoding scheme
    (so far) that gets you more data without requiring more density or
    more timing accuracy of some kind. For example, Perstor controllers
    give you *double* the density of MFM. But to get it, they push the
    wiggles closer together. You've got to have a high quality drive in order
    for this to succeed.


In conclusion:

    I have not personally tested the window margins on lots of drives or
    controllers. From recent reports though, it would appear that most if not
    all RLL controllers available today can easily make any good MFM drive
    operate properly with an RLL format. There are various lists of drives
    that are reported to work well when reformatted for RLL.

VI. What about ESDI and SCSI?

Well, they are a little different: all of the data encoding/decoding 
circuitry is on the drive; it is all designed together, and is well 
matched (hopefully!).  Putting it all together like that makes it easier 
to use fancier high frequency encoding schemes, so you'll typically see 
higher data densities on ESDI and SCSI drives.

VII. Anything else?

Sure! There are lots of even more technical, related issues to discuss:
bit shift details (bit shift is a lower level description of what causes
poor window margins on a given drive); signal-to-noise ratios; pulse
amplification; pulse equalization; etc etc... and far on into things that
I know nothing about (and hope I never have to!). Actually, it's pretty
amazing when you think about it: for 99.999% of the people out there,
this stuff is just boxes, cables and cards that you plunk together and
they just *work*!

Well, that's about it.

Pete

-- 
Peter Holzmann, Octopus Enterprises   |(if you're a techie Christian & are
19611 La Mar Ct., Cupertino, CA 95014 |interested in helping w/ the Great
UUCP: {hpda,pyramid}!octopus!pete     |Commission, email dsa-contact@octopus)
DSA office ans mach=408/996-7746;Work (SLP) voice=408/985-7400,FAX=408/985-0859