[comp.sys.misc] RLL Technical Details

pete@octopus.UUCP (Pete Holzmann) (05/15/88)

If you read all the way through this, you will (hopefully) understand WHY
RLL works/doesn't work depending on the configuration you set up. You will
also understand WHY many of the horror stories applied to RLL are almost
certainly mis-applied.

In article <650@mccc.UUCP> pjh@mccc.UUCP (Pete Holsberg) writes:
>In article <216@octopus.UUCP> I write:
>...		- A drive that does not work with RLL will develop 'bad tracks'
>...			over time. These *can* be fixed by reformatting.
>
>Jeez, Pete!  Call me a liar in front of all these people?  If you don't
>believe what I said, [read about an experienced PC guru's experiences...]

I don't mean to call you a liar. Misled, yes. That isn't your fault! This
article is an attempt to correct the problem...

>...MFM formatting is *actually* a form of RLL. What we call 'RLL' is just a
>...different encoding scheme. It increases the amount of data stored on the
>...disk by placing the flux changes on the platter in a more accurate pattern
>...than that used by MFM. The DENSITY of flux changes is not any different.
>
>Pardon my simplistic look, but if a drive rotates at a constant speed
>regardless of its formatting and produces data (RLL formatting) at 1.5
>times the rate of MFM data transfer, then it seems that there are -- at
>least effectively if not physically -- more bits per inch.  How the
>higher effective bpi is produced is the subject of your posting, but
>it's not clear what there is about RLL that produces the increase in
>effective bpi.  Would you like to go into a little detail on 2,7 and all that?

Your wish is my command :-)! OK, here goes with some gobbledy gook on disk
data encoding. By the way, Phil Ngai @ amd posted a brief answer; this one
will go into more detail.

I guess I should mention why you should believe me: Besides the fact that
I'm just a nice guy, I've worked on the microcode for disk drive test
equipment. In order to do a good drive tester, you've gotta understand the
low level guts of these things!

I. How is data stored on a disk drive?

As magnetic flux reversals (think of it as + to -). The POLARITY of the
magnetic flux doesn't mean a thing. It is the TIMING of the flux reversals
that is used to encode data.

II. What is RLL? What does the '2,7' in '2,7 RLL' mean?

RLL means Run Length Limited. The Limits in disk drive RLL refer to the
minimum and maximum time between flux reversals. '2,7' means minimum of 2,
maximum of 7. A minimum of zero would mean that flux reversals can occur
in every clock period. Thus, '2,7' means that flux reversals occur at least
every 8th clock period (7 periods without a reversal), but no more often
than every third clock.
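
To make the two limits concrete, here is a minimal sketch of a run-length check. The function names and the sample pattern are made up for illustration; a 1 stands for a clock period with a flux reversal, a 0 for one without.

```python
def zero_runs(code_bits):
    """Return the lengths of the runs of 0s between consecutive 1s."""
    ones = [i for i, b in enumerate(code_bits) if b == 1]
    return [b - a - 1 for a, b in zip(ones, ones[1:])]

def satisfies_rll(code_bits, d, k):
    """True if every gap between reversals has at least d and at most k zeros."""
    return all(d <= run <= k for run in zero_runs(code_bits))

# Hypothetical on-disk pattern, one entry per clock period.
sample = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
print(zero_runs(sample))            # [2, 7, 3]
print(satisfies_rll(sample, 2, 7))  # True
print(satisfies_rll(sample, 1, 3))  # False: the run of 7 is too long
```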

RLL codes are 'self clocking'. Since you are guaranteed to have a flux
reversal within a limited time, a phase-locked-loop circuit can find the
basic clock period of data on the drive. As the basic clock period gets
smaller and/or the maximum inter-flux-reversal time increases, the job
gets harder and harder for the phase-locked-loop circuitry.

III. What about MFM?

MFM is simply 1,3 RLL encoding, with a basic clock period of 100 nsec at the
usual 5 Mbit/sec data rate (200 nsec per data bit).
One data bit is encoded every two clock periods. The MFM code is relatively
easy to understand [and I have some notes handy], so I'll give the complete
details:

In this table of flux encoding, '0' means no flux change, '1' means a
	flux change encoding a '1' data bit, 'C' means a flux change 
	required to encode a '0' data bit due to clocking requirements.

The code: 1 always becomes 0 1
	  0 becomes 0 0 if preceded by a 1
	  0 becomes C 0 if preceded by a 0

Message Data:	1   0   0   0   0   0   0   0   0   0   0   0
Disk Data:	0 1 0 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 ...

Message Data:	1   1   1   1   1   1   1   1   1   1   1   1
Disk Data:	0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ...

Message Data:	1   0   0   1   1   0   1   0   0   1   0
Disk Data:	0 1 0 0 C 0 0 1 0 1 0 0 0 1 0 0 C 0 0 1 0 0

Note that there are between 1 and 3 zeros between every 1 in the disk data!

Note that since 'C' is physically the same as '1' (both are flux reversals),
    the setup gets in trouble if it loses track of clock periods!
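
The three rules above are simple enough to sketch in a few lines. This is only an illustration of the table, not anybody's controller firmware; the function name and the `prev` argument (the data bit assumed to come before the message) are my own inventions.

```python
def mfm_encode(data_bits, prev=1):
    """Encode data bits with the three MFM rules from the table above.

    prev is the data bit assumed to precede the message. The examples in
    the post all start with a 1, so the choice never matters for them.
    """
    out = []
    for bit in data_bits:
        if bit == 1:
            out += ["0", "1"]      # 1 always becomes 0 1
        elif prev == 1:
            out += ["0", "0"]      # 0 after a 1 becomes 0 0
        else:
            out += ["C", "0"]      # 0 after a 0 needs a clock reversal
        prev = bit
    return " ".join(out)

print(mfm_encode([1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]))
# 0 1 0 0 C 0 0 1 0 1 0 0 0 1 0 0 C 0 0 1 0 0
```

The printed line matches the third Message Data / Disk Data pair above.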

The way this is used on a disk drive is that there is a special data sequence
encoded at the beginning of each sector, with special hardware to detect
it: First, there is a long string of zeros; a hardware 'zero detector' is
enabled to look for it. At this point, it could as easily find a string of
ones as a string of zeros, since they are identical when taken out of
context. Second, a special byte is encoded that VIOLATES the RLL rules: an
'A1' or 'A8' byte is written, with a clock missing in one of the sequential
zero bits (the A1 and A8 tell us whether we are looking at the header of
the sector, which contains cyl/sector/etc info, or the data portion of the
sector). The special byte is called the Address Mark. If zeros followed by
an Address Mark are found, then the PLL (phase locked loop) is synchronized
and data can be read.
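
The claim that a string of zeros and a string of ones look identical out of context can be checked with a small sketch. It assumes the MFM rules from section III; the function name is hypothetical, and a 1 in the output means "flux reversal in this clock period" whether it is a data or a clock reversal.

```python
def mfm_flux(data_bits, prev):
    """Return the MFM flux pattern (1 = reversal, 0 = none) for data_bits."""
    flux = []
    for bit in data_bits:
        if bit == 1:
            flux += [0, 1]   # reversal in the data half of the bit cell
        elif prev == 1:
            flux += [0, 0]   # no reversal at all
        else:
            flux += [1, 0]   # clock reversal between two zeros
        prev = bit
    return flux

zeros = mfm_flux([0] * 8, prev=0)
ones = mfm_flux([1] * 8, prev=1)
print(zeros)
print(ones)
# Slide one pattern by a single clock period and the two line up exactly.
print(zeros[1:] == ones[:-1])   # True
```

Both patterns are one reversal every two clock periods; only the phase differs, which is why something like the Address Mark is needed to pin the phase down.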

IV. Ok, so explain the 'RLL' schemes.

I don't have complete tables of code schemes for all of the RLL formats
handy; it would also take a long time to type them all in. Instead, I'll
explain what IS important about them.

First, let's compare 2,7 RLL with 1,3 RLL. Both codes happen to encode
one data bit into 2 clock periods. With 1,3 RLL (MFM), a flux reversal
can occur every two clock periods. With 2,7 RLL, a flux reversal can occur
every 3 clock periods. If we increase the clock rate by 50% using a 2,7
RLL scheme, we get the same maximum flux reversal rate as for MFM. But, we
get 50% more data out of the drive, at a 50% higher data rate.
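
The arithmetic behind that 50% can be written out as a short sketch. The helper name is made up; the only inputs are the minimum run length and the data-bits-per-clock ratio given above.

```python
from fractions import Fraction

def data_bits_per_min_spacing(d, data_bits, clock_periods):
    """Data bits stored in the shortest allowed gap between flux reversals.

    With at least d empty clock periods between reversals, reversals are
    at least d + 1 clock periods apart, and each clock period carries
    data_bits / clock_periods data bits.
    """
    return (d + 1) * Fraction(data_bits, clock_periods)

mfm = data_bits_per_min_spacing(d=1, data_bits=1, clock_periods=2)
rll27 = data_bits_per_min_spacing(d=2, data_bits=1, clock_periods=2)
print(mfm, rll27, rll27 / mfm)   # 1 3/2 3/2
```

Same closest flux spacing on the platter, 1.5 times the data.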

Other RLL encoding schemes involve changes in the number of clock periods
used to encode a data bit. For example, 1,7 RLL encodes 2 data bits into
3 clock periods. The 1,7 clock period must be kept the same as for 1,3 (MFM)
(I hope you see why by now: both schemes involve a flux change as often as
every 2 clocks). The result is a 33% increase in storage capacity (2/3 of a
data bit per clock period instead of 1/2), somewhat less than the 50% you
get with 2,7 RLL.

Why not use 1,7 RLL? Because the difference between minimum and maximum
flux-change-intervals is so great. It turns out that the PLL electronics
for detecting this wide a range of intervals is a real pain; worse, presumably,
than the problems involved in implementing 2,7 RLL.

Other encoding schemes use different clock rates and different min/max
combinations. They all set things up so the maximum flux-reversal frequency
is the same.

The IMPORTANT differences between the schemes involve maximum clock frequency
(50% higher for 2,7 RLL than MFM, 100% higher for ARLL than MFM) and maximum
Frequency Ratio (comparing minimum and maximum flux-reversal intervals).
In addition, some schemes involve simpler encoding/decoding algorithms (e.g.
the normal 1,3 RLL/MFM); others are very complex: 2,7 RLL is a variable length
code (e.g. 0011 maps to 00001000 but 010 maps to 100100); I don't have a
simple formula for the 2,7 RLL code! Variable length codes make error
recovery more difficult, and hence make bad-sector marking more important.
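
For the curious, here is a sketch of a table-driven 2,7 RLL encoder. Only the 0011 and 010 entries come from the text above; the other five rows are the commonly published 2,7 table, and it is an assumption that the controllers under discussion use exactly this one. The names are hypothetical.

```python
# Commonly published (2,7) RLL table. Only "0011" and "010" are quoted in
# the post; the remaining rows are assumed, not confirmed by it.
RLL27 = {
    "10": "0100",
    "11": "1000",
    "000": "000100",
    "010": "100100",
    "011": "001000",
    "0010": "00100100",
    "0011": "00001000",
}

def rll27_encode(data):
    """Encode a string of '0'/'1' data bits into 2,7 RLL code bits."""
    out, i = [], 0
    while i < len(data):
        # No data word is a prefix of another, so the first match is the
        # only possible one.
        for n in (2, 3, 4):
            word = data[i:i + n]
            if word in RLL27:
                out.append(RLL27[word])
                i += n
                break
        else:
            raise ValueError("data ends mid code word: " + data[i:])
    return "".join(out)

print(rll27_encode("0011"))      # 00001000
print(rll27_encode("010"))       # 100100
print(rll27_encode("0011010"))   # 00001000100100
```

Every code word is twice as long as its data word, which is the "one data bit into 2 clock periods" from earlier. The variable word length is also why one misread reversal can throw off the decoding of the bits that follow.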

A high frequency clock requires great accuracy in timing all along the chain
from disk surface to final data to be read (and the reverse). The time period
during which the controller must decide whether a flux reversal is present
or not is called the 'window'. The variation in flux-reversal detection
(+ or - from the nominal 'perfect detection time') allowed by a given encoding
scheme is called the 'required window margin'. Higher frequency clocks have
narrower windows, so they tolerate less timing variation. On a given
drive/controller combination,
the window margin can be measured: simply sync up the electronics to the
pulses on the drive, read a worst-case data pattern, and see what kind of
variation in flux-reversal timing you get. Good drive/controller combinations
will place all flux reversals in a very narrow time window, giving a very
good window margin, and hence will work well with high-frequency encoding
schemes.
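
A rough sketch of that measurement, reduced to arithmetic. The function name, the measurements and the window widths are all made up for illustration; a real tester works from captured read pulses, not a Python list.

```python
def window_margin_ns(deviations_ns, window_ns):
    """Timing slack left after the worst observed reversal shift.

    deviations_ns: offset of each detected reversal from its ideal time.
    window_ns: full width of the detection window (one clock period).
    A reversal is read correctly only if it lands within +/- window_ns / 2.
    """
    worst = max(abs(d) for d in deviations_ns)
    return window_ns / 2 - worst

# Made-up measurements, in nanoseconds, for illustration only.
shifts = [-12.0, 5.5, 18.0, -20.0, 9.0]

wide = window_margin_ns(shifts, window_ns=100.0)          # hypothetical window
narrow = window_margin_ns(shifts, window_ns=100.0 / 1.5)  # clock 50% faster
print(wide)               # 30.0
print(round(narrow, 1))   # 13.3
```

The same drive and controller, with the same timing scatter, have much less slack left once the clock is 50% faster. A negative result would mean reversals are landing in the wrong window.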

A big difference between minimum and maximum flux reversal intervals 
simply requires complex decoding and phase-locked-loop circuitry
that can handle a wide range of frequencies. All of which leads us to...

V. What does all this mean in terms of real drives, controllers, etc.?

First, let's understand which parts of the whole deal go where. Here are
the pieces needed to read/write disk info, and where they are located:

	Component			Where it is

	Disk surface			Drive
	Head				Drive
	Analog head electronics		Drive
	   (conditions signal to/from
	    head)
	Cable				Between drive and controller
	Analog data separator		Controller
	  (detects flux reversals)
	Phase Locked Loop		Controller
	  (determines data clocking)
	Digital read/write stuff	Controller
	  (includes bit/byte conversions,
	    etc etc etc)

Note that MOST of the junk is in the controller, not the drive!

On the drive:

    Oxide-surface disks on early drive designs (e.g. ST-225, ST-238) do not
    place the flux-reversal with enough accuracy to be used in most RLL
    situations. This is why ST-225/238 drives have so much trouble.
    Newer drive designs use plated media, which allow better magnetic
    definition.

    The drive head and associated electronics are usually tuned to match
    the expected signals to and from the drive. If the drive was designed
    without 'RLL' (2,7) in mind, the frequency response of the drive
    electronics is 'mushy': it may not provide a crisp/accurate enough
    signal to allow the PLL to correctly sync up. On more recent drives,
    the same exact setup is used for 'MFM' (1,3 RLL) and 'RLL' (2,7); the
    drives that are certified for RLL are simply tested to verify that
    everything is OK. (The reason I'm so down on Seagate ST225/238 is that
    they didn't redesign anything. They simply test the same old stuff, and
    if it happens to pass the RLL test, they sell it as RLL).

On the controller:

    On an 'RLL' controller, everything must be carefully designed to meet the
    tighter timing requirements. Note that a VERY accurate controller can
    make up for a somewhat mushy drive: the overall timing requirements are
    based on the sum total of electronics in the path from disk media to final
    digital output. Spreading the timing error evenly between drive and 
    controller is theoretically cheaper, since neither one need be set up
    for very tight tolerances; however, a very accurate controller is not
    that hard to build today, hence the better success we're all having
    at running 'non-RLL' drives with RLL controllers.

In general:

    There's no such thing as a free lunch. There is no encoding scheme
    (so far) that gets you more data without requiring more density or
    more timing accuracy of some kind. Somebody mentioned an amazing new
    Perstor controller that doubles drive density, supposedly without
    increasing the timing requirements. HAH! You sure can't get double
    the flux-reversals in the same space, so you MUST do it by increasing
    the timing requirements. The Perstor simply is an ARLL controller
    (I'm not certain, but I believe ARLL, getting 100% more data than MFM,
    is a 4,7 RLL encoding scheme); it will have trouble with some low
    quality drives just like the other RLL controllers do.

    I have not personally tested the window margins on lots of drives or
    controllers. I have talked with people who HAVE done this testing; their
    results say that the Adaptec RLL controllers have the best timing of
    all RLL controllers on the market today (as of a month ago), and confirm
    what I've heard/seen about Miniscribe and Maxtor drives (they also have
    good enough timing), and about Seagate ST225/238 (poor to marginal).

VI. What about ESDI and SCSI?

Well, they are kind of handy: all of the data encoding/decoding circuitry
is on the drive; it is all designed together, and is well matched (hopefully!).
Putting it all together like that makes it easier to use fancier high frequency
encoding schemes, so you'll typically see higher data densities on ESDI and
SCSI drives.

VII. Anything else?

Sure! There are lots of even more technical, related issues to discuss:
bit shift details (bit shift is a lower level description of what causes
large window margins on a given drive); signal-to-noise ratios; pulse
amplification; pulse equalization; etc etc... and far on into things that
I know nothing about (and hope I never have to!). Actually, it's pretty
amazing when you think about it: for 99.999% of the people out there,
this stuff is just boxes, cables and cards that you plunk together and
they just *work*!

Well, that's about it. I've run out of time, so I'd better send this now.
I hope it helped more than it confused! [And no, I don't think you'll find
drive manufacturers or controller manufacturers very willing to provide
detailed specs on their window margins; that would make it too easy to
compare drive quality! :-(]

Pete

P.S.: If you read all the way to here, congratulations! I don't really expect
that this stuff would really be interesting enough for people to read through
250 lines of gobbledy gook... :-)
-- 
  OOO   __| ___      Peter Holzmann, Octopus Enterprises
 OOOOOOO___/ _______ USPS: 19611 La Mar Court, Cupertino, CA 95014
  OOOOO \___/        UUCP: {hpda,pyramid}!octopus!pete
___| \_____          Phone: 408/996-7746

palowoda@megatest.UUCP (Bob Palowoda) (05/15/88)

in article <218@octopus.UUCP>, pete@octopus.UUCP (Pete Holzmann) says:
> compare drive quality! :-(]
> 
> Pete
> 
> P.S.: If you read all the way to here, congratulations! I don't really expect
> that this stuff would really be interesting enough for people to read through
> 250 lines of gobbledy gook... :-)
          
	  Actually it's very good gobbledy gook... :-)

	  ---Bob

bobmon@iuvax.cs.indiana.edu (RAMontante) (05/16/88)

This is a vote of appreciation for articles like this.  Pete Hol(which is this?
sman or zberg?) posted a question about the appropriateness of such a thing;
I feel it is far more appropriate than most of what we see on the net.  More
detail, not less; we can always stop reading as soon as the snow has gotten
deep enough.

And sure enough -- I looked all through the product specification for my ST-238,
and there's nary a word about "window margins".  Considering that it specifies
neato stuff like:
	Recording Density	14,740 BPI
	Flux Density		 9,827 FCI
and required timings for buffering your seek pulses (100nsec min. between
DIRECTION IN and the first step pulse, don't change this in BASICA :-), that
window margin must be a sensitive number.
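
Those two density figures tie back to the 2,7 RLL arithmetic in Pete Holzmann's article: reversals at least 3 clock periods apart, and one data bit per 2 clock periods, should give 1.5 data bits per flux change at maximum flux density. A quick check on the quoted numbers:

```python
bpi = 14740   # recording density from the spec sheet quoted above
fci = 9827    # flux density from the same sheet

ratio = bpi / fci
print(round(ratio, 3))   # 1.5
```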

Incidentally, I've seen a TI Business Pro document; it does give that parameter
along with many others.  Probably because they know their machine is too
nonstandard for most third-party mfr's to want to get involved anyway.

Something else you're surely dying to know:  when Seagate says 65ms average
maximum seek time, that's for a 205-track (1/3 stroke) seek, average of an
inward seek and an outward seek, using buffered seek pulses.  The full stroke,
innermost to outermost, is 150ms max.  Track-to-track is 20ms max.  One seek
error per million seeks; one recoverable read error per 10 billion bits read,
one NONrecoverable read error per trillion bits...

So make sure you press your salesman closely on these specs!

pjh@mccc.UUCP (Pete Holsberg) (05/17/88)

In article <218@octopus.UUCP> pete@octopus.UUCP (Pete Holzmann) writes:
...If you read all the way through this, you will (hopefully) understand WHY
...RLL works/doesn't work depending on the configuration you set up. You will
...also understand WHY many of the horror stories applied to RLL are almost
...certainly mis-applied.
...
...In article <650@mccc.UUCP> pjh@mccc.UUCP (Pete Holsberg) writes:
...I don't mean to call you a liar. Misled, yes. That isn't your fault! This
...article is an attempt to correct the problem...

Misled?  You think that the guys who told me that they were unable to
MFM-format a plated media drive that they had run under RLL were
kidding?  Frankly, I didn't initially believe them either, but enough of
them said the same thing that I got rid of my ST-225 asap.  Maybe it's
another hummingbird case: theoretically, there's nothing that an RLL
controller can do to prevent a disk from being reformatted, but in these
cited cases, it happened.  Probably in conjunction with a powersupply
problem or some such thing, eh?

...I. How is data stored on a disk drive?
...
...As magnetic flux reversals (think of it as + to -). The POLARITY of the
...magnetic flux doesn't mean a thing. It is the TIMING of the flux reversals
...that is used to encode data.

Don't the - to + reversals count?  The d(phi)/dt exists!  By timing, do
you mean the magnitude of the d(phi)/dt or what?

...P.S.: If you read all the way to here, congratulations! I don't really expect
...that this stuff would really be interesting enough for people to read through
...250 lines of gobbledy gook... :-)

Thanks for the "gobbledygook"!  I'll have to read it at my leisure to
see if it makes any sense to me.

pjh@mccc.UUCP (Pete Holsberg) (05/17/88)

In article <8769@iuvax.cs.indiana.edu> bobmon@iuvax.UUCP (RAMontante) writes:
...This is a vote of appreciation for articles like this.  Pete Hol(which is this?
...sman or zberg?) posted a question about the appropriateness of such a thing;
   ^^^^^^^^^^^^^
   	       ---- actually "zman or sberg"!  :-)  'sberg's original
query; 'zman's thoughtful and detailed reply.

johnl@n3dmc.UUCP (John Limpert) (05/17/88)

In article <661@mccc.UUCP> pjh@mccc.UUCP (Pete Holsberg) writes:
>Misled?  You think that the guys who told me that they were unable to
>MFM-format a plated media drive that they had run under RLL were
>kidding?  Frankly, I didn't initially believe them either, but enough of
>them said the same thing that I got rid of my ST-225 asap.  Maybe it's
>another hummingbird case: theoretically, there's nothing that an RLL
>controller can do to prevent a disk from being reformatted, but in these
>cited cases, it happened.  Probably in conjunction with a powersupply
>problem or some such thing, eh?

I think the problems with drives that can't be reformatted are caused
by destruction of servo information on drives that use the wedge
servo technique.  From previous discussions of the problem, it appears
that improper formatting with a RLL controller can overwrite the
servo section of a track.  Once this has happened, the only way to fix
the drive is to send it to a repair shop that has the special equipment
needed to rewrite the servo information.  I have heard of people
wiping out several drives in this manner.  It seems like the drive doesn't
prevent the controller from writing over the servo information.
A MFM controller might be able to do this if programmed incorrectly.

-- 
John A. Limpert
UUCP:	johnl@n3dmc.UUCP	uunet!n3dmc!johnl
PACKET:	n3dmc@n3dmc.ampr.org	n3dmc@wa3pxx

ward@cfa.harvard.EDU (Steve Ward) (05/18/88)

In article <661@mccc.UUCP>, pjh@mccc.UUCP (Pete Holsberg) writes:
> In article <218@octopus.UUCP> pete@octopus.UUCP (Pete Holzmann) writes:
> ...If you read all the way through this, you will (hopefully) understand WHY
> ...RLL works/doesn't work depending on the configuration you set up. You will
>....
> 
> Misled?  You think that the guys who told me that they were unable to
> MFM-format a plated media drive that they had run under RLL were
> kidding?  Frankly, I didn't initially believe them either, but enough of
> them said the same thing that I got rid of my ST-225 asap.  Maybe it's
> another hummingbird case: theoretically, there's nothing that an RLL
> controller can do to prevent a disk from being reformatted, but in these
> cited cases, it happened.  Probably in conjunction with a powersupply
> problem or some such thing, eh?
> 

MFM controllers may not work with RLL drives and vice versa, at least
with ST506-type disk drive interfaces.  This may account for some of
the above commentary and problems.

This is so because the ST506 drive requires the data separator to be
on the disk drive controller - the data separator is not on the disk
drive itself.  The controller interfaces to the data read/write
electronics in a fairly direct fashion.  The controller has to generate
an encoded data+clock signal while making assumptions (write
precompensation, read/write waveform characteristics, esp. pulse discr.
behavior) along the way about a drive its maker didn't manufacture.  This is
further complicated by the fact that the RLL drive is optimized for a
read/write pulse rate correlated to a bit rate of 7.5MBPS while the
MFM drive is 5.0MBPS.  It becomes a serious problem to match the
on-controller data separator to the on-disk read/write electronics
in any case, but especially in any universal MFM+RLL method.  It is not
impossible, just very hard.  This means you might see combinations
of controllers and disk drives that work better/worse together and
a drive that might work RLL and MFM with particular controllers but
not with others.  For best results, talk to the drive manufacturers
and use controllers they recommend.  The real solution is to place the
data separator on the disk drive so that the disk drive read/write
electronics can be optimized and matched with the data separator.
A SCSI I/O disk drive presumably has the electronics matched since
a single vendor makes the drive and electronics.  ESDI drives place
the data separator on (usually inside) the disk drive.
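
To put the 5.0 and 7.5 MBPS figures in per-track terms, here is a back-of-the-envelope sketch. The 3600 RPM spindle speed is an assumption (typical for ST506-class drives, but not stated in this thread), and the result is raw capacity before sector headers, gaps and ECC are taken out.

```python
RPM = 3600  # assumption: typical ST506-class spindle speed, not from the thread

def raw_bytes_per_track(bits_per_second, rpm=RPM):
    """Raw (unformatted) bytes that pass under the head in one revolution."""
    bits_per_rev = bits_per_second * 60 / rpm
    return bits_per_rev / 8

mfm = raw_bytes_per_track(5.0e6)
rll = raw_bytes_per_track(7.5e6)
print(round(mfm), round(rll))   # 10417 15625
print(round(rll / mfm, 3))      # 1.5
```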