[comp.sys.ibm.pc.digest] Info-IBMPC Digest V7 #30

hicks@WALKER-EMH.ARPA (Gregory Hicks COMFLEACTS) (07/11/88)
Info-IBMPC Digest           Sun,  10 Jul 88       Volume 7 : Issue  30

This Week's Editor: Gregory Hicks -- Chinhae Korea <hicks@walker-emh.arpa>

Today's Topics:                SPECIAL RLL Issue
                               RLL encoding
                      RLL - why it is hard on drives
                       RLL Technical Details (long)
            RLL - an intuitive (and somewhat silly) explanation

Info-IBMPC Lending Library is available from:

    Bitnet via server at CCUC; and from SIMTEL20.ARPA (see file
          PD1:<msdos>files.idx for listing of source files)

    SIMTEL20.ARPA can now be accessed from BITNET via
       LISTSERV@RPICICGE.BITNET using LISTSERV Commands

----------------------------------------------------------------------

Date: 16 May 88 19:57:19 GMT
From: octopus!pete@ucscc.UCSC.EDU (Pete Holzmann)
Subject: RLL encoding

You asked for it; I happened to find a copy in one of my magazines (Fall
1986 Computer Technology Review)... so here it is: the RLL code!

I think you'll agree that it *is* a variable length code, with constant
encoding density. It is kind of fun to play with it and verify that it
really is a 2,7 RLL code. It isn't at all obvious how to start with "I
want a 2,7 RLL code" and end up with this chart:

     Data       Code

     1          00
     01         0001
     10         0100
     11         1000
     000        100100
     001        001000
     010        000100
     0110       00100100
     0011       00001000
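
One way to "play with it" is to let a program do the checking. The sketch
below is only an illustration, not something from the magazine: it uses
the 2,7 table as it is commonly published (seven rows, not row-for-row
identical to the chart as typed above, so treat TABLE as an assumption),
and the names TABLE, encode and zero_runs are made up for the example. It
encodes data by prefix matching and measures the runs of zeros between
ones.

```python
import itertools

# 2,7 RLL table as commonly published (an assumption; it is NOT copied
# from the chart above). Keys are data bits, values are code bits.
TABLE = {
    "10": "0100",
    "11": "1000",
    "000": "000100",
    "010": "100100",
    "011": "001000",
    "0010": "00100100",
    "0011": "00001000",
}

def encode(data: str) -> str:
    """Encode a string of '0'/'1' data bits by prefix matching."""
    out = []
    i = 0
    while i < len(data):
        for n in (2, 3, 4):
            word = data[i:i + n]
            if word in TABLE:
                out.append(TABLE[word])
                i += n
                break
        else:
            raise ValueError("data ends in the middle of a table entry")
    return "".join(out)

def zero_runs(code: str) -> list[int]:
    """Lengths of the runs of zeros lying between two ones."""
    inner = code.strip("0")          # drop leading/trailing zeros
    return [len(run) for run in inner.split("1")[1:-1]]

# Try every sequence of three table entries and collect the run lengths.
runs = set()
for words in itertools.product(TABLE, repeat=3):
    runs.update(zero_runs(encode("".join(words))))
print(min(runs), max(runs))   # 2 7
```

With this table, no run of zeros between two ones is ever shorter than 2
or longer than 7, which is all that "2,7" promises.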

Have fun!

Pete

P.S.: People have requested the ERLL and ARLL codes. I don't have them
handy.  I'm not sure I have a recent enough printed reference. I know
where to go (actually, who to talk to) to get the chart; but if somebody
on the net has the codes handy, maybe they can pipe up! I can't be the
only one with access to this stuff!

--
  OOO   __| ___      Peter Holzmann, Octopus Enterprises
 OOOOOOO___/ _______ USPS: 19611 La Mar Court, Cupertino, CA 95014
  OOOOO \___/        UUCP: {hpda,pyramid}!octopus!pete
___| \_____          Phone: 408/996-7746

------------------------------

Date: 12 May 88 10:21:39 GMT
From: pete@octopus.UUCP (Pete Holzmann)
Subject: RLL- why it is hard on drives

pjh@mccc.UUCP (Pete Holsberg) writes:
}In article <1255@kodak.UUCP> crassi@kodak.UUCP (charlie crassi) writes:
}...I was told that the RLL controller will work for about a month and then
}...will destroy the media which is not coated thick enough. Is there any
}...truth to the matter?


}Your data is living on borrowed time!  Any medium which is not certified
}for RLL will eventually fail (unless you are VERY lucky!) because of the
}demands an RLL controller makes on it.  The horror stories told on
}CompuServe were enough to convince me to stop using my Adaptec
}controller with an ST-225.  People have reported that they were not able
}to reformat their disks with their MFM controllers after abandoning
}RLL!!  This is not to castigate RLL, but an RLL controller needs a drive
}that can handle the storage density and waveform requirements.

Sorry, but this response is only half true!

True things:

     - ST225 drives do not work well with RLL.

     - A drive that does not work with RLL will develop 'bad tracks' over
time. These *can* be fixed by reformatting.

     - People sometimes end up in panic mode after RLL fails. There are
many failure modes for setting up a hard disk; some people may not be able
to reformat their drives to MFM, but it is not the fault of the RLL
controller!  (Unless it had some kind of electrical defect).

     - An RLL controller needs a drive that can handle the *waveform*
requirements.

False things: (I'm going to give the opposite, true statements):

     - Non-certified drives OFTEN work just fine, forever! Some certified
drives are marginal, and can't handle RLL for that matter.

     - "Storage Density" (physical flux changes per area) is no higher on
an RLL drive than on an MFM drive.

I guess it is time to shed some more light (hopefully!) on this subject.

MFM formatting is *actually* a form of RLL. What we call 'RLL' is just a
different encoding scheme. It increases the amount of data stored on the
disk by placing the flux changes on the platter in a more accurate pattern
than that used by MFM. The DENSITY of flux changes is not any different.
RLL simply requires more accuracy in TIMING. The term used is 'window
margin'. Essentially, there is a window of time within which the
drive+controller combination must decide whether or not there is a flux
change on the media surface. When using RLL, this window is smaller: the
timing of the flux changes must be more accurate.

So: low quality disk media produces poor window margins, because of the
low density of magnetic particles, etc. Cheap drive electronics produce
poor window margins, because of loose tolerances. Poor RLL controller
design can do the same thing: some controllers contribute less to window
margin error than others.

What does this mean?

   - RLL (or any other encoding scheme for that matter) does not
physically damage or change the disk drive in any way

   - Some lower-quality drives should NEVER be used for RLL, because they
don't have good enough window margins. This includes Seagate ST225 (and
probably should include the ST238; it is 'certified' by Seagate, but often
has trouble anyway).

   - Many high quality drives, especially those with plated media and
all-around good electronics, will work without trouble!  When you buy
'certification' from these manufacturers, you are NOT buying a different
drive. You are simply paying extra for a guarantee of stricter tolerances.
Depending on the drive involved, this guarantee may be essential
[ST225/238- only a few drives pass the test] or it may be completely
unnecessary [Maxtor drives work GREAT with RLL, even though not
'certified'] or something in between.

   - There are tests that can be performed on any drive/controller
combination to see if RLL will work well. They simply involve measuring
the window margin. However, the window margin can't be fully exercised
without specialized hardware: you can't change the timing of your PC's
disk interface beyond its worst case.

   - On the other hand, if you perform an intense read/write/format test
on your drive, using worst case data patterns, you can develop a good
level of confidence in your system. This is what the SpinRite program
does: It beats up on the inner track of a drive [inner track has highest
flux-change density] with a bunch of worst case data patterns. If this
passes, there is little or no cause for alarm.

Hope this clears things up a little!

Pete

------------------------------

Date: 15 May 88 03:16:04 GMT
From: pete@octopus.UUCP (Pete Holzmann)
Subject: RLL Technical Details (long)

If you read all the way through this, you will (hopefully) understand WHY
RLL works/doesn't work depending on the configuration you set up. You will
also understand WHY many of the horror stories applied to RLL are almost
certainly mis-applied.

pjh@mccc.UUCP (Pete Holsberg) writes in the last digest:

>...       - A drive that does not work with RLL will develop 'bad tracks'
>...            over time. These *can* be fixed by reformatting.
>
>...MFM formatting is *actually* a form of RLL. What we call 'RLL' is just
>...a different encoding scheme. It increases the amount of data stored on
>...the disk by placing the flux changes on the platter in a more accurate
>...pattern than that used by MFM. The DENSITY of flux changes is not any
>...different.
>
>Pardon my simplistic look, but if a drive rotates at a constant speed
>regardless of its formatting and produces data (RLL formatting) at 1.5
>times the rate of MFM data transfer, then it seems that there are -- at
>least effectively if not physically -- more bits per inch.  How the
>higher effective bpi is produced is the subject of your posting, but
>it's not clear what there is about RLL that produces the increase in
>effective bpi.  Would you like to go into a little detail on 2,7 and all
>that?

Your wish is my command :-)! OK, here goes with some gobbledy gook on disk
data encoding. By the way, Phil Ngai @ amd posted a brief answer; this one
will go into more detail.

I guess I should mention why you should believe me: Besides the fact that
I'm just a nice guy, I've worked on the microcode for disk drive test
equipment. In order to do a good drive tester, you've gotta understand the
low level guts of these things!

I. How is data stored on a disk drive?

As magnetic flux reversals (think of it as + to -). The POLARITY of the
magnetic flux doesn't mean a thing. It is the TIMING of the flux reversals
that is used to encode data.

II. What is RLL? What does the '2,7' in '2,7 RLL' mean?

RLL means Run Length Limited. The Limits in disk drive RLL refer to the
minimum and maximum time between flux reversals. '2,7' means minimum of 2,
maximum of 7. A minimum of zero would mean that flux reversals can occur
in every clock period. Thus, '2,7' means that flux reversals occur at
least every 8th clock period (7 periods without a reversal), but no more
often than every third clock.

RLL codes are 'self clocking'. Since you are guaranteed to have a flux
reversal within a limited time, a phase-locked-loop circuit can find the
basic clock period of data on the drive. As the basic clock period gets
smaller and/or the maximum inter-flux-reverse time increases, the job gets
harder and harder for the phase-locked-loop circuitry.

III. What about MFM?

MFM is simply 1,3 RLL encoding, with a basic clock period of 50 nsec.  One
data bit is encoded every two clock periods. The MFM code is relatively
easy to understand [and I have some notes handy], so I'll give the
complete details:

In this table of flux encoding, '0' means no flux change, '1' means a flux
change encoding a '1' data bit, 'C' means a flux change required to encode
a '0' data bit due to clocking requirements.

The code: 1 always becomes 0 1
          0 becomes 0 0 if preceded by a 1
          0 becomes C 0 if preceded by a 0

Message Data:   1   0   0   0   0   0   0   0   0   0   0   0
Disk Data:      0 1 0 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 C 0 ...

Message Data:   1   1   1   1   1   1   1   1   1   1   1   1
Disk Data:      0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ...

Message Data:   1   0   0   1   1   0   1   0   0   1   0
Disk Data:      0 1 0 0 C 0 0 1 0 1 0 0 0 1 0 0 C 0 0 1 0 0

Note that there are between 1 and 3 zeros between every 1 in the disk
data!

Note that since 'C' is physically the same as '1' (both are flux
reversals), the setup gets in trouble if it loses track of clock periods!
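
The three rules above are short enough to try out in code. This is a
minimal sketch, not anything a real controller runs: the function names
are made up, and the `prev` argument (the bit assumed to come before the
message) is an assumption, since the rules need to know what preceded the
first bit.

```python
def mfm_encode(message: str, prev: str = "1") -> str:
    """Encode '0'/'1' message bits into clock-period symbols.

    Output symbols: '0' = no flux change, '1' = data flux change,
    'C' = clock flux change. `prev` is the bit assumed to precede the
    message (an assumption made for this example).
    """
    out = []
    for bit in message:
        if bit == "1":
            out.append("01")        # 1 always becomes 0 1
        elif prev == "1":
            out.append("00")        # 0 preceded by a 1
        else:
            out.append("C0")        # 0 preceded by a 0
        prev = bit
    return "".join(out)

def gaps(disk: str) -> list[int]:
    """Count the no-change periods between successive flux changes."""
    flux = disk.replace("C", "1").strip("0")   # C and 1 look alike on disk
    return [len(run) for run in flux.split("1")[1:-1]]

print(" ".join(mfm_encode("10011010010")))
# 0 1 0 0 C 0 0 1 0 1 0 0 0 1 0 0 C 0 0 1 0 0
print(sorted(set(gaps(mfm_encode("10011010010")))))
# [1, 2, 3]
```

The first line of output matches the third example above, and the second
shows the "between 1 and 3 zeros" property for that message.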

The way this is used on a disk drive is that there is a special data
sequence encoded at the beginning of each sector, with special hardware to
detect it: First, there is a long string of zeros; a hardware 'zero
detector' is enabled to look for it. At this point, it could as easily
find a string of ones as a string of zeros, since they are identical
when taken out of context. Second, a special byte is encoded that VIOLATES
the RLL rules: an 'A1' or 'A8' byte is written, with a clock missing in
one of the sequential zero bits (the A1 and A8 tell us whether we are
looking at the header of the sector, which contains cyl/sector/etc info,
or the data portion of the sector). The special byte is called the Address
Mark. If zeros followed by an Address Mark are found, then the PLL (phase
locked loop) is synchronized and data can be read.

IV. Ok, so explain the 'RLL' schemes.

I don't have complete tables of code schemes for all of the RLL formats
handy; it would also take a long time to type them all in. Instead, I'll
explain what IS important about them.

First, let's compare 2,7 RLL with 1,3 RLL. Both codes happen to encode one
data bit into 2 clock periods. With 1,3 RLL (MFM), a flux reversal can
occur every two clock periods. With 2,7 RLL, a flux reversal can occur
every 3 clock periods. If we increase the clock rate by 50% using a 2,7
RLL scheme, we get the same maximum flux reversal rate as for MFM. But, we
get 50% more data out of the drive, at a 50% higher data rate.
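
The arithmetic behind that 50% can be written out. A rough sketch, with a
made-up function name: it counts how many data bits fit into the space
taken by the closest allowed pair of flux reversals, which is the quantity
the drive physically limits.

```python
from fractions import Fraction

def data_per_min_spacing(d: int, data_bits: int, clock_periods: int) -> Fraction:
    """Data bits stored in the space of one minimum flux-reversal interval.

    d is the minimum number of clock periods with no reversal between two
    reversals, so the closest reversals are d + 1 clock periods apart.
    """
    rate = Fraction(data_bits, clock_periods)   # data bits per clock period
    return rate * (d + 1)

mfm = data_per_min_spacing(d=1, data_bits=1, clock_periods=2)
rll = data_per_min_spacing(d=2, data_bits=1, clock_periods=2)
print(mfm, rll, rll / mfm)   # 1 3/2 3/2
```

Same minimum spacing on the platter, one and a half times the data.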

Other RLL encoding schemes involve changes in the number of clock periods
used to encode a data bit. For example, 1,7 RLL encodes 2 data bits into 3
clock periods. The 1,7 clock period must be kept the same as for 1,3 (MFM)
(I hope you see why by now: both schemes involve a flux change as often as
every 2 clocks). The result is a 33% increase in storage capacity (2/3 of
a data bit per clock period instead of 1/2), somewhat less than the 50%
gained with 2,7 RLL.

Why not use 1,7 RLL? Because the difference between minimum and maximum
flux-change-intervals is so great. It turns out that the PLL electronics
for detecting this wide a range of intervals is a real pain; worse,
presumably, than the problems involved in implementing 2,7 RLL.
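
To put a number on "so great": the reversal intervals run from d+1 to k+1
clock periods, so the spread the PLL has to cope with can be compared
directly. A small sketch (the function name is made up for the example):

```python
from fractions import Fraction

def interval_ratio(d: int, k: int) -> Fraction:
    """Longest allowed flux-reversal interval divided by the shortest."""
    return Fraction(k + 1, d + 1)

codes = {"MFM (1,3)": (1, 3), "2,7 RLL": (2, 7), "1,7 RLL": (1, 7)}
for name, (d, k) in codes.items():
    print(name, interval_ratio(d, k))
# MFM (1,3) 2
# 2,7 RLL 8/3
# 1,7 RLL 4
```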

Other encoding schemes use different clock rates and different min/max
combinations. They all set things up so the maximum flux-reversal
frequency is the same.

The IMPORTANT differences between the schemes involve maximum clock
frequency (50% higher for 2,7 RLL than MFM, 100% higher for ARLL than MFM)
and maximum Frequency Ratio (comparing minimum and maximum flux-reversal
intervals).  In addition, some schemes involve simpler encoding/decoding
algorithms (e.g.  the normal 1,3 RLL/MFM); others are very complex: 2,7
RLL is a variable length code (e.g. 0011 maps to 00001000 but 010 maps to
100100); I don't have a simple formula for the 2,7 RLL code! Variable
length codes make error recovery more difficult, and hence make bad-sector
marking more important.

A high frequency clock requires great accuracy in timing all along the
chain from disk surface to final data to be read (and the reverse). The
time period during which the controller must decide whether a flux
reversal is present or not is called the 'window'. The variation in
flux-reversal detection (+ or - from the nominal 'perfect detection time')
allowed by a given encoding scheme is called the 'required window margin'.
Higher frequency clocks have smaller window margin requirements. On a
given drive/controller combination, the window margin can be measured:
simply sync up the electronics to the pulses on the drive, read a
worst-case data pattern, and see what kind of variation in flux-reversal
timing you get. Good drive/controller combinations will place all flux
reversals in a very narrow time window, giving a very good window margin,
and hence will work well with high-frequency encoding schemes.
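
A toy calculation shows why the same drive can pass with MFM and fail with
2,7 RLL. Everything here is an assumption made for illustration: the
window is taken to be one clock period wide and centred on the nominal
reversal time, the deviations are invented, and the 50 nsec MFM clock
period is simply the figure quoted in section III.

```python
def remaining_margin(clock_period_ns: float, deviations_ns: list[float]) -> float:
    """Fraction of the half-window left over after the worst deviation.

    Assumes the detection window is one clock period wide and centred on
    the nominal reversal time. 1.0 means perfect timing; 0 or less means
    a reversal landed in a neighbouring window.
    """
    half_window = clock_period_ns / 2
    worst = max(abs(dev) for dev in deviations_ns)
    return (half_window - worst) / half_window

measured = [-9.0, 4.5, 12.0, -20.0, 7.0]     # made-up deviations, in ns
mfm_period = 50.0                            # figure quoted in section III
rll_period = mfm_period / 1.5                # clock rate 50% higher
print(remaining_margin(mfm_period, measured) > 0)   # True
print(remaining_margin(rll_period, measured) > 0)   # False
```

The flux reversals have not moved at all between the two runs; only the
window they have to land in has shrunk.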

A big difference between minimum and maximum flux reversal intervals
simply requires complex decoding and phase-locked-loop circuitry that can
handle a wide range of frequencies. All of which leads us to...

V. What does all this mean in terms of real drives, controllers, etc.?

First, let's understand which parts of the whole deal go where. Here are
the pieces needed to read/write disk info, and where they are located:

     Component                           Where it is

     Disk surface                        Drive
     Head                                Drive
     Analog head electronics             Drive
       (conditions signal to/from
        head)
     Cable                               Between drive and controller
     Analog data separator               Controller
       (detects flux reversals)
     Phase Locked Loop                   Controller
       (determines data clocking)
     Digital read/write stuff            Controller
       (includes bit/byte conversions,
        etc etc etc)

Note that MOST of the junk is in the controller, not the drive!

On the drive:

    Oxide-surface disks on early drive designs (e.g. ST-225, ST-238) do
not place the flux-reversal with enough accuracy to be used in most RLL
situations. This is why ST-225/238 drives have so much trouble.  Newer
drive designs use plated media, which allow better magnetic definition.

    The drive head and associated electronics are usually tuned to match
the expected signals to and from the drive. If the drive was designed
without 'RLL' (2,7) in mind, the frequency response of the drive
electronics is 'mushy': it may not provide a crisp/accurate enough signal
to allow the PLL to correctly sync up. On more recent drives, the same
exact setup is used for 'MFM' (1,3 RLL) and 'RLL' (2,7); the drives that
are certified for RLL are simply tested to verify that everything is OK.
(The reason I'm so down on Seagate ST225/238 is that they didn't redesign
anything. They simply test the same old stuff, and if it happens to pass
the RLL test, they sell it as RLL).

On the controller:

    On an 'RLL' controller, everything must be carefully designed to meet
the tighter timing requirements. Note that a VERY accurate controller can
make up for a somewhat mushy drive: the overall timing requirements are
based on the sum total of electronics in the path from disk media to final
digital output. Spreading the timing error evenly between drive and
controller is theoretically cheaper, since neither one need be set up for
very tight tolerances; however, a very accurate controller is not that
hard to build today, hence the better success we're all having at running
'non-RLL' drives with RLL controllers.

In general:

    There's no such thing as a free lunch. There is no encoding scheme (so
far) that gets you more data without requiring more density or more timing
accuracy of some kind. Somebody mentioned an amazing new Perstor
controller that doubles drive density, supposedly without increasing the
timing requirements. HAH! You sure can't get double the flux-reversals in
the same space, so you MUST do it by increasing the timing requirements.
The Perstor simply is an ARLL controller (I'm not certain, but I believe
ARLL, getting 100% more data than MFM, is a 4,7 RLL encoding scheme); it
will have trouble with some low quality drives just like the other RLL
controllers do.

    I have not personally tested the window margins on lots of drives or
controllers. I have talked with people who HAVE done this testing; their
results say that the Adaptec RLL controllers have the best timing of all
RLL controllers on the market today (as of a month ago), and confirm what
I've heard/seen about Miniscribe and Maxtor drives (they also have good
enough timing), and about Seagate ST225/238 (poor to marginal).

VI. What about ESDI and SCSI?

Well, they are kind of handy: all of the data encoding/decoding circuitry
is on the drive; it is all designed together, and is well matched
(hopefully!).  Putting it all together like that makes it easier to use
fancier high-frequency encoding schemes, so you'll typically see higher
data densities on ESDI and SCSI drives.

VII. Anything else?

Sure! There are lots of even more technical, related issues to discuss:
bit shift details (bit shift is a lower level description of what causes
large window margins on a given drive); signal-to-noise ratios; pulse
amplification; pulse equalization; etc etc... and far on into things that
I know nothing about (and hope I never have to!). Actually, it's pretty
amazing when you think about it: for 99.999% of the people out there, this
stuff is just boxes, cables and cards that you plunk together and they
just *work*!

Well, that's about it. I've run out of time, so I'd better send this now.
I hope it helped more than it confused! [And no, I don't think you'll find
drive manufacturers or controller manufacturers very willing to provide
detailed spec's on their window margins; that would make it too easy to
compare drive quality! :-(]

Pete

P.S.: If you read all the way to here, congratulations! I don't really
expect that this stuff would really be interesting enough for people to
read through 250 lines of gobbledy gook... :-)

------------------------------

Date: 15 May 88 23:02:27 GMT
From: chris@mimsy.UUCP (Chris Torek)
Subject: RLL: an intuitive (and somewhat silly) explanation

pete@octopus.UUCP (Pete Holzmann) writes
a nice long article describing 2,7 RLL encoding.  (Thanks, by the way,
although what I really wanted to see was that table you did not reproduce
:-) .)  Now let me try for the intuitive description, with some nitpicky
technical stuff too.  First the technical nits:

>I. How is data stored on a disk drive?
>As magnetic flux reversals (think of it as + to -). The POLARITY of the
>magnetic flux doesn't mean a thing. It is the TIMING of the flux
>reversals that is used to encode data.

Sort of.  + to - or - to + is not important, but it is not timing, but
rather presence, of reversals that encodes data.  The simplest encoding
---the old `IBM format' or `FM' or how you got a whole 128K :-) on an
eight-inch floppy---is simply clock+data, clock+data, ... where a data `1'
is a reversal and a data `0' is a missing reversal.

>II. What is RLL? What does the '2,7' in '2,7 RLL' mean?
>RLL means Run Length Limited. The [numbers] refer to the minimum and
>maximum time between flux reversals.

(Remember this for the intuitive explanation.)

>RLL codes are 'self clocking'.

All common drive formats are self-clocking: that just means the timing is
stored in among the data somehow, rather than on something external.  An
example of a non-self-clocking medium might be a paper tape (although
paper tape has sprocket holes that can be used for clocking---this is how
the `pull the tape through by hand' readers work---most paper tape readers
relied on a constant rate of tape motion.)

>... 2,7 RLL is a variable length code (e.g. 0011 maps to 00001000 but
>010 maps to 100100); I don't have a simple formula for the 2,7 RLL code!

This is not variable length: four bits went to eight, and three to six.
In other words, n bits becomes 2n bits.  I think someone just squashed
equivalent table entries to make it look variable length.

>P.S.: If you read all the way to here, congratulations! I don't really
>expect that this stuff would really be interesting enough for people to
>read through 250 lines of gobbledy gook... :-)

(You might be surprised.)

Anyway, time for intuition.  None of the following is strictly accurate,
but it should give you a good feel for how these things work.

When you stick flux changes (data and/or clock) onto a disk surface, they
tend to `wiggle around', somewhat like paper dots dropped from a bit too
high up, in a slight breeze.  The better the drive, the more accurately
the dots land where they were supposed to go, but they always wind up
slightly out of place.

Unlike the paper dots, though, those darned flux changes wiggle *more*
when you put them closer together.  If you put too many of them in a row,
they crawl right out of where they were supposed to be:

    you wanted 1111:
    < wiggle >< wiggle >< wiggle >< wiggle >
     0         1         2         3

    but you got 11?0:
    < wiggle ><   wiggl><     wig><e      w>ggle
     0         1         2         3

The darned things are scared of each other!  We have to make sure the
closest we ever put them is one wiggle-space apart, or they will get
scared and wiggle away.  We will represent this minimum wiggle-space with
| marks below.  (Note that the wiggles can see through the marks and will
get scared if they are less than two | marks apart.  This should make
sense as soon as you get to the MFM diagram.)

On the other hand, if you put them too far apart, the controller gets
forgetful, as if it were counting sheep and the sheep stopped jumping:

    you wanted 10000001:
    | w |   |   |   |   |   |   | w |
      0   1   2   3   4   5   6   7

    but you got 1000..??oops
    | w |   |   |   |   |   |   | w |
      0   1   2   3  ... now where was I?

So the idea is that we have to put the wiggles close enough together so
that the controller does not forget to keep counting, but far enough apart
so that they do not scare each other away.  (In engineer-ese, the flux
reversals must be far enough apart not to interfere, but close enough
together to keep the decoding circuit in sync.)

Well, one way to do this is to use good ol' FM format, and put the clock
and data wiggles fairly far apart.  We always get clock+data, and clock is
always 1, so we get wiggle+blank or wiggle+wiggle.  If we got a whole
series of wiggle+blanks, each wiggle would be two spaces apart, and if we
got a whole series of wiggle+wiggles, they would be one space apart.  One
space has to be `far enough' and two has to be `close enough'.  This is
easy to arrange, but it only gives us a measly 128K on a whole eight-inch
disk.
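
For the curious, FM is about a two-line program. This is just a sketch
with made-up function names; a '1' in the output stands for a wiggle and
a '0' for a blank.

```python
def fm_encode(data: str) -> str:
    """FM: every data bit is preceded by a clock wiggle that is always 1."""
    return "".join("1" + bit for bit in data)

def spacings(code: str) -> list[int]:
    """Distance, in slots, between each wiggle and the next one."""
    positions = [i for i, c in enumerate(code) if c == "1"]
    return [b - a for a, b in zip(positions, positions[1:])]

print(fm_encode("1011"))             # 11101111
print(spacings(fm_encode("0000")))   # [2, 2, 2]
print(spacings(fm_encode("1111")))   # [1, 1, 1, 1, 1, 1, 1]
```

Half of every slot pair is spent on the clock wiggle, which is where the
capacity goes.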

So what to do?  Well, how about Modified FM?  We will put the clock wiggle
in only if we do not have enough data wiggles to keep the controller
counting.  If we have two data wiggles next to each other, we can put them
one space apart, because there will not be any clock wiggle to scare them
away.  Write it this way, with a `.' marking half-spaces.  The clock
wiggle is the one on the left of its dot.

     you wanted 11001011:
     | .w| .w| . |w. | .w| . | .w| .w|
       0   1   2   3   4   5   6   7

The wiggles are still not too close---always at least one space apart, and
sometimes one and a half spaces---yet still not *too* far apart.

But maybe we can do better.  And with 2,7 RLL, we do!  Instead of putting
in a wiggle for every `1' data bit, and a special clock wiggle if we need
one, suppose we make up some tables giving whole bunches of arrangements
where the wiggles are at least three dots apart, but at most eight.  Using
Pete's two examples:

>e.g. 0011 maps to 00001000 but 010 maps to 100100

     you want 0011 010 010, so write 00001000 100100 100100:
     | . . | .w. | . .w| . .w| . .w| . .w| . .

The wiggles are still always at least one space (three dots) apart (we
never get something like | . .w| .w. | where the two are only two-thirds
of a minimum wiggle-space apart), and they are never more than three
(really eight-thirds) spaces apart.  Or again in EE-speak, the minimum
flux separation is still one unit, and the maximum 8/3 units, so they will
not interfere and the electronics will stay in sync.
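
If you want to draw your own wiggle pictures, here is a rough sketch that
does it. The function names are made up, and the code strings are typed in
by hand: `mfm` is 11001011 after MFM encoding with clock wiggles written
as 1, and `rll` is the first 18 code bits of the 2,7 example above. A box
holds two slots for MFM and three for 2,7 RLL.

```python
from fractions import Fraction

def draw(code: str, slots_per_box: int) -> str:
    """Draw code bits as wiggle boxes: 'w' = flux change, '.' between slots."""
    boxes = []
    for i in range(0, len(code), slots_per_box):
        slots = ["w" if bit == "1" else " " for bit in code[i:i + slots_per_box]]
        boxes.append(".".join(slots))
    return "|" + "|".join(boxes) + "|"

def spacing_in_boxes(code: str, slots_per_box: int) -> list[Fraction]:
    """Distance between successive wiggles, measured in whole boxes."""
    pos = [i for i, bit in enumerate(code) if bit == "1"]
    return [Fraction(b - a, slots_per_box) for a, b in zip(pos, pos[1:])]

mfm = "0101001001000101"        # 11001011 in MFM, clock wiggles written as 1
rll = "000010001001001001"      # first 18 code bits of the 2,7 example

print(draw(mfm, 2))   # | .w| .w| . |w. | .w| . | .w| .w|
print(draw(rll, 3))   # | . . | .w. | . .w| . .w| . .w| . .w|
print(min(spacing_in_boxes(mfm, 2)), min(spacing_in_boxes(rll, 3)))   # 1 1
```

Both pictures keep the wiggles at least one whole box apart; the RLL one
just has three possible landing spots per box instead of two.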

Why, then, might some drives have trouble with 2,7 RLL encoding?  Well,
compare an MFM picture with a 2,7 RLL picture:

     MFM: |  . w|  . w|  .  |w .  |  . w|  .  |  . w|
     RLL: | . . | .w. | . .w| . .w| . .w| . . |w. . |

In MFM we only have to be able to tell whether a wiggle (flux change) is
to the left or the right of a whole box, but in RLL we have to tell
whether it is in the first, second, or last third of the box.  If the
wiggles are wigglier than usual, or if the controller gets sloppy, one of
the wiggles might wiggle in the middle instead of on the right.  When that
happens, you get a read error.

In other words, Pete is exactly right.  What a 2,7 RLL encoding demands is
not that the wiggles (flux changes) be put closer together, but rather
that their position within each wiggle-space be pinpointed more
accurately.  If you try a drive not rated for this, it might not work:
Some drives just have loose wiggles.

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:    chris@mimsy.umd.edu
Path:      uunet!mimsy!chris

------------------------------

************************
End of Info-IBMPC Digest
-------