poggio@apple.com (Andy Poggio) (03/01/90)
CD Summary Part 2

CD Data Hierarchy

Storing data on a CD may be thought of as occurring through a data encoding hierarchy, with each level built upon the previous one. At the lowest level, data is physically stored as pits on the disc. It is actually encoded by several low-level mechanisms to provide high storage density and reliable data recovery. At the next level, it is organized into tracks, which may be digital audio or CD-ROM. The High Sierra specification then defines a file system built on CD-ROM tracks. Finally, applications like HyperCard specify a content format for files.

The Physical Medium

The Compact Disc itself is a thin plastic disc 12 cm in diameter. Information is encoded in a plastic-encased spiral track located near the top (label side) of the disc. The spiral track is read optically by a noncontact head, which scans approximately radially as the disc spins just above it. The spiral is scanned at a constant linear velocity, which keeps the data rate constant. This requires the disc to rotate progressively more slowly as the spiral is scanned from its beginning near the center of the disc to its end near the circumference.

The spiral track contains shallow depressions, called pits, in a reflective layer. Binary information is encoded by the lengths of these pits and of the areas between them, called land. During reading, a low-power laser beam from the optical head is focused on the reflective layer containing the spiral and is reflected back into the head. Because of the optical characteristics of the plastic disc and the wavelength of light used, the amount of reflected light depends on whether the beam is on land or on a pit. A photodetector in the optical head converts this modulated, reflected light into a radio frequency (RF) raw data signal.

Low-level Data Encoding

To ensure accurate recovery, the disc data must be encoded to optimize the analog-to-digital conversion that the RF signal must undergo.
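The effect of constant linear velocity on spindle speed can be sketched numerically. The figures below are assumptions not given in the text: a scanning velocity of 1.3 m/s (the CD standard allows roughly 1.2 to 1.4 m/s), a recorded area running from about 25 mm to about 58 mm radius, and a 1.6 µm track pitch for estimating how long the spiral is. The helper names are hypothetical.

```python
import math

# Assumed figures, not taken from the text above.
SCAN_VELOCITY_M_S = 1.3   # linear velocity of the track under the head
INNER_RADIUS_M = 0.025    # start of the spiral, near the center
OUTER_RADIUS_M = 0.058    # end of the spiral, near the circumference
TRACK_PITCH_M = 1.6e-6    # radial spacing between adjacent turns of the spiral


def rpm_at_radius(radius_m, velocity_m_s=SCAN_VELOCITY_M_S):
    """Spindle speed that keeps the track passing the head at velocity_m_s."""
    circumference_m = 2 * math.pi * radius_m
    return velocity_m_s / circumference_m * 60


def spiral_length_m(r_inner=INNER_RADIUS_M, r_outer=OUTER_RADIUS_M,
                    pitch=TRACK_PITCH_M):
    """Approximate spiral length: recorded annulus area divided by track pitch."""
    return math.pi * (r_outer ** 2 - r_inner ** 2) / pitch


inner_rpm = rpm_at_radius(INNER_RADIUS_M)
outer_rpm = rpm_at_radius(OUTER_RADIUS_M)
length_m = spiral_length_m()
print(f"{inner_rpm:.0f} RPM near the center, {outer_rpm:.0f} RPM near the edge")
print(f"spiral length about {length_m / 1000:.1f} km ({length_m / 1609.344:.1f} miles)")
```

Under these assumptions the speed falls from roughly 500 RPM to a little over 200 RPM, and the spiral comes out at roughly 5.4 km; the exact numbers shift with the assumed velocity and radii.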
Goals of the low-level data encoding include:

1. High information density. This requires an encoding that makes the best possible use of the high, but limited, resolution of the laser beam and read head optics.
2. Minimum intersymbol interference. This requires making the minimum run length, i.e. the minimum number of consecutive zero bits or one bits, as large as possible.
3. Self-clocking. To avoid a separate timing track, the data should be encoded so that the clock signal can be regenerated from the data signal. This requires limiting the maximum run length so that data transitions occur often enough to regenerate the clock.
4. Low digital sum value (the number of one bits minus the number of zero bits). This minimizes the low-frequency and DC content of the data signal, which permits optimal servo system operation.

A straightforward encoding would simply be to record zero bits as land and one bits as pits. However, this does not meet goal (1) as well as the scheme actually used, which encodes a one bit as a transition from pit to land or from land to pit, and a zero bit as continuing pit or continuing land.

Even with this scheme, arbitrary binary data cannot be recorded directly while meeting goals (2) to (4). For example, the integer 0 expressed as thirty-two zero bits would have too long a run length to satisfy goal (3). To accommodate these goals, each eight-bit byte of actual data is encoded as fourteen bits of channel data. There are many more combinations of fourteen bits (16,384) than there are of eight bits (256), so 256 fourteen-bit combinations that meet the goals are chosen to represent the 256 byte values. This encoding is referred to as Eight-to-Fourteen Modulation (EFM).

Concatenating one fourteen-bit channel word directly with the next could again violate these goals, for instance by placing two one bits too close together across the boundary. To avoid this, three merging bits are inserted between consecutive fourteen-bit channel words.
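To make the EFM constraints concrete, here is a minimal Python sketch. It assumes the run-length limits of the CD standard itself (at least 2 and at most 10 zero bits between consecutive one bits), which the text above does not state. The helper names `runs_ok`, `nrzi`, `dsv` and `choose_merging_bits`, and the convention that level 1 means pit, are hypothetical. The sketch counts the fourteen-bit patterns that satisfy the limits, maps channel bits to pit/land levels (a one bit toggles the level), and picks merging bits greedily by digital sum value. That greedy choice is a simplified illustration, not the exact procedure a real encoder uses.

```python
from itertools import product

# Assumption: run-length limits from the CD standard, not from the text.
# Between two one bits there must be at least D_MIN and at most K_MAX zeros.
D_MIN, K_MAX = 2, 10
WORD_BITS = 14


def runs_ok(bits):
    """Check the zero-run limits for a sequence of channel bits.

    Gaps between one bits must be D_MIN..K_MAX long; the runs at either end
    only need to stay within K_MAX, since a neighbouring word extends them.
    """
    ones = [i for i, b in enumerate(bits) if b]
    if not ones:
        return len(bits) <= K_MAX
    if ones[0] > K_MAX or len(bits) - 1 - ones[-1] > K_MAX:
        return False
    return all(D_MIN <= b - a - 1 <= K_MAX for a, b in zip(ones, ones[1:]))


# Every 14-bit pattern that satisfies the limits on its own.
VALID_WORDS = [w for w in product((0, 1), repeat=WORD_BITS) if runs_ok(w)]


def nrzi(bits, level=0):
    """Map channel bits to surface levels: each one bit toggles the level.

    Hypothetical convention for this sketch: 1 = pit, 0 = land.
    """
    levels = []
    for b in bits:
        level ^= b
        levels.append(level)
    return levels


def dsv(levels):
    """Digital sum value of a run of levels: +1 per land bit, -1 per pit bit."""
    return sum(-1 if lv else 1 for lv in levels)


MERGE_CANDIDATES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]


def choose_merging_bits(prev_word, next_word, level=0, running_dsv=0):
    """Hypothetical helper: pick merging bits between two EFM words.

    `level` is the surface level at the end of prev_word and `running_dsv`
    the DSV accumulated so far. Candidates that break the run-length limits
    across the junction are skipped; among the rest, the one leaving the
    smallest |DSV| after next_word wins. Returns None if none fits.
    """
    prev_word, next_word = tuple(prev_word), tuple(next_word)
    best = None
    for m in MERGE_CANDIDATES:
        if not runs_ok(prev_word + m + next_word):
            continue
        score = abs(running_dsv + dsv(nrzi(m + next_word, level)))
        if best is None or score < best[0]:
            best = (score, m)
    return None if best is None else best[1]


print(len(VALID_WORDS))  # 267
```

Running it reports 267 qualifying patterns out of 16,384, comfortably more than the 256 needed. Because the picker only looks at one junction at a time and this set is unrestricted, it can return None for some word pairs, such as two words whose facing ends each carry ten zeros.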
These merging bits carry no information but are chosen to limit run length, keep the DC content of the data signal low, and so on. Thus, an eight-bit byte of actual data is encoded into a total of seventeen channel bits: fourteen EFM bits and three merging bits.

To achieve a reliable self-clocking system, periodic synchronization is necessary, so the data is broken up into individual frames, each beginning with a synchronization pattern. Each frame also contains twenty-four data bytes, eight error correction bytes, a control and display byte (carrying the subcoding channels), and merging bits separating them all. Each frame is arranged as follows:

  Field                       Channel bits
  Sync pattern                24 + 3         =  27
  Control and display byte    14 + 3         =  17
  Data bytes                  12 * (14 + 3)  = 204
  Error correction bytes       4 * (14 + 3)  =  68
  Data bytes                  12 * (14 + 3)  = 204
  Error correction bytes       4 * (14 + 3)  =  68
  TOTAL                                        588

Thus, 192 actual data bits (24 bytes) are encoded as 588 channel bits.

Editorial: A CD physically has a single spiral track about 3 miles long. CDs spin at about 500 RPM when reading near the center, down to about 250 RPM when reading near the circumference.

Disc with a 'c' or disk with a 'k'? A usage has emerged for these terms: disk is used for erasable disks (e.g. magnetic disks), while disc is used for read-only media (e.g. CD-ROM discs). One would presumably call a frisbee a disc.

--andy
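The frame arithmetic in the table can be checked with a few lines of Python. The per-field bit counts come straight from the table; the frames-per-second figure is an added assumption based on CD audio's 44.1 kHz, two-channel, 16-bit sampling, which this post does not mention.

```python
# Channel-bit budget of one CD frame, following the frame table.
EFM_BITS, MERGE_BITS = 14, 3
SYMBOL_BITS = EFM_BITS + MERGE_BITS  # one byte costs 17 channel bits

FRAME_LAYOUT = [
    ("sync pattern", 24 + MERGE_BITS),
    ("control and display byte", 1 * SYMBOL_BITS),
    ("data bytes", 12 * SYMBOL_BITS),
    ("error correction bytes", 4 * SYMBOL_BITS),
    ("data bytes", 12 * SYMBOL_BITS),
    ("error correction bytes", 4 * SYMBOL_BITS),
]

frame_bits = sum(bits for _, bits in FRAME_LAYOUT)
data_bits = 24 * 8                  # 24 data bytes per frame
overhead = frame_bits / data_bits   # channel bits spent per data bit

# Assumption (not from the text): CD audio fills the 24 data bytes of each
# frame with 44,100 stereo samples per second at 2 bytes per sample.
audio_bytes_per_s = 44_100 * 2 * 2
frames_per_s = audio_bytes_per_s // 24
channel_bit_rate = frames_per_s * frame_bits

print(frame_bits, data_bits, overhead)  # 588 192 3.0625
print(frames_per_s, channel_bit_rate)   # 7350 4321800
```

Under that audio-rate assumption the disc must deliver about 4.32 million channel bits per second, and every user data bit costs a little over three channel bits.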