poynton%vector@Sun.COM (Charles Poynton) (11/06/88)
This note was inspired by speculation in rec.video about video resolution and memory requirements. Briefly, a pixel/sample is a TVL, there are 480 picture lines per frame, and video is not sampled in RGB components.

Charles Poynton
No quote or disclaimer necessary.
poynton@sun.com
(415)336-7846

-----

SCOPE

This is a tutorial which describes how television resolution is measured, how television signals are represented digitally, and how much memory is needed for a television frame in various digital representations. This discussion is limited to 525-line television, but the concepts apply to other CRT display systems. The S-VHS interface is explained briefly.

RASTER AND THEORETICAL RESOLUTION

There are 525 total scan lines per frame in North American television, and 29.97 frames are transmitted per second. The fact that the total number of raster lines is odd means that a vertical field retrace occurs once every 262-and-one-half lines; it is this relationship which causes 2:1 interlace of the scan lines in alternate fields. Of the 525 total raster lines, 480 contain picture information. The remainder comprise vertical scanning overhead.

Television system engineers measure vertical resolution in units of "cycles per picture height" (C/PH), where a cycle comprises a white element and a black element. C/PH is entirely comparable to the unit which is used to describe film resolution: line pairs per millimetre (often contracted to "lines/mm").

The maximum theoretical vertical resolution contained in the 480 picture scan lines of television is 240 C/PH, corresponding to Nyquist's principle that at least two samples [in this case, scan lines] are required to convey each cycle.

ACTUAL RESOLUTION

But just because you've got the samples doesn't mean that the full theoretical resolution is being conveyed.
In the early days of television a typical picture tube could resolve at best about two thirds (the "Kell" factor) of the maximum theoretical vertical resolution, or about 160 C/PH. This does not indicate that fewer lines are transmitted; rather, in such a reduced-resolution system, the signal content of each scan line is not completely independent, but is to some extent related to the content of adjacent lines.

Also, not all of this theoretical resolution is necessarily delivered to the face of the CRT: a transmitted or recorded signal may contain a pattern of 160 cycles vertically, but a particular picture tube (CRT) which has poor focus or poor convergence may blend these variations into invisibility, to result in an actual vertical resolution less than 160 C/PH.

The aspect ratio of a 525-line television picture is 4:3, so equal vertical and horizontal resolution are obtained (assuming a Kell factor of 2/3) at a horizontal resolution of 160 times 4/3, or 213 C/PW. Multiply this by 1.2 to accommodate horizontal scanning overhead to get 256, the minimum number of cycles which must be conveyed per total line time to obtain equal vertical and horizontal resolution. Multiply this by the horizontal (line) scanning rate of 15.734 kHz to get a bandwidth for video of about 4 MHz.

This reasoning, combined with the monochrome television channel spacing of 6 MHz, led the NTSC to choose a bandwidth of 4.2 MHz. This will remain forever the limit for any over-the-airwaves NTSC signal. Consumer equipment which exceeds this bandwidth is feasible but not yet available. Remember that this calculation assumes a Kell factor of 2/3; this may no longer be an appropriate assumption.
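The chain of multiplications above is easy to replay. The following is only a sketch in Python; the constant and function names are mine, chosen for illustration, and the Kell factor is left as a parameter because, as noted, 2/3 may no longer be the right assumption.

```python
# Sketch of the bandwidth estimate described above.
# All names here are illustrative, not standard terminology.

PICTURE_LINES = 480        # scan lines carrying picture information
ASPECT_RATIO = 4 / 3       # picture width divided by picture height
H_OVERHEAD = 1.2           # allowance for horizontal scanning overhead
LINE_RATE_HZ = 15_734      # horizontal (line) scanning rate


def required_bandwidth_hz(kell_factor=2 / 3):
    """Bandwidth giving equal horizontal and vertical resolution."""
    nyquist_cph = PICTURE_LINES / 2               # 240 C/PH theoretical limit
    vertical_cph = nyquist_cph * kell_factor      # about 160 C/PH for Kell = 2/3
    horizontal_cpw = vertical_cph * ASPECT_RATIO  # about 213 C/PW
    cycles_per_line = horizontal_cpw * H_OVERHEAD # about 256 per total line time
    return cycles_per_line * LINE_RATE_HZ


if __name__ == "__main__":
    print(f"Kell 2/3: {required_bandwidth_hz() / 1e6:.2f} MHz")
    print(f"Kell 1.0: {required_bandwidth_hz(1.0) / 1e6:.2f} MHz")
```

With a Kell factor of unity the same arithmetic asks for roughly 6 MHz, which is one way to see why the assumption matters.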
"TELEVISION LINES" Just like television markete(e)rs decided early on to exaggerate picture size by stating the diagonal dimension of the screen rather than its width or height, they state "resolution" in terms of equivalent television scan lines, denoted by the abbreviation TVL ("television lines") rather than C/PH. There are two TVL per cycle: think of a cycle as a white element and a black element. If a signal is sampled and represented digitally, then each "TVL" is equivalent to one sample, so television system engineers sometimes use the terms "samples per picture height" or "samples per picture width". Actual resolution is measured optically by a calibrated wedge pattern of black and white lines. It is desirable that the same patttern, and the same resolution number, apply to both the vertical and horizontal directions. Therefore, the TVL unit is used to measure horizontal resolution as well. Since the picture aspect ratio is 4:3, the theoretical maximum 480 TVL of vertical resolution would be matched horizontally by 640 samples. One cycle per picture width consumes a time which is one total line time [572/9 us], minus the FCC minimum blanking time [10.9 us]. This is the duration in us corresponding to the picture width, and this is equivalent to the number of cycles per picture width in the first 1 MHz of video bandwidth. Double this to get samples per picture width, and divide by the picture aspect ratio to express this in units of [vertical] TVL. Hence: ((572/9)-10.9)*2*3/4 or about 79, is the number of TVL per MHz of bandwidth. LIMITING RESOLUTION The amplitude response of any electronic system generally falls off gradually as a function of frequency. The term "bandwidth" refers to the frequency at which the signal amplitude has fallen to 50% ("-3 dB") of its reference amplitude. "Limiting resolution" in television is defined as 10% of the reference amplitude. Limiting resolution is typically reached at perhaps 1.2 times the 3 dB bandwidth. 
That factor of 1.2 is only typical: your factor may vary.

NTSC has a 3 dB bandwidth of 4.2 MHz, for a resolution (at the 3 dB point) of 332 TVL. It could be argued that the 10% limiting resolution could be a little higher than this, but in NTSC the sound subcarrier is at 4.5 MHz, so it is absolutely guaranteed that no resolution above 355 TVL is possible over-the-airwaves.

"Advanced" or "improved" television technology, in particular frame rate doubling (de-interlacing) at the display, can achieve very close to the theoretical 480 TVL of vertical resolution (i.e. a Kell factor of unity), and would benefit from horizontal resolution up to perhaps 700 TVL for non-broadcast signals.

Broadcast studio equipment typically samples at 13.5 MHz, with 720 samples per picture width. Baseband analog signals in the studio typically have a bandwidth of 5.5 MHz, and the best 525-line studio monitors are quoted as having 900 TVL of resolution at the centre of the tube.

YUV REPRESENTATION (3 wires)

Studio equipment typically maintains colour signals in three components YUV, which are easily derived from RGB. The Y channel contains the luminance (black-and-white) content of the image, and is computed as:

    Y = 0.299 R + 0.587 G + 0.114 B

"Colour difference" signals U and V are scaled versions of B-Y and R-Y respectively; these vanish for monochrome (grey) signals. The human visual system has much less acuity for spatial variation of colour than for luminance, and the advantage of U and V components is that each can be conveyed with substantially less bandwidth than luminance, R or G or B. In analog YUV studio systems, U and V each have a bandwidth of 1.5 MHz. In digital systems, U and V are each horizontally subsampled by a factor of two (i.e. conveyed at half the rate of Y).

Y/C REPRESENTATION (2 wires)

U and V can be combined easily into a "chroma" signal which is conveyed as modulation of a continuous 3.58 MHz sine-wave subcarrier. [This frequency is exactly 455/2 times the line rate of 9/.572 kHz.]
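Two of the relationships above can be checked numerically: the luminance equation with its colour differences, and the exact subcarrier frequency. What follows is a Python sketch with names of my own invention. The colour differences are left unscaled (plain B-Y and R-Y), since the scale factors for U and V are not given here.

```python
# Sketch only: function and constant names are illustrative.
from fractions import Fraction


def luma(r, g, b):
    """Luminance Y from R, G, B, using the weights quoted above."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def colour_differences(r, g, b):
    """Unscaled colour differences (B-Y, R-Y); U and V are scaled versions."""
    y = luma(r, g, b)
    return b - y, r - y


# Line rate of 9/.572 kHz, expressed exactly in Hz.
LINE_RATE_HZ = Fraction(9_000_000, 572)

# Colour subcarrier: exactly 455/2 times the line rate.
SUBCARRIER_HZ = Fraction(455, 2) * LINE_RATE_HZ


if __name__ == "__main__":
    print("grey (0.5, 0.5, 0.5):", colour_differences(0.5, 0.5, 0.5))
    print("pure red (1, 0, 0):  ", colour_differences(1.0, 0.0, 0.0))
    print(f"line rate:  {float(LINE_RATE_HZ):.2f} Hz")
    print(f"subcarrier: {float(SUBCARRIER_HZ):.2f} Hz")
```

Exact fractions are used for the frequencies so that the 455/2 relationship holds without rounding; the grey input shows both colour differences coming out at (essentially) zero.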
The phase of the chroma signal conveys a quantity related to hue, and its amplitude conveys a quantity related to colour saturation (purity). [Phase is decoded with reference to a "burst" of the 3.58 MHz continuous-wave subcarrier which is transmitted during the horizontal blanking interval.]

The "S" connector simply carries Y and C on separate wires. This coding is easily decoded without artifacts. Current S-VHS equipment conveys chroma with severely limited bandwidth, about 300 kHz (which is just 16 cycles of U or V per picture width). Consumer VCR equipment has always recorded the luminance and chroma components separately on tape, but only with the introduction of the S-connector in S-VHS and ED-Beta equipment was the consumer able to take advantage of this capability.

NTSC REPRESENTATION (1 wire)

The NTSC system mixes Y and C together and conveys the result on one piece of wire. The result of this addition operation is not theoretically reversible: the process of separating luminance and colour often confuses one for the other. Cross-colour artifacts result from luminance patterns which happen to generate signals near the 3.58 MHz colour subcarrier. Such information may be decoded as swirling colour rainbows. Cross-luminance artifacts result if modulated colour information is incorrectly decoded as crawling or hanging luminance dots. It is these artifacts which can be avoided by using the S-connector interface.

In general, once the NTSC footprint is impressed on a signal, it persists even if subsequent processing is performed in RGB or YUV components.

Encoded NTSC signals can be sampled into a stream of 8-bit bytes. Such "composite digital" systems have the advantage of using slightly less memory than component systems, at the expense of the dreaded NTSC artifacts.
Manipulation of such composite signals to perform operations such as shrinking the picture is difficult or impossible, because if the colour subcarrier frequency is altered the colour information in the signal is destroyed. Therefore, these operations are performed in the component domain.

MEMORY REQUIREMENTS

[Nomenclature: k=kilo=1000, K=2^10=1024, b=bit, B=Byte.]

About 210 KB (480-by-430), or 1.6 Mb, is sufficient to store composite NTSC at a horizontal resolution of 320 TVL. Y/C components can be stored at S-VHS colour resolution in 256 KB (2 Mb). Consumer equipment uses as few as six bits for Y, U, or V.

Composite NTSC digital studio equipment typically stores a frame as 768-by-480 samples of 8 bits each, for about 384 KB (3 Mb) per frame.

Component digital equipment stores YUV components at 720-by-480 samples of 16 bits each for about 675 KB (5.4 Mb) per frame: 8-bit U and V colour components are horizontally subsampled by a factor of two with respect to luminance.

Charles Poynton
poynton@sun.com
(415)336-7846
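The frame sizes under MEMORY REQUIREMENTS are just width times height times bits per sample. Here is a Python sketch for replaying them; the helper name and the layout labels are mine. Note that the raw products do not match every quoted figure exactly: 768-by-480 bytes comes to 360 KB, whereas 384 KB is what a 768-by-512 allocation would occupy. That may be what was intended, but that is only a guess.

```python
# Sketch of the frame-memory arithmetic; names are illustrative only.

K = 1024  # K = 2^10, as in the nomenclature above


def frame_bytes(samples_per_line, lines, bits_per_sample):
    """Bytes needed for one frame of packed samples."""
    return samples_per_line * lines * bits_per_sample // 8


# (description, samples per line, lines, bits per sample)
LAYOUTS = [
    ("composite NTSC, 320 TVL", 430, 480, 8),
    ("composite NTSC, studio", 768, 480, 8),
    ("component YUV, studio", 720, 480, 16),  # 8 bits Y + 8 bits of U or V
]

if __name__ == "__main__":
    for name, width, height, bits in LAYOUTS:
        size = frame_bytes(width, height, bits)
        print(f"{name}: {size} bytes = {size / K:.1f} KB")
```

The 16 bits per sample in the component case reflects the two-to-one subsampling: each luminance sample is accompanied by either a U or a V sample, not both.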