[comp.graphics] Video Resolution

poynton%vector@Sun.COM (Charles Poynton) (11/06/88)
This note was inspired by speculation in rec.video about video resolution
and memory requirements.  Briefly, a pixel/sample is a TVL, there are 480
picture lines per frame, and video is not sampled in RGB components.

Charles Poynton				No quote or disclaimer necessary.
poynton@sun.com				(415)336-7846

-----

SCOPE

This is a tutorial which describes how television resolution is measured,
how television signals are represented digitally, and how much memory is
needed for a television frame in various digital representations.

This discussion is limited to 525-line television, but the concepts apply
to other CRT display systems.  The S-VHS interface is explained briefly.  

RASTER AND THEORETICAL RESOLUTION

There are 525 total scan lines per frame in North American television.
29.97 frames are transmitted per second.  The fact that the total number
of raster lines is odd means that a vertical field retrace occurs once
every 262-and-one-half lines; it is this relationship which causes 2:1
interlace of the scan lines in alternate fields.  Of the 525 total raster
lines, 480 contain picture information.  The remainder comprise vertical
scanning overhead.

Television system engineers measure vertical resolution in units of
"cycles per picture height" (C/PH), where a cycle comprises a white
element and a black element.  C/PH is entirely comparable to the unit
which is used to describe film resolution: line pairs per millimetre
(often contracted to "lines/mm").  The maximum theoretical vertical
resolution contained in the 480 picture scan lines of television is 240
C/PH, corresponding to Nyquist's principle that at least two samples [in
this case, scan lines] are required to convey each cycle.

ACTUAL RESOLUTION

But just because you've got the samples doesn't mean that the full
theoretical resolution is being conveyed.  In the early days of television
a typical picture tube could resolve at best about two thirds (the "Kell"
factor) of the maximum theoretical vertical resolution, or about 160
C/PH.  This does not indicate that fewer lines are transmitted; rather, in
such a reduced-resolution system, the signal content of each scan line is
not completely independent, but is to some extent related to the content
of adjacent lines.  Also, not all of this theoretical resolution is
necessarily delivered to the face of the CRT:  a transmitted or recorded
signal may contain a pattern of 160 cycles vertically, but a particular
picture tube (CRT) which has poor focus or poor convergence may blend
these variations into invisibility, to result in an actual vertical
resolution less than 160 C/PH.

The aspect ratio of a 525-line television picture is 4:3, so equal
vertical and horizontal resolution are obtained (assuming a Kell factor of
2/3) at a horizontal resolution of 160 times 4/3, or 213 C/PW.  Multiply
this by 1.2 to accommodate horizontal scanning overhead to get 256, the
minimum number of cycles which must be conveyed per total line time to
obtain equal vertical and horizontal resolution.  Multiply this by the
horizontal (line) scanning rate of 15.734 kHz to get a bandwidth for video
of about 4 MHz.  This reasoning, combined with the monochrome television
channel spacing of 6 MHz, led the NTSC to choose a bandwidth of 4.2 MHz.
This will remain forever the limit for any over-the-airwaves NTSC signal.
Consumer equipment which exceeds this bandwidth is feasible but not yet
available.  Remember that this calculation assumes a Kell factor of 2/3;
this may no longer be an appropriate assumption.
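
The chain of arithmetic above can be checked with a short Python sketch.
The names are my own labels, not standard terms; the numbers are the ones
used in the text, and the Kell factor is left as a parameter since 2/3 is
only an assumption.

```python
# Sketch of the bandwidth estimate; names are illustrative, values from the text.
PICTURE_LINES = 480      # scan lines that carry picture information
ASPECT_RATIO = 4 / 3     # picture width : picture height
H_OVERHEAD = 1.2         # factor for horizontal scanning overhead
LINE_RATE_HZ = 15734     # horizontal (line) scanning rate, 15.734 kHz


def video_bandwidth_hz(kell=2 / 3):
    """Bandwidth giving equal vertical and horizontal resolution."""
    vertical_c_ph = PICTURE_LINES / 2 * kell        # 240 C/PH times Kell factor
    horizontal_c_pw = vertical_c_ph * ASPECT_RATIO  # cycles per picture width
    cycles_per_line = horizontal_c_pw * H_OVERHEAD  # cycles per total line time
    return cycles_per_line * LINE_RATE_HZ


print(f"{video_bandwidth_hz() / 1e6:.2f} MHz")      # Kell factor of 2/3
print(f"{video_bandwidth_hz(1.0) / 1e6:.2f} MHz")   # Kell factor of unity
```

Calling the function with a Kell factor of unity shows how quickly the
bandwidth requirement grows if that assumption is dropped.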

"TELEVISION LINES"

Just as television markete(e)rs decided early on to exaggerate picture
size by stating the diagonal dimension of the screen rather than its width
or height, they state "resolution" in terms of equivalent television scan
lines, denoted by the abbreviation TVL ("television lines") rather than
C/PH.  There are two TVL per cycle:  think of a cycle as a white element
and a black element.  If a signal is sampled and represented digitally,
then each "TVL" is equivalent to one sample, so television system
engineers sometimes use the terms "samples per picture height"  or
"samples per picture width".

Actual resolution is measured optically by a calibrated wedge pattern of
black and white lines.  It is desirable that the same pattern, and the
same resolution number, apply to both the vertical and horizontal
directions.  Therefore, the TVL unit is used to measure horizontal
resolution as well.  Since the picture aspect ratio is 4:3, the
theoretical maximum 480 TVL of vertical resolution would be matched
horizontally by 640 samples.

The picture width occupies one total line time [572/9 us] minus the FCC
minimum blanking time [10.9 us].  A 1 MHz signal completes one cycle per
microsecond, so this duration in us is also the number of cycles per
picture width conveyed by each 1 MHz of video bandwidth.  Double this to
get samples per picture width, and divide by the picture aspect ratio to
express this in units of [vertical] TVL.  Hence:

	((572/9)-10.9)*2*3/4

or about 79, is the number of TVL per MHz of bandwidth.
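
For anyone who wants to check the figure, here is the same expression as
a small Python sketch using exact fractions.  The constant and function
names are mine; the values are those quoted above.

```python
from fractions import Fraction

LINE_TIME_US = Fraction(572, 9)    # total line time, about 63.556 us
BLANKING_US = Fraction(109, 10)    # FCC minimum blanking time, 10.9 us
ASPECT_RATIO = Fraction(4, 3)      # picture width : picture height


def tvl_per_mhz():
    """TVL of horizontal resolution per MHz of video bandwidth."""
    picture_width_us = LINE_TIME_US - BLANKING_US  # also cycles/PW per MHz
    samples_per_pw = 2 * picture_width_us          # two samples per cycle
    return samples_per_pw / ASPECT_RATIO           # express as [vertical] TVL


print(float(tvl_per_mhz()))
```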

LIMITING RESOLUTION

The amplitude response of any electronic system generally falls off
gradually as a function of frequency.  The term "bandwidth" refers to the
frequency at which the signal amplitude has fallen to 50% ("-3 dB") of its
reference amplitude.  "Limiting resolution" in television is defined as
the frequency at which the amplitude has fallen to 10% of the reference
amplitude.  Limiting resolution is typically reached at perhaps 1.2 times
the 3 dB bandwidth.  Your factor may vary.

NTSC has a 3 dB bandwidth of 4.2 MHz, for a resolution (at 50%) of 332
TVL.  It could be argued that 10% limiting resolution could be a little
higher than this, but in NTSC the sound subcarrier is at 4.5 MHz so it
is absolutely guaranteed that no resolution above 355 TVL is possible
over-the-airwaves.
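
The two figures above follow directly from the TVL-per-MHz factor.  A
minimal Python sketch (the function name is my own) that converts a
bandwidth in MHz to TVL:

```python
# TVL per MHz, using the line time and blanking time given earlier.
TVL_PER_MHZ = ((572 / 9) - 10.9) * 2 * 3 / 4   # about 79


def tvl_at(bandwidth_mhz):
    """Horizontal resolution in TVL for a given video bandwidth in MHz."""
    return TVL_PER_MHZ * bandwidth_mhz


for label, mhz in [("NTSC 3 dB bandwidth", 4.2), ("sound subcarrier", 4.5)]:
    print(f"{label}: {mhz} MHz -> {tvl_at(mhz):.0f} TVL")
```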

"Advanced" or "improved" television technology, in particular frame rate
doubling (de-interlacing) at the display, can achieve very close to the
theoretical 480 TVL of vertical resolution (i.e. a Kell factor of unity),
and would benefit from horizontal resolution up to perhaps 700 TVL for
non-broadcast signals.  Broadcast studio equipment typically samples at
13.5 MHz, with 720 samples per picture width.  Baseband analog signals in
the studio typically have a bandwidth of 5.5 MHz, and the best 525-line
studio monitors are quoted as having 900 TVL of resolution at the centre
of the tube.
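
As a rough check on the studio sampling numbers, the sketch below works
out how many 13.5 MHz samples fit in one total line time, how long 720
samples last, and what 720 samples per picture width amounts to when
divided by the aspect ratio.  The variable names are mine, and the last
figure is only a sample count expressed in TVL units, not a claim about
delivered resolution.

```python
from fractions import Fraction

SAMPLE_RATE_MHZ = Fraction(27, 2)   # 13.5 MHz studio sampling rate
LINE_TIME_US = Fraction(572, 9)     # total line time in us
SAMPLES_PER_PICTURE_WIDTH = 720

# Samples in one total line time (picture plus blanking).
samples_per_total_line = SAMPLE_RATE_MHZ * LINE_TIME_US

# Duration of the 720 picture samples, in us.
picture_width_us = SAMPLES_PER_PICTURE_WIDTH / SAMPLE_RATE_MHZ

# Samples per picture width divided by the 4:3 aspect ratio.
equivalent_tvl = SAMPLES_PER_PICTURE_WIDTH * Fraction(3, 4)

print(samples_per_total_line, float(picture_width_us), equivalent_tvl)
```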

YUV REPRESENTATION (3 wires)

Studio equipment typically maintains colour signals in three components
YUV, which are easily derived from RGB.  The Y channel contains the
luminance (black-and-white) content of the image, and is computed as:   

	Y = 0.299 R + 0.587 G + 0.114 B  

"Colour difference" signals U and V are scaled versions of B-Y and R-Y
respectively; these vanish for monochrome (grey) signals.   The human
visual system has much less acuity for spatial variation of colour than
for luminance, and the advantage of U and V components is that each can be
conveyed with substantially less bandwidth than luminance, R or G or B.
In analog YUV studio systems, U and V each have a bandwidth of 1.5 MHz.
In digital systems, U and V are each horizontally subsampled by a factor
of two (i.e. conveyed at half the rate of Y).
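
Here is a minimal Python sketch of the conversion and the horizontal
subsampling.  The luminance weights are the ones given above.  The text
only says U and V are "scaled versions" of B-Y and R-Y; the scale factors
0.492 and 0.877 used as defaults are the conventional ones and are my
assumption here.  The subsampling simply drops every second sample; a
real system would low-pass filter first.

```python
def rgb_to_yuv(r, g, b, u_scale=0.492, v_scale=0.877):
    """Convert one RGB triple to YUV.

    u_scale and v_scale are assumed conventional values, not from the text.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = u_scale * (b - y)                   # scaled B-Y
    v = v_scale * (r - y)                   # scaled R-Y
    return y, u, v


def subsample_by_two(samples):
    """Keep every second sample of a line (no prefilter in this sketch)."""
    return samples[::2]


# A grey pixel: U and V should vanish (to within rounding).
print(rgb_to_yuv(0.5, 0.5, 0.5))

# One line of four U samples is conveyed as two.
print(subsample_by_two([0.10, 0.12, 0.20, 0.22]))
```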

Y/C REPRESENTATION (2 wires)

U and V can be combined easily into a "chroma" signal which is conveyed as
modulation of a continuous 3.58 MHz sine-wave subcarrier.  [This frequency
is exactly 455/2 times the line rate of 9/.572 kHz.]  The phase of the
chroma signal conveys a quantity related to hue, and its amplitude conveys
a quantity related to colour saturation (purity).  [Phase is decoded with
reference to a "burst" of the 3.58 MHz continuous-wave subcarrier which is
transmitted during the horizontal blanking interval.]  The "S" connector
simply carries Y and C on separate wires.  This coding is easily decoded
without artifacts.  Current S-VHS equipment conveys chroma with severely
limited bandwidth, about 300 kHz (which is just 16 cycles of U or V
per picture width).   Consumer VCR equipment has always recorded the
luminance and chroma components separately on tape, but only with the
introduction of the S-connector in S-VHS and ED-Beta equipment was the
consumer able to take advantage of this capability.
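
The numbers in this section can be reproduced with a short Python sketch.
The subcarrier frequency and the cycles-per-picture-width figure use only
values from the text.  The amplitude and phase function assumes U and V
are modulated in quadrature onto the subcarrier, which the text does not
spell out, and measures phase from the U axis, which is an arbitrary
choice of mine.

```python
import math

LINE_RATE_HZ = 9 / 0.572 * 1e3                    # 9/.572 kHz
SUBCARRIER_HZ = 455 / 2 * LINE_RATE_HZ            # colour subcarrier
PICTURE_WIDTH_S = ((572 / 9) - 10.9) * 1e-6       # line time minus blanking


def chroma_amplitude_phase(u, v):
    """Amplitude and phase (degrees) of chroma, assuming quadrature U and V."""
    return math.hypot(u, v), math.degrees(math.atan2(v, u))


def cycles_per_picture_width(bandwidth_hz):
    """Cycles conveyed across the picture width at a given bandwidth."""
    return bandwidth_hz * PICTURE_WIDTH_S


print(f"subcarrier: {SUBCARRIER_HZ / 1e6:.6f} MHz")
print(f"300 kHz chroma: {cycles_per_picture_width(300e3):.1f} cycles/PW")
print(chroma_amplitude_phase(0.3, 0.4))
```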

NTSC REPRESENTATION (1 wire)

The NTSC system mixes Y and C together and conveys the result on one piece
of wire.  The result of this addition operation is not theoretically
reversible:  the process of separating luminance and colour often confuses
one for the other.  Cross-colour artifacts result from luminance patterns
which happen to generate signals near the 3.58 MHz colour subcarrier.
Such information may be decoded as swirling colour rainbows.
Cross-luminance artifacts result if modulated colour information is
incorrectly decoded as luminance, appearing as crawling or hanging dots.
It is these
artifacts which can be avoided by using the S-connector interface.  In
general, once the NTSC footprint is impressed on a signal, it persists
even if subsequent processing is performed in RGB or YUV components.

Encoded NTSC signals can be sampled into a stream of 8-bit bytes.  Such
"composite digital" systems have the advantage of using slightly less
memory than component systems, at the expense of the dreaded NTSC
artifacts.  Manipulation of such composite signals to perform operations
such as shrinking the picture is difficult or impossible, because if the
colour subcarrier frequency is altered the colour information in the
signal is destroyed.  Therefore, these operations are performed in the
component domain.

MEMORY REQUIREMENTS

[Nomenclature:  k=kilo=1000, K=2^10=1024, b=bit, B=Byte.]

About 210 KB (480-by-430), or 1.6 Mb, is sufficient to store composite
NTSC at a horizontal resolution of 320 TVL.  Y/C components can be stored
at S-VHS colour resolution in 256 KB (2 Mb).  Consumer equipment uses as
few as six bits for Y, U, or V.

Composite NTSC digital studio equipment typically stores a frame as
768-by-480 samples of 8 bits each, for about 360 KB (3 Mb) per frame.
Component digital equipment stores YUV components at 720-by-480 samples of
16 bits each for about 675 KB (5.4 Mb) per frame:  8-bit U and V colour
components are horizontally subsampled by a factor of two with respect to
luminance.
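
The frame sizes are simple products, sketched below in Python.  The
function names are my own.  The 16 bits per sample for component video is
8 bits of Y plus, on average, 8 bits of colour, since U and V alternate
at half the luminance rate.  The printed values are exact products, so
they may differ a little from the rounded figures quoted above.

```python
def frame_bytes(width, height, bits_per_sample):
    """Bytes needed to store one frame."""
    return width * height * bits_per_sample // 8


def describe(label, width, height, bits_per_sample):
    """Print a frame size in bytes and in KB (K = 1024)."""
    n = frame_bytes(width, height, bits_per_sample)
    print(f"{label}: {n} bytes = {n / 1024:.1f} KB")


describe("composite, 320 TVL", 430, 480, 8)
describe("composite studio", 768, 480, 8)
describe("component studio (YUV)", 720, 480, 16)
```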

