[comp.windows.x] Luminance from RGB

poynton%vector@Sun.COM (Charles Poynton) (11/06/88)

In Comp.windows.x article <8811011523.AA02242@LYRE.MIT.EDU>, Ralph R.
Swick <swick@ATHENA.MIT.EDU> comments:

> When converting RGB values to monochrome, the sample server(s) compute
> an intensity value as (.39R + .5G + .11B) ...

(.39R + .5G + .11B) is apparently incorrect.  This set could be a
typographical error (from .29R + .6G + .11B ?), a liberal approximation,
or perhaps an unusual phosphor set.  Could someone enlighten me on this?

In followup article <8811042303.AA21505@dawn.steinmetz.GE.COM>, Dick
St.Peters <stpeters@dawn.UUCP> makes the statement:

> I'd like to suggest that (.39R + .5G + .11B) is not a good choice for
> "intensity" in the realm of computer graphics.  ...
>
> A better choice in computer graphics is to equally weight the colors:
> ((R+G+B)/3.0).  Let white be white.

Equal weighting of the primaries is NOT the right thing to do, unless the
viewers of your images are members of some species that has uniform
response across the visible spectrum, unlike homo sapiens.

Humans see 1 watt of green light energy as being somewhat brighter than 1
watt of red, and very much brighter than 1 watt of blue.  The science of
colourimetry began to flourish in 1931, when the CIE standardized a
statistical entity called the "standard observer".  This includes a
standard spectral luminance response defined numerically as a function of
wavelength.  It is from this data that the factors which are used in
colour television are derived:  .587 for green, .299 for red, and 
.114 for blue.

The particular factors depend on the wavelengths or chromaticities
that you call red, green, and blue:  there is wide disparity in these
choices.  For computer graphics and television, the luminance factors
depend on the chromaticity coordinates of the phosphors of your CRT.
There are compromises in the choice of phosphor primaries, but it turns
out that the NTSC did a spectacularly good job of selecting primaries.
The luminance coefficients 0.299 for red, 0.114 for blue, and 0.587 for
green are unquestionably the best values to use, unless you know your
phosphors intimately.
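
In C the conversion is a one-liner; a minimal sketch (the [0..1]
scaling and the function name are illustrative only):

	/* Luminance from R, G, B, each component in [0.0 .. 1.0]. */
	double luminance(double r, double g, double b)
	{
	    return 0.299 * r + 0.587 * g + 0.114 * b;
	}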

The second article continues, 
> 						The formula is from
> the (1954) NTSC standard for compatible color TV, and it has built
> into it a lot of compromises to accommodate old technology and
> problems inherent in the analog transmission of composite color
> television.

Contrary to this assertion, the ONLY compromise in NTSC which impacts the
luminance equation is the choice of reference phosphor chromaticities, and
a choice of phosphors MUST be made for any system which transmits colour
in RGB.  Just because it's old (1954) doesn't mean we should throw it
away.

Aside from this, the discussion of television coding which follows is
substantially correct, except that modulation onto an RF carrier for
transmission involves no inherent compromises beyond those already made in
formation of baseband NTSC.  (Receivers frequently make their own
compromises, but these are not inherent.)

For those interested, I attach an alternate description of television
coding.

Charles Poynton						"No quote.
poynton@sun.com						 No disclaimer."
(415)336-7846

-----

GAMMA CORRECTION

A picture tube (CRT) produces a light output which is proportional to its
input voltage raised to approximately the 2.5-th power.  Rather than
requiring that compensating circuitry implementing the 2.5-th-root function
be built into every receiver, "gamma correction" is performed on the R, G,
and B primaries at the camera to form signals denoted R', G', and B'.
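
As a sketch of the two transfer functions (function names are
illustrative, not from any standard):

	#include <math.h>

	/* Camera-side gamma correction of one linear component in [0..1]. */
	double gamma_correct(double v)
	{
	    return pow(v, 1.0 / 2.5);     /* R' = R^(1/2.5), etc. */
	}

	/* CRT transfer function: light produced for a corrected input. */
	double crt_light(double vprime)
	{
	    return pow(vprime, 2.5);      /* undoes the camera correction */
	}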

YUV REPRESENTATION (3 wires)

Studio equipment typically processes colour signals in three components
YUV, which are easily derived from RGB.  The Y channel contains the
luminance (black-and-white) content of the image, and is computed as:   

	Y' = 0.299 R' + 0.587 G' + 0.114 B'  

"Colour difference" signals U and V are scaled versions of B'-Y' and R'-Y'
respectively; these vanish for monochrome (grey) signals.   The human
visual system has much less acuity for spatial variation of colour than
for luminance, and the advantage of U and V components is that each can be
conveyed with substantially less bandwidth than luminance, R or G or B.
In analog YUV studio systems, U and V each have a bandwidth of 1.5 MHz,
compared to between 4.2 MHz and 5.5 MHz for luminance.  In digital
systems, U and V are each horizontally subsampled by a factor of two (i.e.
conveyed at half the rate of the luminance signal).
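
A minimal sketch of the forward conversion in C (the 0.492 and 0.877
scale factors are the conventional ones, chosen to bound the composite
signal amplitude; treat them as assumptions here):

	/* Gamma-corrected r, g, b in [0..1] to Y'UV. */
	void rgb_to_yuv(double r, double g, double b,
	                double *y, double *u, double *v)
	{
	    *y = 0.299 * r + 0.587 * g + 0.114 * b;
	    *u = 0.492 * (b - *y);        /* scaled B' - Y' */
	    *v = 0.877 * (r - *y);        /* scaled R' - Y' */
	}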

Y/C REPRESENTATION (2 wires)

U and V can be combined easily into a "chroma" signal which is conveyed as
modulation of a continuous 3.58 MHz sine-wave subcarrier.  Subcarrier
phase is decoded with reference to a sample or "burst" of the 3.58 MHz
continuous-wave subcarrier which is transmitted during the horizontal
blanking interval.  The phase of the chroma signal conveys a quantity
related to hue, and its amplitude conveys a quantity related to colour
saturation (purity).  The "S" connectors of S-VHS and ED-Beta equipment
simply carry Y and C on separate wires.  This coding is easily decoded
without artifacts.  Current S-VHS equipment conveys chroma with severely
limited bandwidth, about 300 kHz (which is just 16 cycles of U or V per
picture width).   Consumer VCR equipment has always recorded the luminance
and chroma components separately on tape, but only since the introduction
of the S-connector in S-VHS and ED-Beta equipment has the consumer been
able to take advantage of this capability.
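
The combination of U and V into chroma is quadrature modulation: the two
signals ride on carriers 90 degrees apart.  A sketch (phase conventions
and burst handling omitted):

	#define PI  3.14159265358979
	#define FSC 3.579545e6            /* colour subcarrier, Hz */

	/* Chroma at time t (seconds). */
	double chroma(double u, double v, double t)
	{
	    return u * sin(2.0 * PI * FSC * t)
	         + v * cos(2.0 * PI * FSC * t);
	}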

NTSC REPRESENTATION (1 wire)

The NTSC system mixes Y and C together and conveys the result on one piece
of wire.  The result of this addition operation is not theoretically
reversible:  the process of separating luminance and colour often confuses
one for the other.  Cross-colour artifacts result from luminance patterns
which happen to generate signals near the 3.58 MHz colour subcarrier.
Such information may be decoded as swirling colour rainbows.
Cross-luminance artifacts result if modulated colour information is
incorrectly decoded as crawling or hanging luminance dots.  It is these
artifacts which can be avoided by using the S-connector interface.  In
general, once the NTSC footprint is impressed on a signal, it persists
even if subsequent processing is performed in RGB or YUV components.

Encoded NTSC signals can be sampled into a stream of 8-bit bytes.  Such
"composite digital" systems have the advantage of using slightly less
memory than component systems, at the expense of the dreaded NTSC
artifacts.  Manipulation of such composite signals to perform operations
such as shrinking the picture is difficult or impossible, because if the
colour subcarrier frequency is altered the colour information in the
signal is destroyed.  Therefore, these operations are performed in the
component domain.

FREQUENCY INTERLEAVING

The NTSC colour subcarrier frequency is chosen to be exactly 455/2 times
the line rate of 9/0.572 kHz (about 15.734 kHz), which places it at the
3.58 MHz mentioned above.  The fact that the subcarrier frequency is an
odd multiple of half the line rate causes colour information to be
interleaved with the luminance spectrum:  if a portion of a coloured
region has a positive-going modulated chroma component on one scan line,
then on the next line chroma will go negative.  This property allows the
use of a "comb filter" to separate luminance and chroma.  The signal is
delayed by one total line time, in order that two vertically adjacent
picture elements be available to the electronics at the same instant in
time.  Forming the sum of these two elements produces luminance, and
forming their difference produces the modulated chroma.  This feature
results in greatly improved luma/chroma separation compared to a 3.58 MHz
"trap" filter.  However, a comb filter assumes a fair degree of vertical
correlation in the picture, and this assumption does not hold for pictures
with great vertical detail.
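
In outline, the one-line comb is just a sum and a difference; a sketch
(cur and prev are composite samples at the same horizontal position on
adjacent lines):

	/* Chroma inverts from line to line; luma does not. */
	void comb_separate(double cur, double prev,
	                   double *luma, double *chroma)
	{
	    *luma   = (cur + prev) / 2.0; /* chroma cancels in the sum */
	    *chroma = (cur - prev) / 2.0; /* luma cancels in the difference */
	}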

-----

stpeters@dawn.UUCP (11/08/88)

Responding to my posting on RGB, Charles Poynton (poynton@sun.com)
writes:
> ... except that modulation onto an RF carrier for
> transmission involves no inherent compromises beyond those already made in
> formation of baseband NTSC.

This isn't an appropriate forum for RF discussion, but one paragraph
won't hurt.  In baseband NTSC, the chroma signal is an oscillation
about the luma level.  The NTSC scaling of chroma relative to luma is
such that when the composite signal is (AM) modulated onto RF, certain
combinations of luma and chroma nominally correspond to more than 100%
modulation.  That's just one example: there *is* a substantial set of
inherent compromises involved going to/from RF[1].

On RGB weighting:

I argued that the NTSC intensity formula shouldn't be considered
carved in stone for purposes beyond its original intent, because
compatible color TV required identification of an RGB combination
orthogonal to chroma, a luma that had to work with B&W televisions.
Mr. Poynton correctly stated that the formula also describes the human
spectral response.  That the best RGB weightings for a luma orthogonal
to chroma turned out to be those for the human spectral response
should hardly surprise anyone.

However, that has little relevance[2].  If you display RGB in the
ratios matching the human spectral response, you do not get a good
white, you get a bright but dreary grey.  The logarithmic response of
the eye masks the roughly 2:1 green/red ratio, but the 11% blue in the
formula is just not enough.

The CIE has a name that I can't remember, a qualified white of some
sort, for this bright grey.  The mixture lies in the large central
region of whiteish shades in a triangular color plot, but it is well
removed from what people subjectively perceive as a clean, pure white.

This has been known and exploited for years.  Fabric whiteners work by
adding blue; they're essentially blue dyes.  High-grade paper is
tinted with fluorescent blue dyes to make it appear whiter; people
appearing on color TV have long been advised that a blue-tinted shirt
looks whiter than a white one (a problem aggravated by the low level
of blue in the spectrum of studio lighting).

The point is that what people perceive as pure white is not, in TV
terms, a zero-chroma color.  This is not unnatural: there is no reason
that we should have evolved with our visual systems adjusted so that
the mixture we see as white matches the weightings determined by the
spectral response of our eyes.  After all, the illumination spectrum
in which we see things is not constant over frequency - nor is the
variation constant over time.  (E.g., there is more blue light on
clear days than overcast ones, more blue at noon than near sunset or
dawn, more blue on clear days near the equator than on clear days at
higher latitudes, etc.)  To restate the point, not only is there no
reason we should have evolved to see NTSC luma as pure white, it is
known that we did not.

[I'm rather fond of summer scenes of bright white puffy clouds in a
blue sky.  If nature had adjusted my visual system to perceive the
NTSC mixture as pure white, the clouds would seem to have a blue tint.
It wouldn't be the same.]

Not long from now, 24-bit systems will be as common as 8-bit ones are
today, and people will want to display true-color images.  If you
display the NTSC RGB combination and adjust your monitor so it looks
like a nice clean white, you will have problems with true-color
images.  If you make equal-RGB look white, your true-color images will
look true-color.

When you have RGB on separate cables with equal bandwidths, there is
nothing relevant about the NTSC RGB combination.  However, you will
frequently want white window borders, background/foreground color,
etc.  The white you want is ((R+G+B)/3).  Equal weighting and the
eye's logarithmic response allow considerable monitor misadjustment to
still make your white look reasonably white.

--
Dick St.Peters                        
GE Corporate R&D, Schenectady, NY
stpeters@ge-crd.arpa              
uunet!steinmetz!stpeters

1. I'm willing to discuss RF issues further offline.

2. The NTSC/human-vision weighting is a reasonable one if you want to
   extract a grey-scale image from a true-color image.

poynton@SUN.COM (Charles Poynton) (11/09/88)

This follows Dick St Peters' comment in <8811080030.AA14681@EXPO.LCS.MIT.EDU>
of Comp.windows.x that (R+G+B)/3 is an appropriate weighting 
function for luminance.

Summary

Luminance weighting coefficients are a function ONLY of the human
luminance sensitivity functions, the selected phosphor chromaticities, and
the chromaticity of the reference illuminant (white point).  The equations
can be found on page 2.28 and following of K. Blair Benson's "Television
Engineering Handbook", or on page 20-10 of Donald G. Fink's "Electronic
Engineers' Handbook", Second Edition.

A Simple Thought Experiment

Think of three monochromatic primaries, for example, tuneable dye lasers.
If (R+G+B)/3 is white, what happens when I move the wavelength of the blue
primary?  If I move it towards the ultraviolet, the eye responds less to
it and luminance drops.  If I move it towards green, the eye responds more
(in which channel(s) we don't care) and its luminance increases.

Hence, the equation for luminance must depend in some manner on what
wavelength you call "blue".

White

CIE Illuminant D65 refers to a standard spectral distribution which is
chosen to approximate daylight.  This is television's standard white
reference.  So-called "Zero chroma" in NTSC occurs on this colour.  People
may or may not respond to this as "white":  if the observer is adapted to
a pink room, he'll think it's yellowish-green.  The reason that the CIE
decided to standardize white, in 1931, is to allow work in colourimetry to
be done objectively.

To achieve accurate colour reproduction requires specification of
reference white.  This is a lurking problem for accurate colour
reproduction in computer graphics, because most workstation monitors are
adjusted for a white point (of about 9300 K) that's quite a bit more blue
than the television standard.  Software people tend to say
"R=G=B=1=white.  Simple."  But it's not that simple.

Blue Shirts

Sorry to air the laundry in public.  Fabric whiteners work by adding
materials which fluoresce to convert ultraviolet light [to which the human
eye is not sensitive] into visible light in the blue part of the spectrum
[to which it is].  The shirt not only looks brighter than white, it IS
brighter than white.  But this has nothing to do with the luminance
coefficients.

With modern phosphors the blue luminance coefficient is actually tending
to go DOWN.  To produce accurate reproduction with European standard
phosphors requires a blue luminance contribution of 0.071.  This is a big
colour problem for Europe, because they use the same luminance function as
we do, and it's not exactly matched to their phosphors.  They tweak their
camera spectral sensitivities as a first-order fix to improve the colour
reproduction accuracy.  

Orthogonality

Luminance is luminance; chrominance is chrominance; and NTSC can be
thought of as conveying Y, U, and V independently.  This is true
regardless of your interpretation of what these quantities represent.
They're "orthogonal" provided that one can't be derived from the other
two.  Although there are some subtle signal-to-noise ratio considerations
in television coding, this issue is independent of (or should I say
orthogonal to) the choice of luminance coefficients.

NTSC RF Modulation

I stand corrected by Mr St Peters on an RF modulation point:  there IS one
compromise made in NTSC transmission via RF.  White modulates the RF
carrier to 12.5%, and black modulates it to 70.3125%.  When NTSC modulates
an RF carrier, chroma excursions near highly saturated yellow and near
highly saturated cyan are clipped to 120 IEEE units prior to the
modulator, to avoid UNDER-modulating the transmitter.  Two small regions
of colour space are lost in this case.  No practical television camera has
sufficient colour separation capability to generate signals in these
regions, but electronically-synthesized colour bars have perfect colour
saturation and would undermodulate the transmitter if left alone.  Studio
colour bars are generated at 75% saturation to avoid these two regions of
colour space.  This issue does not bear on the choice of luminance
coefficients, and is relevant to broadcast only.  VHS machines and NTSC
monitors reproduce all of the NTSC colour space even at 100% saturation.

I would have put this first, but then you wouldn't have read all the
colour stuff, would you?

Charles

awpaeth@watcgl.waterloo.edu (Alan Wm Paeth) (11/10/88)

In article <76649@sun.uucp> poynton%vector@Sun.COM (Charles Poynton) writes:
>This follows Dick St Peters' comment <8811080030.AA14681@EXPO.LCS.MIT.EDU>
>of Comp.windows.x that (R+G+B)/3 is an appropriate weighting function for
>luminance.

Thank you, Charles, for a very informative article. I (for one) am getting
tired of hearing misinformation such as the above comment or related statements
such as "[CMY] = 1-[RGB]". It's nice to see the record being set straight.
(This is not a criticism of the original poster.)

These color "facts" have a feeling of common sense (as 0th order approximations
to reality) which explains why they are deduced from first principles and then
readily requoted. A bit like teaching Newtonian physics without being reminded
that Relativity exists. Fortunately, the latest Computer Graphics texts are
beginning to give color the more precise treatment it deserves.

Some additional fine points which might be of interest to people:

>White
>
>To achieve accurate colour reproduction requires specification of
>reference white.  This is a lurking problem for accurate colour
>reproduction in computer graphics, because most workstation monitors are
>adjusted for a white point (of about 9300 K) that's quite a bit more blue
>than the television standard.  Software people tend to say
>"R=G=B=1=white.  Simple."  But it's not that simple.

When people say "RGB color" my first impressions are (1) color used in the
context of a digital computer and (2) (all too often) an associated sense of
vagueness about the exactness of the specification ("well, it's 24-bit RGB?!").

For instance, R might relate to the phosphor chromaticity of a specific
monitor, a spectral line, or the NTSC defined value (I've yet to see an
NTSC monitor that reproduces this). In our lab I've encountered many "R"'s:

    Chromaticity Coords		Comments
    -------------------------------------------------
    x' = .62, y' = .21		Electrohome monitor red phosphor
    x' = .65, y' = .30		Aydin monitor red phosphor 
    x' = .63, y' = .30		Spectral line at 600 nm (CIE tables)
    x' = .67, y' = .21		NTSC defined "Red"

The use of the NTSC standard (with which the CIE tables for Y and a matrix
inversion give the familiar Y = .299R + .587G + .114B) is far better than using
Y = 1/3(R+G+B), but even then, I've yet to see an NTSC monitor. The first
color TV's tried to live up to the standard, but the required red spectral
purity is so high (that's the x' = .67 in the table) that luminance suffers.
So the industry pushes brighter, less pure red phosphors (remember the
"rare earth phosphor" ads of the early 70's?) and it's anyone's guess where
the R coordinates are. Unless you have a lot of money, studio monitors tend to
migrate in the direction of their commercial counterparts (lower spectral
purity in the phosphors).

>
>Orthogonality
>
>Luminance is luminance; chrominance is chrominance; and NTSC can be
>thought of as conveying Y, U, and V independently.  This is true
>regardless of your interpretation of what these quantities represent.
>They're "orthogonal" provided that one can't be derived from the other
>two.  Although there are some subtle signal-to-noise ratio considerations
>in television coding, this issue is independent of (or should I say
>orthogonal to) the choice of luminance coefficients.

Well, that's really a definition for independence, not complete orthogonality.
The YIQ television signal is similar to the CIE defined YUV (or XYZ) color
spaces in that the Y's (luminance) are the same. The I and Q chrominance
signals pick up the remaining two degrees of freedom. The matrix that defines
the coordinate change was chosen out of bandwidth considerations. In fact,
I and Q stand for "In phase" and "Quadrature" signal. They are encoded for
broadcast on color subcarriers that are 90 degrees out of phase. It is these
signals that are orthogonal (in the sin(x), cos(x) sense), but not the
independent values which they encode.

>NTSC RF Modulation
>
>I stand corrected by Mr St Peters on an RF modulation point:  there IS one
>compromise made in NTSC transmission via RF...When NTSC modulates
>an RF carrier, chroma excursions near highly saturated yellow and near
>highly saturated cyan are clipped to 120 IEEE units prior to the
>modulator, to avoid UNDER-modulating the transmitter.  Two small regions
>of colour space are lost in this case.  No practical television camera has
>sufficient colour separation capability to generate signals in these
>regions, but electronically-synthesized colour bars have perfect colour
>saturation and would undermodulate the transmitter if left alone.

Almost. There are stories of independent stations with substandard power
supplies that got fratzed when trying to run _Sesame Street_ clips --
Big Bird is both big and yellow (the AM video signal draws power as a function
of the modulation). As for the generation of "synthetic" colors as with color
bar tests, one has to be careful. I worked on Shoup's "Superpaint" system when
I was with Xerox (one of the first color "paint" systems; it provided NTSC
output).
It featured a menu item to test for "hot" broadcast colors such as yellow to
avoid modulation problems -- it would blink the colors. In that case one had
either to reduce the intensity or desaturate.

A tool like this is useful and necessary for readying computer graphics images
for commercial broadcast. Those images *invariably* have highly saturated
colors (a generation raised on Big Bird?). If there is enough interest I can
post a production tuned C program which flags "hot" colors.

    /Alan Paeth
    Computer Graphics Laboratory
    University of Waterloo

mab@pixar.UUCP (Malcolm Blanchard) (11/12/88)

The discussion of luminance computations and the subsequent discussion
of the meaning of white reminds me of an experience I had a few years ago
when Pixar was a division of Lucasfilm and we were working on an effect
for "Young Sherlock Holmes".  Aesthetic decisions were being made by
people sitting in front of color monitors.  The digital images were
transferred to film using three color lasers.  The film was printed and
then projected in a screening room.  I decided that this was an great
place to implement a what-you-see-is-what-you-get color system.  And
so I delved into the murky depths of colorimetry in the hope of
developing a color-correction program that would produce
the same color in the screening room that was measured on the color
monitors.  This is a difficult problem (in fact, in its strictest sense, an
impossible one, since the color gamuts of the two systems have mutually
exclusive regions).  I took into account the CIE coordinates of the
monitor's phosphors, its color balance, the sensitivity of the color film
to each of the lasers, cross-talk between the film layers, effects of
film processing, the spectral characteristics of the print film's dye
layers, and the spectral characteristics of a standard projector
bulb.  Several steps in this process are extremely non-linear, but I
was able to achieve some good results by using some piece-wise linear
approximations.  I felt a great sense of success when I used a colorimeter
to confirm that the CIE coordinates on the silver screen did, indeed,
closely match those on the tiny screen.  We color corrected a few shots
and showed them to the effects director.  His response was, "Why does
this look so blue?"  It turns out that when we look at a TV we're accustomed
to a blue balance and when we're sitting in a theater we expect a yellow
balance.  The digital color correction was abandoned and the production
relied on the film lab to produce an aesthetic balance.  Thus proving
to me that science may work, but computer graphics and film making are
still largely a matter of art.

stpeters@dawn.UUCP (11/14/88)

pixar!mab@bloom-beacon.mit.edu  (Malcolm Blanchard) writes:
> The digital color correction was abandoned and the production
> relied on the film lab to produce an aesthetic balance.  Thus proving
> to me that science may work, but computer graphics and film making are
> still largely a matter of art.

Amen.

I am weary of claims that the CIE description of 1931, as pivotal as
it was, is the last word in human color perception.  Does it really
surprise anybody that work over the last half century has unearthed
some aspects that the CIE did not?

CIE is a comfort to engineers, because it wraps color in a widely
accepted veneer of quantification, making it seem like a science,
something they can deal with.  It allows them to talk about color in
terms of "CIE coordinates", quoted to two or more significant figures,
as for phosphors in a recent posting.

However, the CIE coordinates are strictly applicable only to truly
continuous spectra.  Phosphors emit much of their output in a few
narrow spectral lines.  Further, the proportions of the emission among
the peaks - and the continuous underlying background - depend on the
excitation.  Assigning precise CIE coordinates to such phosphors is
meaningless.  (Note the word "precise".)

Human vision receives input in four channels.  Three have spectral
response curves that peak in red, green, and blue parts of the
spectrum respectively, but each responds somewhat throughout most of
the visible spectrum.  The fourth channel is the roughly colorless
night vision channel, responding to overall brightness.

Our visual system appears to assign color based on ratios of the
inputs - and on its own biases.  It is an ornery system indeed and is
quite willing to reject input that does not fit: when two images of a
scene, each projected through a different narrowband red filter, are
superimposed on a screen, it is quite possible for humans to perceive
greens and blues in a scene, in spite of information from the eye's
receptors that the overwhelming majority of received light energy is
in the red part of the spectrum[1].

As far as I know, nobody knows why for sure.  However, apparently our
visual system (evolved for broad-spectrum illumination) has more
confidence in the differing responses of the blue, green and
"colorless" receptors to the two red peaks than in the evidence from
the single red channel.  With inputs voting 3:1 in favor of other
colors, it ignores the red input.

If you had a red phosphor that emitted primarily in two spectral peaks
and the proportional distribution between the peaks depended on the
excitation, you could perceive the emission from the red phosphor as
being blue or green at some intensities.  (The emission would have to
be confined virtually entirely to the peaks, whereas real phosphors
emit a continuous background as well as peaks, so such a phosphor
would be very unlikely.)

Not only does the concentration by phosphors of their output into a
few peaks distort the color response they generate from their nominal
CIE coordinates, but the spectrum of each phosphor interacts with that
of the others in an RGB system, probably by altering response ratios.

Two projects ago, I spent several years simulating color television
systems, from the camera to NTSC baseband to RF to NTSC baseband to
final "RGB", as well as simulating high-definition TV schemes.  I know
all too well how hard it is to give up the comfort of quantifying
color.  CIE was a good start, a necessary and useful approximation for
technological progress like color TV transmission and color CRT
development.  However, "accurate" color is still an art.

--
Dick St.Peters                        
GE Corporate R&D, Schenectady, NY
stpeters@ge-crd.arpa              
uunet!steinmetz!stpeters

1. The experiment requires that each (B&W) slide be projected through
   the same filter through which it was photographed.

xacct@uhccux.uhcc.hawaii.edu (X-Windows Account) (11/14/88)

From article <8811132229.AA02726@dawn.steinmetz.GE.COM>, by stpeters@dawn.UUCP:

" 
" If you had a red phosphor that emitted primarily in two spectral peaks
" and the proportional distribution between the peaks depended on the
" excitation, you could perceive the emission from the red phosphor as
" being blue or green at some intensities.  (The emission would have to
" be confined virtually entirely to the peaks, whereas real phosphors
" emit a continuous background as well as peaks, so such a phosphor
" would be very unlikely.)

It might also have to be part of a high resolution picture of a
natural scene, if I recall Land's experiments correctly.  I noticed
that you used the term 'scene' above the quoted passage.

	Greg, lee@uhccux.uhcc.hawaii.edu

raveling@vaxb.isi.edu (Paul Raveling) (11/16/88)

In article <8811132229.AA02726@dawn.steinmetz.GE.COM> stpeters@dawn.UUCP writes:
>
>Our visual system appears to assign color based on ratios of the
>inputs - and on its own biases.  It is an ornery system indeed and is
>quite willing to reject input that does not fit: ...

	A demonstration of this that I used to do with our filled-vector
	map graphics involved changing the ocean color.  Starting
	with a seemingly well-saturated chocolate brown for land
	and dark blue for water, I'd change only the water color.
	Going to light blue made the land suddenly appear black.
	Holding something over the screen to mask out water areas
	showed that the land, in fact, was still the same "vivid"
	chocolate brown.

	This was a fairly radical difference in perceived color
	and luminance.  An "accurate" model of perception would need
	to incorporate info about graphic context of the color
	in question.


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

karlton@decwrl.dec.com (Philip Karlton) (12/06/88)

I found the entire discussion on luminance last month quite interesting. What
I would like now from the experts is what they would do to convert an RGB
value as specified in the X protocol into a gray level.

The particular problem I have in mind is that of a StaticGray display with N
equally spaced intensities arranged in a ramp with black at 0 and white at
N-1. I have access to hardware with N of 2, 4, 16, and 256.

Two different expressions for computing the appropriate pixel value come
immediately to (my) mind:

For r, g, b in [0..1]

	floor((.299r + .587g + .114b)(n - 1) + 0.5)		(a)

or for r, g, b in [0..1)

	floor((.299r + .587g + .114b)(n))			(b)

(a) and (b) produce almost identical results for N=2. For N=256, the resulting
differences are probably not detectable by the human eye, certainly not mine.
For N=4, the differences are observable.

The correct thing is for the client to have done the appropriate dithering and
present the pixmap to the server. For those clients that ignore the visual
type of the root window, the server has to do the mapping of RGB (in X's
terms) to some pixel value. Is either (a) or (b) the appropriate choice? Is
there some better function around that I should use?

For the numerically curious: r, g, and b above could be computed using

	r = ((float) screenRed) / maxColor;
	b = ((float) screenBlue) / maxColor;
	g = ((float) screenGreen) / maxColor;

where maxColor is dependent upon which of (a) or (b) is chosen:

	float maxColor = (float) (0xFFFF);		/* (a) */
or
	float maxColor = (float) (0x10000);		/* (b) */

PK

srneely@watcgl.waterloo.edu (Shawn Neely) (12/07/88)

In article <964@bacchus.dec.com> karlton@decwrl.dec.com (Philip Karlton) writes:

+I would like now from the experts is what they would do to convert an RGB
+value as specified in the X protocol into a gray level.
+
+The particular problem I have in mind is that of a StaticGray display with N
+equally spaced intensities arranged in a ramp with black at 0 and white at
+N-1. I have access to hardware with N of 2, 4, 16, and 256.
+
+Two different expressions for computing the appropriate pixel value come
+immediately to (my) mind:
+
+For r, g, b in [0..1]
+
+	floor((.299r + .587g + .114b)(n - 1) + 0.5)		(a)
+
+or for r, g, b in [0..1)
+
+	floor((.299r + .587g + .114b)(n))			(b)
+
+ (stuff deleted)
+...Is either (a) or (b) the appropriate choice. Is
+some better function around that I should use?
+
+For the numerically curious: r, g, and b above could be computed using
+
+	r = ((float) screenRed) / maxColor;
+	b = ((float) screenBlue) / maxColor;
+	g = ((float) screenGreen) / maxColor;
+
+where maxColor is dependent upon which of (a) or (b) is chosen:
+
+	float maxColor = (float) (0xFFFF);		/* (a) */
+or
+	float maxColor = (float) (0x10000);		/* (b) */
+
+PK

The correct approach is (a). A strong argument for using
the closed interval [0..1] (and [0..N-1]) is given in
    "Design and Experience with a Generalized Raster Toolkit"
    by Paeth and Booth in Proc. Graphics Interface '86 (Vancouver).

The interval is consistent with the design of a number of colour spaces,
and is correct when data is taken to higher significance. The same is
not true of the wrong (often implicit when using bit shifts) use of
the open interval [0..1).

For example, a one-bit image in the [0..1) model allows only
the intensity values 0.0 and 0.5, and not "full on".

The required multiplications and divisions for the correct approach
can often be performed by table lookup.
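
Combining expression (a) with the lookup suggestion, the three weighted
terms can be precomputed once per channel (a sketch; 8-bit channels
assumed, names illustrative):

	/* Map 8-bit r, g, b to one of n gray levels via expression (a).
	 * Weighted products are precomputed in 16.16 fixed point. */
	static long rtab[256], gtab[256], btab[256];

	void build_tables(void)
	{
	    int i;
	    for (i = 0; i < 256; i++) {
	        rtab[i] = (long)(0.299 * i / 255.0 * 65536.0 + 0.5);
	        gtab[i] = (long)(0.587 * i / 255.0 * 65536.0 + 0.5);
	        btab[i] = (long)(0.114 * i / 255.0 * 65536.0 + 0.5);
	    }
	}

	int gray_level(int r, int g, int b, int n)  /* result in [0..n-1] */
	{
	    long y = rtab[r] + gtab[g] + btab[b];   /* luminance, 16.16 */
	    return (int)((y * (n - 1) + 32768L) >> 16);
	}
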
-- 
(.I.)   "The road of excess leads
 ).(       to the palace of wisdom."
( Y )              -William Blake

dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/10/88)

In article <7162@watcgl.waterloo.edu> srneely@watcgl.waterloo.edu (Shawn Neely) writes:
[...discussion of RGB->Grey conversion and "shifting" to higher precision...]
|
|The interval is consistent with the design of a number of colour spaces,
|and is correct when data is taken to higher significance. The same is
|not true of the wrong (often implicit using bit shifts) use of
|the open interval [0..1).
|
|For example, a one-bit image in the [0..1) model allows only
|the intensity values 0.0 and 0.5, and not "full on".
|
|The required multiplications and divisions for the correct approach
|can often be performed by table lookup.

Over the last couple of weeks I've been working with the PBM code posted to
(I think) comp.sources.misc, expanding it to handle 8-bit greyscale and
24-bit color images.  In the process, I've run into some of the problems
being discussed here.  Following are some of my solutions.  I think they
work as well as or better than what I've seen posted so far, but if I'm
missing some glaring deficiency, I'd like to have it pointed out to me.

RGB to Greyscale:
  I use the formula GREY = ((76 * R) + (150 * G) + (29 * B)) >> 8;
where R, G and B are 24-bit color components [0..255] and the result
is a greyscale value [0..255] with intermediate values in the formula
not exceeding 16 bits (i.e., int).  This gives a good approximation of
the 29.9% red, 58.7% green, 11.4% blue luminance contributions.

Extrapolation to more significance:
  I have only 3-bits per gun (RGB) on the Atari-ST, and want to expand those
color values to their 8-bit equivalents.  As was mentioned, simply doing
a left-shift is not sufficient.  The method I use is to start at the MSB
of the source and destination values, copy bits from the source proceeding
toward the LSB, if you reach the end of the source before filling the
destination, start over at the beginning of the source.  This works for
both increasing and decreasing significance (equivalent to right-shift for
decreasing).  Example: 101 --> 10110110, 000->00000000, 111->11111111, etc.
It seems to work for all cases, even weird things like 7-bits -> 13-bits.
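
A general version of the replication, sketched in C (argument names are
illustrative):

	/* Widen (or narrow) an nsrc-bit value to ndst bits by copying
	 * bits from the MSB down, wrapping around the source as needed,
	 * e.g. 101 (3 bits) -> 10110110 (8 bits). */
	unsigned bit_replicate(unsigned v, int nsrc, int ndst)
	{
	    unsigned out = 0;
	    int pos = ndst;

	    while (pos > 0) {
	        int chunk = (nsrc < pos) ? nsrc : pos;
	        pos -= chunk;
	        out |= (v >> (nsrc - chunk)) << pos;  /* top bits of v */
	    }
	    return out;
	}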

One problem I have yet to solve is analyzing a picture to choose N colors
out of a (larger) palette of M colors that best represent a given image.
For example, I can only use 16 colors out of a palette of 512, so which
are the "best" 16 to use?  I already have color dithering algorithms, but
I need to decide which colors to dither WITH.

ph@miro.Berkeley.EDU (Paul Heckbert) (12/12/88)

Dale Schumacher (dal@midgard.Midgard.MN.ORG) wrote:
> ...I have only 3-bits per gun (RGB) on the Atari-ST, and want to expand those
> color values to their 8-bit equivalents...  The method I use is to start at
> the MSB of the source and destination values, copy bits from the source
> proceeding toward the LSB, if you reach the end of the source before filling
> the destination, start over at the beginning of the source.
> This works for both increasing and decreasing significance (equivalent
> to right-shift for decreasing).  Example: 101 --> 10110110,
> 000->00000000, 111->11111111, etc.  It seems to work for all cases,
> even weird things like 7-bits -> 13-bits.

Paraphrasing, Dale is converting the 3 bit number abc, where each of a, b,
and c are 0 or 1, into the 8 bit number abcabcab.

This is very close to the "correct" formula, but you've found a somewhat
roundabout way to compute it.  The formula you want will map black (000)
to black (00000000) and white (111) to white (11111111) and map everything
inbetween linearly.  In other words, you want to multiply by 255/7.
Your formula actually multiplies by 255.9375/7.

You can prove this to yourself by thinking of the 3-bit bit string x=abc as
a representation for the binary fraction x'=.abc (e.g. bit string 010
represents the number .010) and 8-bit bit string y=abcabcab is a code for
binary y'=.abcabcab .  But replicating the bits is equivalent to a
multiplication: y'=x'*1.001001001.  Putting our formulas together, we
have x'=x/8, y'=y/256, and 1.001001001=4095/3584,
so y/x = (1/8)*(4095/(512*7))*256 = 4095/(7*16) = 255.9375/7 .
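
A quick check of the two mappings, purely illustrative; for the 3-to-8
bit case they in fact agree at every level:

	#include <stdio.h>

	int main(void)
	{
	    int x;
	    for (x = 0; x < 8; x++) {
	        int rep = (x << 5) | (x << 2) | (x >> 1); /* abcabcab */
	        int lin = (x * 255 + 3) / 7;              /* round(x*255/7) */
	        printf("%d: replicate %3d  linear %3d\n", x, rep, lin);
	    }
	    return 0;
	}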

It's good to step back from the low-level bits once in a while and think
about what these pixel values mean in the real world.

----

Dale also asked about algorithms for selecting the 16 colors out of a
palette of 512 that best represent an image.  This is called "color image
quantization".  I wrote about it in a paper:

    Paul S. Heckbert,
    "Color Image Quantization for Frame Buffer Display",
    Computer Graphics (SIGGRAPH '82 Proceedings),
    vol. 16, no. 3, July 1982, pp. 297-307

see also the improved algorithm in:

    S. J. Wan, K. M. Wong, P. Prusinkiewicz,
    "An Algorithm for Multidimensional Data Clustering",
    ACM Trans. on Mathematical Software,
    vol. 14, no. 2, June 1988, 153-162

Paul Heckbert, CS grad student
508-7 Evans Hall, UC Berkeley		UUCP: ucbvax!miro.berkeley.edu!ph
Berkeley, CA 94720			ARPA: ph@miro.berkeley.edu

dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/14/88)

In article <8241@pasteur.Berkeley.EDU> ph@miro.Berkeley.EDU (Paul Heckbert) writes:
|
|Paraphrasing, Dale is convering the 3 bit number abc, where each of a, b,
|and c are 0 or 1, into the 8 bit number abcabcab.
|
|This is very close to the "correct" formula, but you've found a somewhat
|roundabout way to compute it.  The formula you want will map black (000)
|to black (00000000) and white (111) to white (11111111) and map everything
|inbetween linearly.  In other words, you want to multiply by 255/7.
|Your formula actually multiplies by 255.9375/7.

The point of my "round-about" method is performance.  It's much easier to
replicate bits and do shifts than to divide by 7.  I believe that this
approximation will yield the "correct" value (to 8-bit int precision)
for all cases, right?

|Dale also asked about algorithms for selecting the 16 colors out of a
|palette of 512 that best represent an image.  This is called "color image
|quantization".  I wrote about it in a paper:

Thank you for the references.  I'll check into them.

dave@onfcanim.UUCP (Dave Martindale) (12/17/88)

In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>
>The point of my "round-about" method is performance.  It's much easier to
>replicate bits and do shifts than to divide by 7.

But it's faster to *implement* almost any function using table lookup.
Even the naive and inaccurate shift-n-bits is faster when performed using
table lookup than the hardware shift instruction on some hardware.

Once you are using table lookup to do the pixel-by-pixel "computations",
it really doesn't matter how expensive the code that initializes the table
is - you only do it once.  So you might as well use multiply and divide,
and do the calculations in a way that someone else can read, and can see
by inspection is correct.

You can even use non-linear pixel encodings to avoid losing shadow detail
when the output pixel is narrow.  For example, my standard way of storing
12-bit linear data from a scanner into 8 bits is:

	outpix = 255 * ((inpix/4095.0) ** (1/2.2))

using floating point where needed, and rounding the result to an integer.
(The magic number 2.2 happens to be the standard value of "gamma correction"
that the NTSC television standard uses, so this can be sent to a frame
buffer and turned into NTSC without further gamma correction, but the
technique is worthwhile on its own even if the image will never appear
on video.)
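
Expressed as table initialization, the whole thing is a few lines (a
sketch of the approach just described; names are illustrative):

	#include <math.h>

	unsigned char pix8[4096];     /* 12-bit linear in, 8-bit coded out */

	void init_table(void)
	{
	    int i;
	    for (i = 0; i < 4096; i++)
	        pix8[i] = (unsigned char)
	            (255.0 * pow(i / 4095.0, 1.0 / 2.2) + 0.5);
	}
	/* thereafter each pixel costs one fetch:  out = pix8[in]; */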

raveling@vaxb.isi.edu (Paul Raveling) (12/20/88)

In article <16929@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>>
>>The point of my "round-about" method is performance.  It's much easier to
>>replicate bits and do shifts than to divide by 7.
>
>But it's faster to *implement* almost any function using table lookup.

	This is true for relatively complex functions, but not usually
	for those that break down easily to simple operations such as
	shifts and adds.  I've measured speed improvements up to a
	factor of 14 over ordinary C code in the most extreme case
	by moving a critical algorithm to assembly language and using
	this sort of shifty logic.
	
	Both techniques are valuable.  For example, real time software
	such as that used in Central Air Data Computers uses shift/subtract
	logic wherever possible for functions such as the simplest digital
	filters [something like filtered_value = (7*old_value+new_value)/8];
	it uses table lookup with linear interpolation between entries for
	other functions.  The other functions need not be very complex
	to make the table lookup useful -- sqrt, for example.
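
	A lookup table with linear interpolation might be sketched like
	this (the sqrt example and 256-entry size are illustrative):

		#include <math.h>

		static double tab[257];  /* sqrt sampled at 257 points */

		void init_sqrt(double domain)  /* domain = largest input */
		{
		    int i;
		    for (i = 0; i <= 256; i++)
		        tab[i] = sqrt(domain * i / 256.0);
		}

		double fast_sqrt(double x, double domain)
		{
		    double p = x / domain * 256.0;  /* table position */
		    int i = (p < 256.0) ? (int)p : 255;
		    return tab[i] + (p - i) * (tab[i + 1] - tab[i]);
		}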

	Where table lookup REALLY shines is evaluating relatively
	complex functions.  Software such as the B-1B's CADC
	uses it profusely to keep adequate real time margins.

>Once you are using table lookup to do the pixel-by-pixel "computations",
>it really doesn't matter how expensive the code that initializes the table
>is - you only do it once.  So you might as well use multiply and divide,
>and do the calculations in a way that someone else can read, and can see
>by inspection is correct.

	Good commenting serves the latter purpose.  It's just as easy
	to supply an actual equation as a comment as it is to use
	it as code.  It's a matter of software engineering discipline
	to be sure the comments match the code -- we all do that,
	religiously, don't we?


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

dave@onfcanim.UUCP (Dave Martindale) (12/21/88)

In article <7086@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>>But it's faster to *implement* almost any function using table lookup.
>
>	This is true for relatively complex functions, but not usually
>	for those that break down easily to simple operations such as
>	shifts and adds.  I've measured speed improvements up to a
>	factor of 14 over ordinary C code in the most extreme case
>	by moving a critical algorithm to assembly language and using
>	this sort of shifty logic.

Is that a factor of 14 speed improvement over table lookup?  Or a
factor of 14 improvement over code that contained multiply or divide in
the inner loop?  A comparison against table lookup is what matters.
What processor was this on?

For shift/add to be faster than table lookup, the time required to do
the shifts and adds must be less than that required to do the table
addressing calculations and the extra memory fetch.  (Note that the
table word will usually be in the cache on machines that have a cache.)

On machines like the VAX, where the table addressing calculations can
be done as part of a "move" instruction but the shifts and adds are
done as separate instructions (and shift is very slow on some models),
table lookup is going to be faster.  On another machine where the table
addressing requires several instructions but a barrel shifter is
available, the shift method will likely be faster.  You have to try
both, or have a good knowledge of the particular model of the
particular architecture of processor you are using, to determine which
is faster.

However, table lookup has some additional benefits.  It is simple
enough that carefully-written C code is likely to generate the best
possible assembler code, so there is no need to use assembler.  A
single copy of the lookup code works for all possible input and
output widths, as long as the pixels are always stored in words of a
constant width.

In contrast, the shift/add code requires different sequences of
instructions for different input and output pixel bit widths, since the
output may require adding 1 or 2 or 3 or more copies of the input,
shifted by various amounts.  Either you must pre-compile all possible
variations that you could ever need, or compile code during execution,
or just have a general-purpose algorithm that needs tests and branches
within the pixel lookup loop (bye-bye performance).  How do you deal
with this?
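
With tables, by contrast, the question disappears: only the one-time
initializer depends on the widths.  A sketch (illustrative names; output
pixels of at most 8 bits, since the entries are bytes):

	/* Build a table mapping inbits-wide pixels to outbits-wide
	 * pixels; the per-pixel lookup loop is identical whatever
	 * the widths.  Rounds by adding half the divisor. */
	void init_widen_tab(unsigned char *tab, int inbits, int outbits)
	{
	    int i;
	    int maxin = (1 << inbits) - 1, maxout = (1 << outbits) - 1;

	    for (i = 0; i <= maxin; i++)
		tab[i] = (i * maxout + maxin / 2) / maxin;
	}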

ksbooth@watcgl.waterloo.edu (Kelly Booth) (12/21/88)

In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>For shift/add to be faster than table lookup...

Actually, some solutions combine BOTH techniques.  If the table has two
or more indices (not the case here), then the lookup requires combining
those indices into a single index into the table.  Some compilers do
this with adds and multiplies.  If machine code is written (or if the
compiler is smart) shifts and adds can be substituted.  This may
require that the table size(s) be adjusted to the next larger powers of
two.

Or, recursively, the composite indices can themselves be computed using
a table lookup scheme.
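
A sketch of the two-index case (illustrative names, with the second
dimension padded to a power of two):

	#define JBITS	5			/* j takes at most 32 values */

	short ftab[64 << JBITS];		/* 64 i-slots, 32 j-slots each */

	/* composite index by shift and OR instead of a multiply */
	#define FLOOKUP(i, j)	(ftab[((i) << JBITS) | (j)])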

raveling@vaxb.isi.edu (Paul Raveling) (12/22/88)

	BTW, in case it wasn't clear, my preceding response was
	suggesting that the "shifty logic" and table lookup with
	interpolation alternatives are more appropriate when the
	function being implemented has a large domain.  Direct
	table lookup is best when the domain is relatively small,
	and I agree that it's the best technique for sheer speed.

	A case where direct table lookup isn't practical is any of
	a few image processing utilities we have.  They examine
	a 5x5 square of pixels with 24 bits of color per pixel
	to determine a new color for the center pixel.  I'd love to
	use direct table lookup for this, but 25*2**24 bytes is a
	tough table to handle.

In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>
>Is that a factor of 14 speed improvement over table lookup?  Or a
>factor of 14 improvement over code that contained multiply or divide in
>the inner loop?  A comparison against table lookup is what matters.
>What processor was this on?

	This was several years ago; I believe the algorithm was
	an integer square root function, the processor definitely
	was either an 8088 or an 80286 on an IBM PC.  Since the
	function's domain was unsigned 16-bit integers and the PC's
	memory was quite limited, direct table lookup would have
	been impractical, even though it certainly would be much
	faster.

	I must admit that the factor of 14 is exceptional because of
	the 80x86 processor architecture;  going to assembly language
	allowed careful recoding of some branching logic that produced
	excessive queue flushes, which are a major time waster in this
	family of processors.  Of course a direct table lookup would
	eliminate this problem.


	Note also that there are two different types of table lookup
	to consider:  Direct table lookup and searching.  The variant
	I mentioned in connection with real time avionics software
	used tables containing both x and y values (as in y = f(x) --
	not screen coordinates) to handle continuous functions of
	real numbers.  It did a binary search to find the nearest
	x values to the argument, then used linear interpolation
	to derive the corresponding y value.  In order to support
	the required accuracy most tables contained around 6-10 entries;
	the shortest I can recall was 2 entries, longest about 50.
	The version of the B-1B CADC that I worked on used sets of
	these tables to implement up to 5-dimensional linear interpolation.
	
	This approach is VERY fast for a large class of functions;
	with a mean search requiring something like 3 lookups and
	only simple math, it's faster than computing for lots of things
	with large domains or complex [not simple] equations.  It's
	also good for supporting empirically derived functions, such as
	error correction for the static pressure source on military aircraft;
	this tends to be a bit bizarre in the transonic regime near Mach 1.
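
	A sketch of that search-and-interpolate lookup (illustrative,
	not the avionics code; xs[] must be sorted ascending, and it
	extrapolates linearly outside the table range):

	double interp_lookup(double x, double *xs, double *ys, int n)
	{
	    int lo = 0, hi = n - 1, mid;

	    while (hi - lo > 1) {	/* binary search for the bracket */
		mid = (lo + hi) / 2;
		if (xs[mid] <= x)
		    lo = mid;
		else
		    hi = mid;
	    }
	    /* linear interpolation between entries lo and hi */
	    return ys[lo] + (ys[hi] - ys[lo]) * (x - xs[lo]) / (xs[hi] - xs[lo]);
	}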

	BTW, a necessary aid for building that sort of table is a
	utility to do curve fits, then extract a set of (x,y) points
	which keep error less than a given accuracy requirement when
	using linear interpolation between these points.

	Bottom line:  By all means use direct table lookup where it's
	feasible; but don't forget there are a couple other approaches
	that can still save time.


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

turk@Apple.COM (Ken "Turk" Turkowski) (12/31/88)

A fast but crude method is:

Y = (R + 2G + B) / 4
--
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: Turkowski1

jbm@eos.UUCP (Jeffrey Mulligan) (01/04/89)

From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
> A fast but crude method is:
> 
> Y = (R + 2G + B) / 4
> --
> Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
> Internet: turk@apple.com
> Applelink: Turkowski1

Equally fast and crude but probably more accurate:

Y = 2R + 5G + B

-- 

	Jeff Mulligan (jbm@aurora.arc.nasa.gov)
	NASA/Ames Research Ctr., Mail Stop 239-3, Moffett Field CA, 94035
	(415) 694-6290

falk@sun.uucp (Ed Falk) (01/04/89)

In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
> > Y = (R + 2G + B) / 4
> Y = 2R + 5G + B

Yeesh, you people.  Why do you need to simplify it?  Are you going to be
doing these calculations by hand?  Get a calculator or something.

If you *must* do it in integer for speed reasons, do it this way:

    out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11)*/

The results are correct to four decimal places and the divide is replaced
by a right-shift in a decent compiler and a byte-move in a good compiler.

-- 
		-ed falk, sun microsystems
		 sun!falk, falk@sun.com
		 card-carrying ACLU member.

raveling@vaxb.isi.edu (Paul Raveling) (01/05/89)

In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
>In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
>> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
>> > Y = (R + 2G + B) / 4
>> Y = 2R + 5G + B

	[The latter should really be Y = (2R + 5G + B) / 8, right?]
>
>Yeesh, you people.  Why do you need to simplify it?  Are you going to be
>doing these calculations by hand?  Get a calculator or something.
>
>If you *must* do it in integer for speed reasons, do it this way:
>
>    out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11)*/
>
>The results are correct to four decimal places and the divide is replaced
>by a right-shift in a decent compiler and a byte-move in a good compiler.

	... because sometimes speed is lots more important than 4
	significant digits of accuracy, and multiplies are slow.

	Consider a 68020 running code for these operations:

	a)	Y = (R + G+G + B) >> 2
	b)	Y = (R+R + (G<<2)+G + B) >> 3
	c)	Y = (77*R + 151*G + 28*B) >> 8

	For a rough cut at comparing timings, adding up the number
	of clocks for each instruction for each of the 3 cases given
	in the gospel according to Motorola, assuming the work's done
	in registers and the result is stored with a (An)+  reference,
	gives:

			Best Case	Cache Case	Worst Case
			---------	----------	----------
	
	a)		   5		   14		    18
	b)		   6		   20		    25
	c)		  83		   93		    99


	Which makes the accurate variant a whale of a lot slower
	than the others.  This sort of thing gets fairly noticeable
	if you're massaging a megapixel image.

	BTW, this is an example of a function that couldn't easily
	be accelerated by table lookup unless R, G, and B have very
	few bits.  Even then 3D subscript computation puts the table
	lookup in the same speed range as the faster 2 of these
	alternatives.


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

ksbooth@watcgl.waterloo.edu (Kelly Booth) (01/05/89)

In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>	... because sometimes speed is lots more important than 4
>	significant digits of accuracy, and multiplies are slow.

. . . (stuff deleted) . . .

>	BTW, this is an example of a function that couldn't easily
>	be accelerated by table lookup unless R, G, and B have very
>	few bits.  Even then 3D subscript computation puts the table
>	lookup in the same speed range as the faster 2 of these
>	alternatives.

Huh?  Table look up can be used to replace each of the three multiplies
in aR+bG+cB so that the code becomes something like

	a[R]+b[G]+c[B]

if R, G, and B are bytes (the usual case, and what most of the previous
postings have assumed -- for up to 12 bits table look up is still
reasonable).  This leaves just the adds and the final divide (not shown
above), which for the posting that suggested this was still a power of two
(in fact 256), so the byte swap/move or shift tricks all still work.
There is no need to tabulate the entire function.  [See previous postings
on table look up in this newsgroup about 1-2 weeks ago.]

raveling@vaxb.isi.edu (Paul Raveling) (01/07/89)

In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Consider a 68020 running code for these operations:
>
>	a)	Y = (R + G+G + B) >> 2
>	b)	Y = (R+R + (G<<2)+G + B) >> 3
>	c)	Y = (77*R + 151*G + 28*B) >> 8
>
>	BTW, this is an example of a function that couldn't easily
>	be accelerated by table lookup unless R, G, and B have very
>	few bits.  Even then 3D subscript computation puts the table
>	lookup in the same speed range as the faster 2 of these
>	alternatives.


	It's time to eat some of my own words...  I admit to taking
	my brain out of gear too soon on this one.
	
	As Bob Webber pointed out in an email message, a good candidate
	for the best approach of all is likely to be:

	d)	Y = (times77[R] + times151[G] + times28[B]) >> 8


	If R, G, and B are 8 bits this only requires 768 table entries
	(16-bit, since the products exceed 255) and it should be about
	as fast as alternative b.
	This is easily worth it for having both good speed and good
	accuracy.
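
	The setup code is trivial; a sketch using the table names
	above (the entries must be wider than a byte, since the
	products exceed 255):

	short times77[256], times151[256], times28[256];

	void init_luma_tabs()
	{
	    int i;

	    for (i = 0; i < 256; i++) {
		times77[i]  =  77 * i;
		times151[i] = 151 * i;
		times28[i]  =  28 * i;
	    }
	}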


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

dal@midgard.Midgard.MN.ORG (Dale Schumacher) (01/10/89)

In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
|In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
|> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
|> > Y = (R + 2G + B) / 4
|> Y = 2R + 5G + B
|    out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11)*/
|
|The results are correct to four decimal places and the divide is replaced
|by a right-shift in a decent compiler and a byte-move in a good compiler.

I don't know where you got your numbers.  The values I have for the Y
component of YIQ (luminance) from RGB are: R=.299 G=.587 B=.114
The following formula is the best approximation with 8-bit values:
  Y = (R*77 + G*150 + B*29) / 256
Which gives the weights: R=.3008 G=.5859 B=.1133,   total error=.0036
Your values give the weights: R=.3008 G=.5898 B=.1094,   total error=.0092
Even MY numbers don't have 4 places of accuracy, but they are a better
approximation to the 3-place target values I have.  Someone mentioned that
the NTSC weights may have been changed recently; is that so?

PS.  I fully agree with the idea that more accurate values should be used
if you're going to use integers and do 3 multiplies and a 'divide' (which
can be optimized if it's a power of 2) anyway.
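
For what it's worth, claims like this are easy to check by brute force.
The following sketch (mine, illustrative) tries every integer triple
summing to 256 against the target weights (.299, .587, .114):

	#include <stdio.h>
	#include <math.h>

	int main()
	{
	    int a, b, best_a = 0, best_b = 0;
	    double err, best = 1e9;

	    for (a = 0; a <= 256; a++)
		for (b = 0; b <= 256 - a; b++) {
		    err = fabs(a/256.0 - .299) + fabs(b/256.0 - .587)
			+ fabs((256 - a - b)/256.0 - .114);
		    if (err < best) {
			best = err;  best_a = a;  best_b = b;
		    }
		}
	    printf("R=%d G=%d B=%d  total error=%g\n",
		   best_a, best_b, 256 - best_a - best_b, best);
	    return 0;
	}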

raveling@vaxb.isi.edu (Paul Raveling) (01/14/89)

In article <10322@well.UUCP> Jef Poskanzer <jef@rtsg.ee.lbl.gov> writes:
>I wrote a quick test program to try out various approximations.  It runs
>five million conversions.  On a Sun 3/260, the timings are:
>
>    float:  223.0
>    int:     35.4
>    table:   31.6
>
>I have appended the program, in case anyone wants to run it on a different
>architecture or try different approximations.

	Just below are some results from an HP 9000/350.  I added
	two runs: one was with "shifty" logic defined by...

#ifdef SHIFTY
	j = ( r+r + (g<<2)+g + b ) >> 3;
#endif

	The other was a "no logic" run, with nothing defined to
	get an overhead calibration (how much time the loop logic and
	rgb updating used).  The "Less Overhead" column below subtracts
	this to get a direct comparison of timing for the math only.

	Test		Raw Timing	Less Overhead
	----		----------	-------------

	float		  220.0		    201.2
	int		   37.6		     18.8
	table		   34.9		     16.1
	shifty		   28.9		     10.1
	overhead	   18.8		      0


	This isn't entirely what I anticipated:  The "int" version,
	(j = ( r * 77 + g * 150 + b * 29 ) >> 8;), appeared to be
	faster than expected.  I checked further and found that
	on this one the compiler decomposed all three multiplies
	into shifts, adds, and a subtract.

	Also, the table version seemed too slow.  It turned out
	that the compiler generated some remarkably crummy code.
	ALL data except the tables were kept on the stack -- none
	in registers -- and the subscript address computations
	appeared to be distinctly suboptimal.

	Next, maybe tomorrow, I'll try the same stuff with some
	hand coded assembly language.  It should be easy to beat
	the compiler by LOTS.
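
	For anyone who wants to reproduce these runs: the harness is
	one loop, compiled once per variant and timed externally,
	with the empty build measuring the overhead.  A sketch (mine,
	after the shape of Jef's program; the float and table
	variants slot in the same way):

	int main()
	{
	    long i;
	    int r = 0, g = 0, b = 0, j = 0;

	    for (i = 0; i < 5000000; i++) {
#ifdef SHIFTY
		j = ( r+r + (g<<2)+g + b ) >> 3;
#endif
#ifdef INT
		j = ( r * 77 + g * 150 + b * 29 ) >> 8;
#endif
		r = (r + 1) & 255;	/* march through pixel values */
		g = (g + 3) & 255;
		b = (b + 7) & 255;
	    }
	    return j;			/* keep the result live */
	}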


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

falk@sun.uucp (Ed Falk) (01/15/89)

> >	BTW, this is an example of a function that couldn't easily
> >	be accelerated by table lookup unless R, G, and B have very
> >	few bits.  Even then 3D subscript computation puts the table
> >	lookup in the same speed range as the faster 2 of these
> >	alternatives.
> 
> Huh?  Table look up can be used to replace each of the three multiplies
> in aR+bG+cB so that the code becomes something like
> 
> 	a[R]+b[G]+c[B]
> 

I'm embarrassed.  I had been doing (r*77 + g*150 + 29*b)/256 all along
thinking that all multiplies took the same amount of time (the compiler,
it turns out, optimizes constant multiplies in interesting ways).

I've switched all my code to use look-up tables now.  I've gained a new
respect for look-up tables.


-- 
		-ed falk, sun microsystems
		 sun!falk, falk@sun.com
		 card-carrying ACLU member.

raveling@vaxb.isi.edu (Paul Raveling) (01/17/89)

In article <7266@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Next, maybe tomorrow, I'll try the same stuff with some
>	hand coded assembly language.  It should be easy to beat
>	the compiler by LOTS.
>

	Here's the result of using hand coded assembly language
	for the "table" algorithm:

	Test		Raw Timing	Less Overhead
	----		----------	-------------

	C version	   34.9		     16.1
	Assembly version   10.2		      6.4


	Anyone have a better C compiler?

	If anyone would like to check this hacked assembly version
	on other systems, let me know.  I can either email it or
	post it, but it will take a few minutes to clean the source
	up a little and stick in warnings that few 68K assemblers
	use exactly the same source syntax.


---------------------
Paul Raveling
Raveling@vaxb.isi.edu

jonathan@jvc.UUCP (Jonathan Hue) (01/18/89)

I'm slightly puzzled by these calculations of luminance from RGB.  Doesn't
the formula Y = .299R + .587G + .114B only apply when RGB represents
intensity, rather than pixel values?  If your pixel values are completely
gamma-corrected through look-up tables in your frame buffer hardware so
they represent intensities, this would work, but if you use linear look-up
tables (or don't have any), you would need to convert pixel values to
intensity, calculate luminance, then convert them back into pixel values
(voltages).
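
A sketch of that round trip (illustrative; assumes 8-bit pixel values and
a display gamma of 2.2):

	#include <math.h>

	static double to_intensity[256];	/* pixel value -> linear light */

	void init_intensity_tab()
	{
	    int i;

	    for (i = 0; i < 256; i++)
		to_intensity[i] = pow(i / 255.0, 2.2);
	}

	/* weight in linear light, then re-encode as a pixel value */
	int luma_pixel(int r, int g, int b)
	{
	    double y = .299 * to_intensity[r] + .587 * to_intensity[g]
		     + .114 * to_intensity[b];

	    return (int)(255.0 * pow(y, 1.0 / 2.2) + 0.5);
	}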

Also, considering how far the green of the typical color monitor is from
NTSC green, it may be worth deriving new coefficients for the monitor you
are using.


Jonathan Hue		uunet!jvc!jonathan