poynton%vector@Sun.COM (Charles Poynton) (11/06/88)
In Comp.windows.x article <8811011523.AA02242@LYRE.MIT.EDU>, Ralph R. Swick <swick@ATHENA.MIT.EDU> comments:

> When converting RGB values to monochrome, the sample server(s) compute
> an intensity value as (.39R + .5G + .11B) ...

(.39R + .5G + .11B) is apparently incorrect. This set could be a typographical error (from .29R + .6G + .11B ?), a liberal approximation, or perhaps an unusual phosphor set. Could someone enlighten me on this?

In followup article <8811042303.AA21505@dawn.steinmetz.GE.COM>, Dick St.Peters <stpeters@dawn.UUCP> makes the statement:

> I'd like to suggest that (.39R + .5G + .11B) is not a good choice for
> "intensity" in the realm of computer graphics. ...
>
> A better choice in computer graphics is to equally weight the colors:
> ((R+G+B)/3.0). Let white be white.

Equal weighting of the primaries is NOT the right thing to do, unless the viewers of your images are members of some species that has uniform response across the visible spectrum, unlike homo sapiens. Humans see 1 watt of green light energy as being somewhat brighter than 1 watt of red, and very much brighter than 1 watt of blue.

The science of colourimetry began to flourish in 1931, when the CIE standardized a statistical entity called the "standard observer". This includes a standard spectral luminance response defined numerically as a function of wavelength. It is from this data that the factors which are used in colour television are derived: .587 for green, .299 for red, and .114 for blue.

The particular factors depend on the wavelengths or chromaticities that you call red, green, and blue: there is wide disparity in these choices. For computer graphics and television, the luminance factors depend on the chromaticity coordinates of the phosphors of your CRT. There are compromises in the choice of phosphor primaries, but it turns out that the NTSC did a spectacularly good job of selecting primaries.
The luminance coefficients 0.299 for red, 0.114 for blue, and 0.587 for green are unquestionably the best values to use, unless you know your phosphors intimately.

The second article continues,

> The formula is from
> the (1954) NTSC standard for compatible color TV, and it has built
> into it a lot of compromises to accommodate old technology and
> problems inherent in the analog transmission of composite color
> television.

Contrary to this assertion, the ONLY compromise in NTSC which impacts the luminance equation is the choice of reference phosphor chromaticities, and a choice of phosphors MUST be made for any system which transmits colour in RGB. Just because it's old (1954) doesn't mean we should throw it away.

Charles Poynton            "No quote.
poynton@sun.com             No disclaimer."
(415)336-7846
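The coefficients Poynton recommends can be written as a one-line function. This is a minimal sketch for illustration, not code from the posts; the function name is my own, and the inputs are assumed to be linear R, G, B in [0,1].

```c
#include <assert.h>

/* Luminance from linear R, G, B components in [0,1], using the NTSC
   coefficients discussed above: .299 red, .587 green, .114 blue.
   (Illustrative sketch; not code from the original posts.) */
double luminance(double r, double g, double b)
{
    return 0.299 * r + 0.587 * g + 0.114 * b;
}
```

Because the three coefficients sum to 1, R=G=B=1 yields a luminance of 1.0, while pure green alone contributes 0.587 -- more than red and blue combined, which is the whole argument against equal weighting.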
poynton%vector@Sun.COM (Charles Poynton) (11/09/88)
This follows Dick St Peters' comment <8811080030.AA14681@EXPO.LCS.MIT.EDU> of Comp.windows.x that (R+G+B)/3 is an appropriate weighting function for luminance.

Summary

Luminance weighting coefficients are a function ONLY of the human luminance sensitivity functions, the selected phosphor chromaticities, and the chromaticity of the reference illuminant (white point). The equations can be found on page 2.28 and onward in K. Blair Benson's "Television Engineering Handbook", or on page 20-10 of Donald G. Fink's "Electronic Engineers' Handbook, Second Edition".

A Simple Thought Experiment

Think of three monochromatic primaries, for example, tuneable dye lasers. If (R+G+B)/3 is white, what happens when I move the wavelength of the blue primary? If I move it towards the ultraviolet, the eye responds less to it and luminance drops. If I move it towards green, the eye responds more (in which channel(s) we don't care) and its luminance increases. Hence, the equation for luminance must depend in some manner on what wavelength you call "blue".

White

CIE Illuminant D65 refers to a standard spectral distribution which is chosen to approximate daylight. This is television's standard white reference. So-called "zero chroma" in NTSC occurs on this colour. People may or may not respond to this as "white": if the observer is adapted to a pink room, he'll think it's yellowish-green. The reason that the CIE decided to standardize white, in 1931, was to allow work in colourimetry to be done objectively.

To achieve accurate colour reproduction requires specification of reference white. This is a lurking problem for accurate colour reproduction in computer graphics, because most workstation monitors are adjusted for a white point (of about 9300 K) that's quite a bit more blue than the television standard. Software people tend to say "R=G=B=1=white. Simple." But it's not that simple.

Blue Shirts

Sorry to air the laundry in public.
Fabric whiteners work by adding materials which fluoresce to convert ultraviolet light [to which the human eye is not sensitive] into visible light in the blue part of the spectrum [to which it is]. The shirt not only looks brighter than white, it IS brighter than white. But this has nothing to do with the luminance coefficients.

With modern phosphors the blue luminance coefficient is actually tending to go DOWN. To produce accurate reproduction with European standard phosphors requires a blue luminance contribution of 0.071. This is a big colour problem for Europe, because they use the same luminance function as we do, and it's not exactly matched to their phosphors. They tweak their camera spectral sensitivities as a first-order fix to improve the colour reproduction accuracy.

Orthogonality

Luminance is luminance; chrominance is chrominance; and NTSC can be thought of as conveying Y, U, and V independently. This is true regardless of your interpretation of what these quantities represent. They're "orthogonal" provided that one can't be derived from the other two. Although there are some subtle signal-to-noise ratio considerations in television coding, this issue is independent of (or should I say orthogonal to) the choice of luminance coefficients.

NTSC RF Modulation

I stand corrected by Mr St Peters on an RF modulation point: there IS one compromise made in NTSC transmission via RF. White modulates the RF carrier to 12.5%, and black modulates it to 70.3125%. When NTSC modulates an RF carrier, chroma excursions near highly saturated yellow and near highly saturated cyan are clipped to 120 IEEE units prior to the modulator, to avoid UNDER-modulating the transmitter. Two small regions of colour space are lost in this case. No practical television camera has sufficient colour separation capability to generate signals in these regions, but electronically-synthesized colour bars have perfect colour saturation and would undermodulate the transmitter if left alone.
Studio colour bars are generated at 75% saturation to avoid these two regions of colour space. This issue does not bear on the choice of luminance coefficients, and is relevant to broadcast only. VHS machines and NTSC monitors reproduce all of the NTSC colour space even at 100% saturation.

I would have put this first, but then you wouldn't have read all the colour stuff, would you?

Charles
andru@rhialto.SGI.COM (Andrew Myers) (11/09/88)
In article <76353@sun.uucp>, poynton%vector@Sun.COM (Charles Poynton) writes:
> In Comp.windows.x article <8811011523.AA02242@LYRE.MIT.EDU>, Ralph R.
> Swick <swick@ATHENA.MIT.EDU> comments:
>
> > I'd like to suggest that (.39R + .5G + .11B) is not a good choice for
> > "intensity" in the realm of computer graphics. ...
> >
> > A better choice in computer graphics is to equally weight the colors:
> > ((R+G+B)/3.0). Let white be white.
>
> Equal weighting of the primaries is NOT the right thing to do, unless the
> viewers of your images are members of some species that has uniform
> response across the visible spectrum, unlike homo sapiens.

From Foley and Van Dam, page 613, we have the relationship between the YIQ and RGB color systems:

	Y   0.30  0.59  0.11   R
	I = 0.60 -0.28 -0.32 . G
	Q   0.21 -0.52  0.31   B

From this, we can see that the luminous response of the eye (Y) is exactly that described by Mr. Poynton, albeit with less precision. What the other poster was being confused by was the *inverse* of this process. That is, when I,Q=0 (white light), we have R=G=B=Y. This can easily be seen from the inverse matrix (again, with 2 digits of precision):

	R   1.00  0.95  0.62   Y
	G = 1.00 -0.28 -0.64 . I
	B   1.00 -1.11  1.73   Q

The fact that Y = 0.30R + 0.59G + 0.11B has little to do with white being made up of equal components of red, green, and blue. Or at least the connection is through a matrix inverse, unintuitive at best.

Hope this clarifies more than it confuses.

Andrew
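The forward matrix from Foley and Van Dam can be written out directly. This is an illustrative sketch, not code from the post; the function name is my own, and the coefficients are the quoted two-digit values.

```c
#include <assert.h>

/* RGB -> YIQ using the 2-digit matrix quoted from Foley and Van Dam.
   (Illustrative sketch; not code from the original post.) */
void rgb_to_yiq(double r, double g, double b,
                double *y, double *i, double *q)
{
    *y = 0.30 * r + 0.59 * g + 0.11 * b;   /* luminance row sums to 1 */
    *i = 0.60 * r - 0.28 * g - 0.32 * b;   /* chrominance rows sum to 0 */
    *q = 0.21 * r - 0.52 * g + 0.31 * b;
}
```

Feeding R=G=B=1 through this gives Y=1 and I=Q=0, which makes Andrew's point concrete: white has zero chrominance because each chrominance row sums to zero, not because the luminance row has equal weights.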
awpaeth@watcgl.waterloo.edu (Alan Wm Paeth) (11/10/88)
In article <76649@sun.uucp> poynton%vector@Sun.COM (Charles Poynton) writes:
>This follows Dick St Peters' comment <8811080030.AA14681@EXPO.LCS.MIT.EDU>
>of Comp.windows.x that (R+G+B)/3 is an appropriate weighting function for
>luminance.

Thank you, Charles, for a very informative article. I (for one) am getting tired of hearing misinformation such as the above comment, or related statements such as "[CMY] = 1-[RGB]". It's nice to see the record being set straight. (This is not a criticism of the original poster.)

These color "facts" have a feeling of common sense (as 0th order approximations to reality), which explains why they are deduced from first principles and then readily requoted. A bit like teaching Newtonian physics without being reminded that Relativity exists. Fortunately, the latest computer graphics texts are beginning to give color the more precise treatment it deserves. Some additional fine points which might be of interest to people:

>White
>
>To achieve accurate colour reproduction requires specification of
>reference white. This is a lurking problem for accurate colour
>reproduction in computer graphics, because most workstation monitors are
>adjusted for a white point (of about 9300 K) that's quite a bit more blue
>than the television standard. Software people tend to say
>"R=G=B=1=white. Simple." But it's not that simple.

When people say "RGB color" my first impressions are (1) color used in the context of a digital computer and (2) (all too often) an associated sense of vagueness about the exactness of the specification ("well, it's 24 bit RGB?!"). For instance, R might relate to the phosphor chromaticity of a specific monitor, a spectral line, or the NTSC defined value (I've yet to see an NTSC monitor that reproduces this).
In our lab I've encountered many "R"'s:

	Chromaticity Coords     Comments
	-------------------------------------------------
	x' = .62, y' = .21      Electrohome monitor red phosphor
	x' = .65, y' = .30      Aydin monitor red phosphor
	x' = .63, y' = .30      Spectral line at 6000 Angstroms (CIE tables)
	x' = .67, y' = .21      NTSC defined "Red"

The use of the NTSC standard (with which the CIE tables for Y and a matrix inversion give the familiar Y = .299R + .587G + .114B) is far better than using Y = 1/3(R+G+B), but even then, I've yet to see an NTSC monitor. The first color TVs tried to live up to the standard, but the required red spectral purity is so high (that's the x' = .67 in the table) that luminance suffers. So the industry pushes brighter, less pure red phosphors (remember the "rare earth phosphor" ads of the early 70's?) and it's anyone's guess where the R coordinates are. Unless you have a lot of money, studio monitors tend to migrate in the direction of their commercial counterparts (lower spectral purity in the phosphors).

>Orthogonality
>
>Luminance is luminance; chrominance is chrominance; and NTSC can be
>thought of as conveying Y, U, and V independently. This is true
>regardless of your interpretation of what these quantities represent.
>They're "orthogonal" provided that one can't be derived from the other
>two. Although there are some subtle signal-to-noise ratio considerations
>in television coding, this issue is independent of (or should I say
>orthogonal to) the choice of luminance coefficients.

Well, that's really a definition for independence, not complete orthogonality. The YIQ television signal is similar to the CIE defined YUV (or XYZ) color spaces in that the Y's (luminance) are the same. The I and Q chrominance signals pick up the remaining two degrees of freedom. The matrix that defines the coordinate change was chosen out of bandwidth considerations. In fact, I and Q stand for "In phase" and "Quadrature" signal.
They are encoded for broadcast on color subcarriers that are 90 degrees out of phase. It is these signals that are orthogonal (in the sin(x), cos(x) sense), but not the independent values which they encode.

>NTSC RF Modulation
>
>I stand corrected by Mr St Peters on an RF modulation point: there IS one
>compromise made in NTSC transmission via RF...When NTSC modulates
>an RF carrier, chroma excursions near highly saturated yellow and near
>highly saturated cyan are clipped to 120 IEEE units prior to the
>modulator, to avoid UNDER-modulating the transmitter. Two small regions
>of colour space are lost in this case. No practical television camera has
>sufficient colour separation capability to generate signals in these
>regions, but electronically-synthesized colour bars have perfect colour
>saturation and would undermodulate the transmitter if left alone.

Almost. There are stories of independent stations with substandard power supplies that got fratzed when trying to run _Sesame Street_ clips -- Big Bird is both big and yellow (the AM video signal draws power as a function of the modulation).

As for the generation of "synthetic" colors as with color bar tests, one has to be careful. I worked on Shoup's "Superpaint" system when with Xerox (one of the first color "paint" systems; it provided NTSC output). It featured a menu item to test for "hot" broadcast colors such as yellow to avoid modulation problems -- it would blink the colors. In that case one had either to reduce the intensity or desaturate. A tool like this is useful and necessary for readying computer graphics images for commercial broadcast. Those images *invariably* have highly saturated colors (a generation raised on Big Bird?). If there is enough interest I can post a production tuned C program which flags "hot" colors.

/Alan Paeth
Computer Graphics Laboratory
University of Waterloo
mab@pixar.UUCP (Malcolm Blanchard) (11/12/88)
The discussion of luminance computations and the subsequent discussion of the meaning of white reminds me of an experience I had a few years ago when Pixar was a division of Lucasfilm and we were working on an effect for "Young Sherlock Holmes".

Aesthetic decisions were being made by people sitting in front of color monitors. The digital images were transferred to film using three color lasers. The film was printed and then projected in a screening room. I decided that this was a great place to implement a what-you-see-is-what-you-get color system. And so I delved into the murky depths of colorimetry in the hope of developing a color-correction program that would produce the same color in the screening room that was measured on the color monitors.

This is a difficult problem (in fact, in its strictest sense, an impossible one, since the color gamuts of the two systems have mutually exclusive regions). I took into account the CIE coordinates of the monitor's phosphors, its color balance, the sensitivity of the color film to each of the lasers, cross-talk between the film layers, effects of film processing, the spectral characteristics of the print film's dye layers, and the spectral characteristics of a standard projector bulb. Several steps in this process are extremely non-linear, but I was able to achieve some good results by using some piece-wise linear approximations. I felt a great sense of success when I used a colorimeter to confirm that the CIE coordinates on the silver screen did, indeed, closely match those on the tiny screen.

We color corrected a few shots and showed them to the effects director. His response was, "Why does this look so blue?" It turns out that when we look at a TV we're accustomed to a blue balance, and when we're sitting in a theater we expect a yellow balance. The digital color correction was abandoned and the production relied on the film lab to produce an aesthetic balance.
Thus proving to me that science may work, but computer graphics and film making are still largely a matter of art.
karlton@decwrl.dec.com (Philip Karlton) (12/06/88)
I found the entire discussion on luminance last month quite interesting. What I would like now from the experts is what they would do to convert an RGB value as specified in the X protocol into a gray level.

The particular problem I have in mind is that of a StaticGray display with N equally spaced intensities arranged in a ramp with black at 0 and white at N-1. I have access to hardware with N of 2, 4, 16, and 256.

Two different expressions for computing the appropriate pixel value come immediately to (my) mind:

For r, g, b in [0..1]

	floor((.299r + .587g + .114b)(n - 1) + 0.5)	(a)

or for r, g, b in [0..1)

	floor((.299r + .587g + .114b)(n))		(b)

(a) and (b) produce almost identical results for N=2. For N=256, the resulting differences are probably not detectable by the human eye, certainly not mine. For N=4, the differences are observable.

The correct thing is for the client to have done the appropriate dithering and present the pixmap to the server. For those clients that ignore the visual type of the root window, the server has to do the mapping of RGB (in X's terms) to some pixel value. Is either (a) or (b) the appropriate choice? Is some better function around that I should use?

For the numerically curious: r, g, and b above could be computed using

	r = ((float) screenRed) / maxColor;
	b = ((float) screenBlue) / maxColor;
	g = ((float) screenGreen) / maxColor;

where maxColor is dependent upon which of (a) or (b) is chosen:

	float maxColor = (float) (0xFFFF);	/* (a) */
or
	float maxColor = (float) (0x10000);	/* (b) */

PK
srneely@watcgl.waterloo.edu (Shawn Neely) (12/07/88)
In article <964@bacchus.dec.com> karlton@decwrl.dec.com (Philip Karlton) writes:
+I would like now from the experts is what they would do to convert an RGB
+value as specified in the X protocol into a gray level.
+
+The particular problem I have in mind is that of a StaticGray display with N
+equally spaced intensities arranged in a ramp with black at 0 and white at
+N-1. I have access to hardware with N of 2, 4, 16, and 256.
+
+Two different expressions for computing the appropriate pixel value come
+immediately to (my) mind:
+
+For r, g, b in [0..1]
+
+ floor((.299r + .587g + .114b)(n - 1) + 0.5) (a)
+
+or for r, g, b in [0..1)
+
+ floor((.299r + .587g + .114b)(n)) (b)
+
+ (stuff deleted)
+...Is either (a) or (b) the appropriate choice. Is
+some better function around that I should use?
+
+For the numerically curious: r, g, and b above could be computed using
+
+ r = ((float) screenRed) / maxColor;
+ b = ((float) screenBlue) / maxColor;
+ g = ((float) screenGreen) / maxColor;
+
+where maxColor is dependent upon which of (a) or (b) is chosen:
+
+ float maxColor = (float) (0xFFFF); /* (a) */
+or
+ float maxColor = (float) (0x10000); /* (b) */
+
+PK
The correct approach is (a). A strong argument for using
the closed interval [0..1] (and [0..N-1]) is given in
"Design and Experience with a Generalized Raster Toolkit"
by Paeth and Booth in Proc. Graphics Interface '86 (Vancouver).
The interval is consistent with the design of a number of colour spaces,
and is correct when data is taken to higher significance. The same is
not true of the wrong (often implicit using bit shifts) use of
the open interval [0..1).
For example, a one-bit image in the [0..1) model allows only
the intensity values 0.0 and 0.5, and not "full on".
The required multiplications and divisions for the correct approach
can often be performed by table lookup.
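The closed-interval rule above can be written as a single rounded rescaling. This is my own sketch, not code from the post; the function name is assumed.

```c
#include <assert.h>

/* Rescale an n-bit value v to m bits over the closed interval:
   v in [0 .. 2^n - 1] maps to round(v * (2^m - 1) / (2^n - 1)),
   so 0 stays black and full scale stays "full on".
   (Illustrative sketch; not code from the original post.) */
unsigned rescale(unsigned v, int n, int m)
{
    unsigned in_max  = (1u << n) - 1;
    unsigned out_max = (1u << m) - 1;
    return (v * out_max + in_max / 2) / in_max;   /* rounded divide */
}
```

In the one-bit example above, rescale(1, 1, 8) yields 255 ("full on"), where a plain left shift would give only 128 -- the 0.5 of the [0..1) model.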
--
(.I.) "The road of excess leads
).( to the palace of wisdom."
( Y ) -William Blake
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/10/88)
In article <7162@watcgl.waterloo.edu> srneely@watcgl.waterloo.edu (Shawn Neely) writes:
[...discussion of RGB->Grey conversion and "shifting" to higher precision...]
|
|The interval is consistent with the design of a number of colour spaces,
|and is correct when data is taken to higher significance. The same is
|not true of the wrong (often implicit using bit shifts) use of
|the open interval [0..1).
|
|For example, a one-bit image in the [0..1) model allows only
|the intensity values 0.0 and 0.5, and not "full on".
|
|The required multiplications and divisions for the correct approach
|can often be performed by table lookup.

Over the last couple of weeks I've been working with the PBM code posted to (I think) comp.sources.misc, expanding it to handle 8-bit greyscale and 24-bit color images. In the process, I've run into some of the problems being discussed here. Following are some of my solutions. I think they work as well as or better than what I've seen posted so far, but if I'm missing some glaring deficiency, I'd like to have it pointed out to me.

RGB to Greyscale: I use the formula

	GREY = ((76 * R) + (150 * G) + (29 * B)) >> 8;

where R, G and B are 24-bit color components [0..255] and the result is a greyscale value [0..255], with intermediate values in the formula not exceeding 16 bits (ie. int). This gives a good approximation of the 29.9% red, 58.7% green, 11.4% blue luminance contributions.

Extrapolation to more significance: I have only 3 bits per gun (RGB) on the Atari ST, and want to expand those color values to their 8-bit equivalents. As was mentioned, simply doing a left-shift is not sufficient. The method I use is to start at the MSB of the source and destination values, copy bits from the source proceeding toward the LSB, and if you reach the end of the source before filling the destination, start over at the beginning of the source. This works for both increasing and decreasing significance (equivalent to right-shift for decreasing).
Example: 101 --> 10110110, 000 --> 00000000, 111 --> 11111111, etc. It seems to work for all cases, even weird things like 7 bits -> 13 bits.

One problem I have yet to solve is analyzing a picture to choose N colors out of a (larger) palette of M colors that best represent a given image. For example, I can only use 16 colors out of a palette of 512, so which are the "best" 16 to use? I already have color dithering algorithms, but I need to decide which colors to dither WITH.
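Dale's integer greyscale formula above packages up as follows (a sketch; the function name is mine). Note that 76 + 150 + 29 = 255, so pure white (255,255,255) maps to 254 rather than 255; weights of 77/150/29, which sum to 256, would map white to 255 -- a possible refinement, not something claimed in the post.

```c
#include <assert.h>

/* Integer grey from 8-bit R, G, B, exactly as in the formula above:
   76/256, 150/256 and 29/256 approximate the .299/.587/.114 luminance
   weights, and the intermediate sum (at most 255*255 = 65025) stays
   within 16 bits. */
unsigned grey(unsigned r, unsigned g, unsigned b)
{
    return ((76 * r) + (150 * g) + (29 * b)) >> 8;
}
```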
ph@miro.Berkeley.EDU (Paul Heckbert) (12/12/88)
Dale Schumacher (dal@midgard.Midgard.MN.ORG) wrote:
> ...I have only 3-bits per gun (RGB) on the Atari-ST, and want to expand those
> color values to their 8-bit equivalents... The method I use is to start at
> the MSB of the source and destination values, copy bits from the source
> proceeding toward the LSB, if you reach the end of the source before filling
> the destination, start over at the beginning of the source.
> This works for both increasing and decreasing significance (equivalent
> to right-shift for decreasing). Example: 101 --> 10110110,
> 000->00000000, 111->11111111, etc. It seems to work for all cases,
> even weird things like 7-bits -> 13-bits.

Paraphrasing, Dale is converting the 3-bit number abc, where each of a, b, and c is 0 or 1, into the 8-bit number abcabcab.

This is very close to the "correct" formula, but you've found a somewhat roundabout way to compute it. The formula you want will map black (000) to black (00000000) and white (111) to white (11111111) and map everything in between linearly. In other words, you want to multiply by 255/7. Your formula actually multiplies by 255.9375/7.

You can prove this to yourself by thinking of the 3-bit bit string x=abc as a representation for the binary fraction x'=.abc (e.g. bit string 010 represents the number .010) and the 8-bit bit string y=abcabcab as a code for binary y'=.abcabcab . But replicating the bits is equivalent to a multiplication: y'=x'*1.001001001. Putting our formulas together, we have x'=x/8, y'=y/256, and 1.001001001=4095/3584, so

	y/x = (1/8)*(4095/(512*7))*256 = 4095/(7*16) = 255.9375/7 .

It's good to step back from the low-level bits once in a while and think about what these pixel values mean in the real world.

----

Dale also asked about algorithms for selecting the 16 colors out of a palette of 512 that best represent an image. This is called "color image quantization". I wrote about it in a paper:

	Paul S. Heckbert, "Color Image Quantization for Frame Buffer Display",
	Computer Graphics (SIGGRAPH '82 Proceedings), vol. 16, no. 3,
	July 1982, pp. 297-307

see also the improved algorithm in:

	S. J. Wan, K. M. Wong, P. Prusinkiewicz, "An Algorithm for
	Multidimensional Data Clustering", ACM Trans. on Mathematical
	Software, vol. 14, no. 2, June 1988, pp. 153-162

Paul Heckbert, CS grad student
508-7 Evans Hall, UC Berkeley	UUCP: ucbvax!miro.berkeley.edu!ph
Berkeley, CA 94720		ARPA: ph@miro.berkeley.edu
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/14/88)
In article <8241@pasteur.Berkeley.EDU> ph@miro.Berkeley.EDU (Paul Heckbert) writes:
|
|Paraphrasing, Dale is converting the 3 bit number abc, where each of a, b,
|and c are 0 or 1, into the 8 bit number abcabcab.
|
|This is very close to the "correct" formula, but you've found a somewhat
|roundabout way to compute it. The formula you want will map black (000)
|to black (00000000) and white (111) to white (11111111) and map everything
|inbetween linearly. In other words, you want to multiply by 255/7.
|Your formula actually multiplies by 255.9375/7.

The point of my "round-about" method is performance. It's much easier to replicate bits and do shifts than to divide by 7. I believe that this approximation will yield the "correct" value (to 8-bit int precision) for all cases, right?

|Dale also asked about algorithms for selecting the 16 colors out of a
|palette of 512 that best represent an image. This is called "color image
|quantization". I wrote about it in a paper:

Thank you for the references. I'll check into them.
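As it happens, Dale's closing question can be settled by enumeration: for the 3-to-8-bit case, bit replication agrees with rounded multiplication by 255/7 at every one of the eight inputs. A small check (my own sketch; neither function appears in the posts):

```c
#include <assert.h>

/* Bit replication abc -> abcabcab, as Dale describes it. */
unsigned expand3to8(unsigned v)
{
    return (v << 5) | (v << 2) | (v >> 1);
}

/* The "exact" linear map Heckbert describes: round(v * 255 / 7). */
unsigned scale3to8(unsigned v)
{
    return (v * 255 + 3) / 7;   /* +3 rounds: 7/2 truncated is 3 */
}
```

Checking v = 0..7 shows the two agree everywhere (e.g. 101 gives 10110110 = 182 both ways), so the 255.9375/7 factor never changes the truncated 8-bit result in this particular case.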
dave@onfcanim.UUCP (Dave Martindale) (12/17/88)
In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>
>The point of my "round-about" method is performance. It's much easier to
>replicate bits and do shifts than to divide by 7.

But it's faster to *implement* almost any function using table lookup. Even the naive and inaccurate shift-n-bits is faster when performed using table lookup than the hardware shift instruction on some hardware.

Once you are using table lookup to do the pixel-by-pixel "computations", it really doesn't matter how expensive the code that initializes the table is - you only do it once. So you might as well use multiply and divide, and do the calculations in a way that someone else can read, and can see by inspection is correct.

You can even use non-linear pixel encodings to avoid losing shadow detail when the output pixel is narrow. For example, my standard way of storing 12-bit linear data from a scanner into 8 bits is:

	outpix = 255 * ((inpix/4095.0) ** (1/2.2))

using floating point where needed, and rounding the result to an integer. (The magic number 2.2 happens to be the standard value of "gamma correction" that the NTSC television standard uses, so this can be sent to a frame buffer and turned into NTSC without further gamma correction, but the technique is worthwhile on its own even if the image will never appear on video.)
raveling@vaxb.isi.edu (Paul Raveling) (12/20/88)
In article <16929@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>>
>>The point of my "round-about" method is performance. It's much easier to
>>replicate bits and do shifts than to divide by 7.
>
>But it's faster to *implement* almost any function using table lookup.

This is true for relatively complex functions, but not usually for those that break down easily to simple operations such as shifts and adds. I've measured speed improvements up to a factor of 14 over ordinary C code in the most extreme case by moving a critical algorithm to assembly language and using this sort of shifty logic.

Both techniques are valuable. For example, real time software such as that used in Central Air Data Computers uses shift/subtract logic wherever possible for functions such as the simplest digital filters [something like filtered_value = (7*old_value + new_value)/8]; it uses table lookup with linear interpolation between entries for other functions. The other functions need not be very complex to make the table lookup useful -- sqrt, for example. Where table lookup REALLY shines is evaluating relatively complex functions. Software such as the B-1B's CADC uses it profusely to keep adequate real time margins.

>Once you are using table lookup to do the pixel-by-pixel "computations",
>it really doesn't matter how expensive the code that initializes the table
>is - you only do it once. So you might as well use multiply and divide,
>and do the calculations in a way that someone else can read, and can see
>by inspection is correct.

Good commenting serves the latter purpose. It's just as easy to supply an actual equation as a comment as it is to use it as code. It's a matter of software engineering discipline to be sure the comments match the code -- we all do that, religiously, don't we?

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
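The digital filter mentioned above can indeed be done with no multiply or divide instruction at all, computing 7*x as (x<<3) - x. A minimal sketch (the function name is my own):

```c
#include <assert.h>

/* One step of the first-order smoothing filter quoted above:
   filtered = (7*old + new)/8, with 7*old computed as (old<<3) - old
   so only shifts, an add, and a subtract are needed. */
int filter_step(int old_value, int new_value)
{
    return (((old_value << 3) - old_value) + new_value) >> 3;
}
```

A steady input passes through unchanged, and a step change moves the output only 1/8 of the way per sample. (For negative values the final >>3 rounds toward minus infinity rather than truncating, which is usually acceptable in a smoothing filter.)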
dave@onfcanim.UUCP (Dave Martindale) (12/21/88)
In article <7086@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>>But it's faster to *implement* almost any function using table lookup.
>
>This is true for relatively complex functions, but not usually
>for those that break down easily to simple operations such as
>shifts and adds. I've measured speed improvements up to a
>factor of 14 over ordinary C code in the most extreme case
>by moving a critical algorithm to assembly language and using
>this sort of shifty logic.

Is that a factor of 14 speed improvement over table lookup? Or a factor of 14 improvement over code that contained multiply or divide in the inner loop? A comparison against table lookup is what matters. What processor was this on?

For shift/add to be faster than table lookup, the time required to do the shifts and adds must be less than that required to do the table addressing calculations and the extra memory fetch. (Note that the table word will usually be in the cache on machines that have a cache.) On machines like the VAX, where the table addressing calculations can be done as part of a "move" instruction but the shifts and adds are done as separate instructions (and shift is very slow on some models), table lookup is going to be faster. On another machine where the table addressing requires several instructions but a barrel shifter is available, the shift method will likely be faster. You have to try both, or have a good knowledge of the particular model of the particular architecture of processor you are using, to determine which is faster.

However, table lookup has some additional benefits. It is simple enough that carefully-written C code is likely to generate the best possible assembler code, so there is no need to use assembler. A single copy of the lookup code functions for all possible input and output widths, as long as the pixels are always stored in words of a constant width.
In contrast, the shift/add code requires different sequences of instructions for different input and output pixel bit widths, since the output may require adding 1 or 2 or 3 or more copies of the input, shifted by various amounts. Either you must pre-compile all possible variations that you could ever need, or compile code during execution, or just have a general-purpose algorithm that needs tests and branches within the pixel lookup loop (bye-bye performance). How do you deal with this?
ksbooth@watcgl.waterloo.edu (Kelly Booth) (12/21/88)
In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes: >For shift/add to be faster than table lookup... Actually, some solutions combine BOTH techniques. If the table has two or more indices (not the case here), then the lookup requires combining those indices into a single index into the table. Some compilers do this with adds and multiplies. If machine code is written (or if the compiler is smart) shifts and adds can be substituted. This may require that the table size(s) be adjusted to the next larger powers of two. Or, recursively, the composite indices can themselves be computed using a table lookup scheme.
raveling@vaxb.isi.edu (Paul Raveling) (12/22/88)
BTW, in case it wasn't clear, my preceding response was suggesting that
the "shifty logic" and table-lookup-with-interpolation alternatives are
more appropriate when the function being implemented has a large
domain.  Direct table lookup is best when the domain is relatively
small, and I agree that it's the best technique for sheer speed.

A case where direct table lookup isn't practical is any of a few image
processing utilities we have.  They examine a 5x5 square of pixels with
24 bits of color per pixel to determine a new color for the center
pixel.  I'd love to use direct table lookup for this, but 25*2**24
bytes is a tough table to handle.

In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>
>Is that a factor of 14 speed improvement over table lookup?  Or a
>factor of 14 improvement over code that contained multiply or divide in
>the inner loop?  A comparison against table lookup is what matters.
>What processor was this on?

This was several years ago; I believe the algorithm was an integer
square root function, and the processor definitely was either an 8088
or an 80286 in an IBM PC.  Since the function's domain was unsigned
16-bit integers and the PC's memory was quite limited, direct table
lookup would have been impractical, even though it certainly would be
much faster.

I must admit that the factor of 14 is exceptional because of the 80x86
processor architecture; going into assembly language allowed carefully
recoding some branching logic that produced excessive queue flushes,
which are a major time waster in this family of processors.  Of course
a direct table lookup would eliminate this problem.

Note also that there are two different types of table lookup to
consider: direct table lookup and searching.  The variant I mentioned
in connection with real-time avionics software used tables containing
both x and y values (as in y = f(x) -- not screen coordinates) to
handle continuous functions of real numbers.  It did a binary search to
find the nearest x values to the argument, then used linear
interpolation to derive the corresponding y value.  In order to support
the required accuracy most tables contained around 6-10 entries; the
shortest I can recall was 2 entries, the longest about 50.  The version
of the B-1B CADC that I worked on used sets of these tables to
implement up to 5-dimensional linear interpolation.

This approach is VERY fast for a large class of functions; with a mean
search requiring something like 3 lookups and only simple math, it's
faster than computing for lots of things with large domains or complex
[not simple] equations.  It's also good for supporting empirically
derived functions, such as error correction for the static pressure
source on military aircraft; this tends to be a bit bizarre in the
transonic regime near Mach 1.

BTW, a necessary aid for building that sort of table is a utility to do
curve fits, then extract a set of (x,y) points which keep error less
than a given accuracy requirement when using linear interpolation
between these points.

Bottom line: By all means use direct table lookup where it's feasible;
but don't forget there are a couple of other approaches that can still
save time.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
turk@Apple.COM (Ken "Turk" Turkowski) (12/31/88)
A fast but crude method is: Y = (R + 2G + B) / 4 -- Ken Turkowski @ Apple Computer, Inc., Cupertino, CA Internet: turk@apple.com Applelink: Turkowski1
jbm@eos.UUCP (Jeffrey Mulligan) (01/04/89)
From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski): > A fast but crude method is: > > Y = (R + 2G + B) / 4 > -- > Ken Turkowski @ Apple Computer, Inc., Cupertino, CA > Internet: turk@apple.com > Applelink: Turkowski1 Equally fast and crude but probably more accurate: Y = 2R + 5G + B -- Jeff Mulligan (jbm@aurora.arc.nasa.gov) NASA/Ames Research Ctr., Mail Stop 239-3, Moffet Field CA, 94035 (415) 694-6290
falk@sun.uucp (Ed Falk) (01/04/89)
In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
> > Y = (R + 2G + B) / 4
> Y = 2R + 5G + B

Yeesh, you people.  Why do you need to simplify it?  Are you going to
be doing these calculations by hand?  Get a calculator or something.

If you *must* do it in integer for speed reasons, do it this way:

	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */

The results are correct to four decimal places and the divide is
replaced by a right-shift in a decent compiler and a byte-move in a
good compiler.
--
-ed falk, sun microsystems
sun!falk, falk@sun.com
card-carrying ACLU member.
raveling@vaxb.isi.edu (Paul Raveling) (01/05/89)
In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
>In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
>> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
>> > Y = (R + 2G + B) / 4
>> Y = 2R + 5G + B

[The latter should really be Y = (2R + 5G + B) / 8, right?]

>
>Yeesh, you people.  Why do you need to simplify it?  Are you going to be
>doing these calculations by hand?  Get a calculator or something.
>
>If you *must* do it in integer for speed reasons, do it this way:
>
>	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */
>
>The results are correct to four decimal places and the divide is replaced
>by a right-shift in a decent compiler and a byte-move in a good compiler.

... because sometimes speed is lots more important than 4 significant
digits of accuracy, and multiplies are slow.  Consider a 68020 running
code for these operations:

	a)  Y = (R + G+G + B) >> 2
	b)  Y = (R+R + (G<<2)+G + B) >> 3
	c)  Y = (77*r + 151*g + 28*b) >> 8

For a rough cut at comparing timings, adding up the number of clocks
for each instruction for each of the 3 cases given in the gospel
according to Motorola, assuming the work's done in registers and the
result is stored with an (An)+ reference, gives:

		Best Case	Cache Case	Worst Case
		---------	----------	----------
	a)	    5		    14		    18
	b)	    6		    20		    25
	c)	   83		    93		    99

Which makes the accurate variant a whale of a lot slower than the
others.  This sort of thing gets fairly noticeable if you're massaging
a megapixel image.

BTW, this is an example of a function that couldn't easily be
accelerated by table lookup unless R, G, and B have very few bits.
Even then 3D subscript computation puts the table lookup in the same
speed range as the faster 2 of these alternatives.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
ksbooth@watcgl.waterloo.edu (Kelly Booth) (01/05/89)
In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
> ... because sometimes speed is lots more important than 4
> significant digits of accuracy, and multiplies are slow.

	. . . (stuff deleted) . . .

> BTW, this is an example of a function that couldn't easily
> be accelerated by table lookup unless R, G, and B have very
> few bits.  Even then 3D subscript computation puts the table
> lookup in the same speed range as the faster 2 of these
> alternatives.

Huh?  Table lookup can be used to replace each of the three multiplies
in aR+bG+cB so that the code becomes something like

	a[R] + b[G] + c[B]

if R, G, and B are bytes (the usual case, and what most of the previous
postings have assumed -- for up to 12 bits table lookup is still
reasonable).  This leaves just the adds and the divide (not shown at
the end), which for the posting that suggested this was still a power
of two (in fact 256), so the byte swap/move or shift tricks all still
work.  There is no need to tabulate the entire function.

[See previous postings on table lookup in this newsgroup about 1-2
weeks ago.]
raveling@vaxb.isi.edu (Paul Raveling) (01/07/89)
In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Consider a 68020 running code for these operations:
>
>	a)  Y = (R + G+G + B) >> 2
>	b)  Y = (R+R + (G<<2)+G + B) >> 3
>	c)  Y = (77*r + 151*g + 28*b) >> 8
>
>	BTW, this is an example of a function that couldn't easily
>	be accelerated by table lookup unless R, G, and B have very
>	few bits.  Even then 3D subscript computation puts the table
>	lookup in the same speed range as the faster 2 of these
>	alternatives.

It's time to eat some of my own words...  I admit to taking my brain
out of gear too soon on this one.  As Bob Webber pointed out in an
email message, a good candidate for the best approach of all is likely
to be:

	d)  Y = (times77[R] + times151[G] + times28[B]) >> 8

If R, G, and B are 8 bits this only requires 768 table entries (the
products need 16-bit words, so 1.5K bytes of table space) and it should
be about as fast as alternative b.  This is easily worth it for having
both good speed and good accuracy.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (01/10/89)
In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
|In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
|> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
|> > Y = (R + 2G + B) / 4
|> Y = 2R + 5G + B
|
|	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */
|
|The results are correct to four decimal places and the divide is replaced
|by a right-shift in a decent compiler and a byte-move in a good compiler.

I don't know where you got your numbers.  The values I have for the Y
component of YIQ (luminance) from RGB are:

	R=.299  G=.587  B=.114

The following formula is the best approximation with 8-bit values:

	Y = (R*77 + G*150 + B*29) / 256

which gives the weights R=.3008, G=.5859, B=.1133, total error=.0036.
Your values give the weights R=.3008, G=.5898, B=.1094, total
error=.0092.  Even MY numbers don't have 4 places of accuracy, but they
are a better approximation to the 3-place target values I have.
Someone mentioned that the NTSC weights may have been changed recently;
is that so?

PS.  I fully agree with the idea that more accurate values should be
used if you're going to use integer and do 3 multiplies and a 'divide'
(which can be optimized if it's a power of 2) anyway.
pokey@well.UUCP (Jef Poskanzer) (01/13/89)
I wrote a quick test program to try out various approximations. It runs five million conversions. On a Sun 3/260, the timings are: float: 223.0 int: 35.4 table: 31.6 I have appended the program, in case anyone wants to run it on a different architecture or try different approximations. --- Jef Jef Poskanzer jef@rtsg.ee.lbl.gov ...well!pokey "Thank God, I have done my duty." -- Admiral Horatio Nelson /* t.c ** ** To use: ** cc -O -DFLOAT t.c -s -o t1 ; time ./t1 ** cc -O -DINT t.c -s -o t2 ; time ./t2 ** cc -O -DTABLE t.c -s -o t3 ; time ./t3 */ #include <stdio.h> main( ) { int i, j, r, g, b; #ifdef TABLE static int times77[256] = { 0, 77, 154, 231, 308, 385, 462, 539, 616, 693, 770, 847, 924, 1001, 1078, 1155, 1232, 1309, 1386, 1463, 1540, 1617, 1694, 1771, 1848, 1925, 2002, 2079, 2156, 2233, 2310, 2387, 2464, 2541, 2618, 2695, 2772, 2849, 2926, 3003, 3080, 3157, 3234, 3311, 3388, 3465, 3542, 3619, 3696, 3773, 3850, 3927, 4004, 4081, 4158, 4235, 4312, 4389, 4466, 4543, 4620, 4697, 4774, 4851, 4928, 5005, 5082, 5159, 5236, 5313, 5390, 5467, 5544, 5621, 5698, 5775, 5852, 5929, 6006, 6083, 6160, 6237, 6314, 6391, 6468, 6545, 6622, 6699, 6776, 6853, 6930, 7007, 7084, 7161, 7238, 7315, 7392, 7469, 7546, 7623, 7700, 7777, 7854, 7931, 8008, 8085, 8162, 8239, 8316, 8393, 8470, 8547, 8624, 8701, 8778, 8855, 8932, 9009, 9086, 9163, 9240, 9317, 9394, 9471, 9548, 9625, 9702, 9779, 9856, 9933, 10010, 10087, 10164, 10241, 10318, 10395, 10472, 10549, 10626, 10703, 10780, 10857, 10934, 11011, 11088, 11165, 11242, 11319, 11396, 11473, 11550, 11627, 11704, 11781, 11858, 11935, 12012, 12089, 12166, 12243, 12320, 12397, 12474, 12551, 12628, 12705, 12782, 12859, 12936, 13013, 13090, 13167, 13244, 13321, 13398, 13475, 13552, 13629, 13706, 13783, 13860, 13937, 14014, 14091, 14168, 14245, 14322, 14399, 14476, 14553, 14630, 14707, 14784, 14861, 14938, 15015, 15092, 15169, 15246, 15323, 15400, 15477, 15554, 15631, 15708, 15785, 15862, 15939, 16016, 16093, 16170, 16247, 16324, 16401, 16478, 
16555, 16632, 16709, 16786, 16863, 16940, 17017, 17094, 17171, 17248, 17325, 17402, 17479, 17556, 17633, 17710, 17787, 17864, 17941, 18018, 18095, 18172, 18249, 18326, 18403, 18480, 18557, 18634, 18711, 18788, 18865, 18942, 19019, 19096, 19173, 19250, 19327, 19404, 19481, 19558, 19635 }; static int times150[256] = { 0, 150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1650, 1800, 1950, 2100, 2250, 2400, 2550, 2700, 2850, 3000, 3150, 3300, 3450, 3600, 3750, 3900, 4050, 4200, 4350, 4500, 4650, 4800, 4950, 5100, 5250, 5400, 5550, 5700, 5850, 6000, 6150, 6300, 6450, 6600, 6750, 6900, 7050, 7200, 7350, 7500, 7650, 7800, 7950, 8100, 8250, 8400, 8550, 8700, 8850, 9000, 9150, 9300, 9450, 9600, 9750, 9900, 10050, 10200, 10350, 10500, 10650, 10800, 10950, 11100, 11250, 11400, 11550, 11700, 11850, 12000, 12150, 12300, 12450, 12600, 12750, 12900, 13050, 13200, 13350, 13500, 13650, 13800, 13950, 14100, 14250, 14400, 14550, 14700, 14850, 15000, 15150, 15300, 15450, 15600, 15750, 15900, 16050, 16200, 16350, 16500, 16650, 16800, 16950, 17100, 17250, 17400, 17550, 17700, 17850, 18000, 18150, 18300, 18450, 18600, 18750, 18900, 19050, 19200, 19350, 19500, 19650, 19800, 19950, 20100, 20250, 20400, 20550, 20700, 20850, 21000, 21150, 21300, 21450, 21600, 21750, 21900, 22050, 22200, 22350, 22500, 22650, 22800, 22950, 23100, 23250, 23400, 23550, 23700, 23850, 24000, 24150, 24300, 24450, 24600, 24750, 24900, 25050, 25200, 25350, 25500, 25650, 25800, 25950, 26100, 26250, 26400, 26550, 26700, 26850, 27000, 27150, 27300, 27450, 27600, 27750, 27900, 28050, 28200, 28350, 28500, 28650, 28800, 28950, 29100, 29250, 29400, 29550, 29700, 29850, 30000, 30150, 30300, 30450, 30600, 30750, 30900, 31050, 31200, 31350, 31500, 31650, 31800, 31950, 32100, 32250, 32400, 32550, 32700, 32850, 33000, 33150, 33300, 33450, 33600, 33750, 33900, 34050, 34200, 34350, 34500, 34650, 34800, 34950, 35100, 35250, 35400, 35550, 35700, 35850, 36000, 36150, 36300, 36450, 36600, 36750, 36900, 37050, 37200, 37350, 37500, 
37650, 37800, 37950, 38100, 38250 }; static int times29[256] = { 0, 29, 58, 87, 116, 145, 174, 203, 232, 261, 290, 319, 348, 377, 406, 435, 464, 493, 522, 551, 580, 609, 638, 667, 696, 725, 754, 783, 812, 841, 870, 899, 928, 957, 986, 1015, 1044, 1073, 1102, 1131, 1160, 1189, 1218, 1247, 1276, 1305, 1334, 1363, 1392, 1421, 1450, 1479, 1508, 1537, 1566, 1595, 1624, 1653, 1682, 1711, 1740, 1769, 1798, 1827, 1856, 1885, 1914, 1943, 1972, 2001, 2030, 2059, 2088, 2117, 2146, 2175, 2204, 2233, 2262, 2291, 2320, 2349, 2378, 2407, 2436, 2465, 2494, 2523, 2552, 2581, 2610, 2639, 2668, 2697, 2726, 2755, 2784, 2813, 2842, 2871, 2900, 2929, 2958, 2987, 3016, 3045, 3074, 3103, 3132, 3161, 3190, 3219, 3248, 3277, 3306, 3335, 3364, 3393, 3422, 3451, 3480, 3509, 3538, 3567, 3596, 3625, 3654, 3683, 3712, 3741, 3770, 3799, 3828, 3857, 3886, 3915, 3944, 3973, 4002, 4031, 4060, 4089, 4118, 4147, 4176, 4205, 4234, 4263, 4292, 4321, 4350, 4379, 4408, 4437, 4466, 4495, 4524, 4553, 4582, 4611, 4640, 4669, 4698, 4727, 4756, 4785, 4814, 4843, 4872, 4901, 4930, 4959, 4988, 5017, 5046, 5075, 5104, 5133, 5162, 5191, 5220, 5249, 5278, 5307, 5336, 5365, 5394, 5423, 5452, 5481, 5510, 5539, 5568, 5597, 5626, 5655, 5684, 5713, 5742, 5771, 5800, 5829, 5858, 5887, 5916, 5945, 5974, 6003, 6032, 6061, 6090, 6119, 6148, 6177, 6206, 6235, 6264, 6293, 6322, 6351, 6380, 6409, 6438, 6467, 6496, 6525, 6554, 6583, 6612, 6641, 6670, 6699, 6728, 6757, 6786, 6815, 6844, 6873, 6902, 6931, 6960, 6989, 7018, 7047, 7076, 7105, 7134, 7163, 7192, 7221, 7250, 7279, 7308, 7337, 7366, 7395 }; #endif TABLE r = g = b = 0; for ( i = 0; i < 5000000; i++ ) { #ifdef FLOAT j = (int) ( 0.299 * r + 0.587 * g + 0.114 * b + 0.5 ); #endif FLOAT #ifdef INT j = ( r * 77 + g * 150 + b * 29 ) >> 8; #endif INT #ifdef TABLE j = ( times77[r] + times150[g] + times29[b] ) >> 8; #endif TABLE r = ( r + 3 ) & 0xff; g = ( g + 5 ) & 0xff; b = ( b + 7 ) & 0xff; } exit( 0 ); }
jef@ace.ee.lbl.gov (Jef Poskanzer) (01/14/89)
In the referenced message, I wrote: } float: 223.0 } int: 35.4 } table: 31.6 Oh yeah, almost forgot the null case - no luminance calculation, just the wrapper program: null: 16.2
raveling@vaxb.isi.edu (Paul Raveling) (01/14/89)
In article <10322@well.UUCP> Jef Poskanzer <jef@rtsg.ee.lbl.gov> writes:
>I wrote a quick test program to try out various approximations.  It runs
>five million conversions.  On a Sun 3/260, the timings are:
>
>	float:	223.0
>	int:	 35.4
>	table:	 31.6
>
>I have appended the program, in case anyone wants to run it on a different
>architecture or try different approximations.

Just below are some results from an HP 9000/350.  I added two runs.
One was with "shifty" logic defined by:

	#ifdef SHIFTY
		j = ( r+r + (g<<2)+g + b ) >> 3;
	#endif

The other was a "no logic" run, with nothing defined, to get an
overhead calibration (how much time the loop logic and rgb updating
used).  The "Less Overhead" column below subtracts this to get a direct
comparison of timing for the math only.

	Test		Raw Timing	Less Overhead
	----		----------	-------------
	float		  220.0		    201.2
	int		   37.6		     18.8
	table		   34.9		     16.1
	shifty		   28.9		     10.1
	overhead	   18.8		      0

This isn't entirely what I anticipated.  The "int" version,
(j = ( r * 77 + g * 150 + b * 29 ) >> 8;), appeared to be faster than
expected.  I checked further and found that on this one the compiler
decomposed all three multiplies into shifts, adds, and a subtract.

Also, the table version seemed too slow.  It turned out that the
compiler generated some remarkably crummy code.  ALL data except the
tables were kept on the stack -- none in registers -- and the subscript
address computations appeared to be distinctly suboptimal.

Next, maybe tomorrow, I'll try the same stuff with some hand-coded
assembly language.  It should be easy to beat the compiler by LOTS.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
falk@sun.uucp (Ed Falk) (01/15/89)
> > BTW, this is an example of a function that couldn't easily
> > be accelerated by table lookup unless R, G, and B have very
> > few bits.  Even then 3D subscript computation puts the table
> > lookup in the same speed range as the faster 2 of these
> > alternatives.
>
> Huh?  Table look up can be used to replace each of the three multiplies
> in aR+bG+cB so that the code becomes something like
>
>	a[R]+b[G]+c[B]

I'm embarrassed.  I had been doing (r*77 + g*150 + 29*b)/256 all along,
thinking that all multiplies took the same amount of time (the
compiler, it turns out, optimizes constant multiplies in interesting
ways).  I've switched all my code to use look-up tables now.  I've
gained a new respect for look-up tables.
--
-ed falk, sun microsystems
sun!falk, falk@sun.com
card-carrying ACLU member.
raveling@vaxb.isi.edu (Paul Raveling) (01/17/89)
In article <7266@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Next, maybe tomorrow, I'll try the same stuff with some
>	hand coded assembly language.  It should be easy to beat
>	the compiler by LOTS.

Here's the result of using hand-coded assembly language for the "table"
algorithm:

	Test			Raw Timing	Less Overhead
	----			----------	-------------
	C version		   34.9		    16.1
	Assembly version	   10.2		     6.4

Anyone have a better C compiler?

If anyone would like to check this hacked assembly version on other
systems, let me know.  I can either email it or post it, but it should
take a few minutes to clean the source up a little and stick in a
warning, since very few 68K assemblers use identically the same source
syntax.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
jonathan@jvc.UUCP (Jonathan Hue) (01/18/89)
I'm slightly puzzled by these calculations of luminance from RGB.
Doesn't the formula

	Y = .299R + .587G + .114B

only apply when RGB represents intensity, rather than pixel values?  If
your pixel values are completely gamma-corrected through look-up tables
in your frame buffer hardware so they represent intensities, this would
work, but if you use linear look-up tables (or don't have any), you
would need to convert pixel values to intensity, calculate luminance,
then convert them back into pixel values (voltages).

Also, considering how far the green of the typical color monitor is
from NTSC green, it may be worth deriving new coefficients for the
monitor you are using.

Jonathan Hue
uunet!jvc!jonathan