poynton%vector@Sun.COM (Charles Poynton) (11/06/88)
In Comp.windows.x article <8811011523.AA02242@LYRE.MIT.EDU>, Ralph R. Swick <swick@ATHENA.MIT.EDU> comments:

> When converting RGB values to monochrome, the sample server(s) compute
> an intensity value as (.39R + .5G + .11B) ...

(.39R + .5G + .11B) is apparently incorrect. This set could be a typographical error (from .29R + .6G + .11B ?), a liberal approximation, or perhaps an unusual phosphor set. Could someone enlighten me on this?

In followup article <8811042303.AA21505@dawn.steinmetz.GE.COM>, Dick St.Peters <stpeters@dawn.UUCP> makes the statement:

> I'd like to suggest that (.39R + .5G + .11B) is not a good choice for
> "intensity" in the realm of computer graphics. ...
>
> A better choice in computer graphics is to equally weight the colors:
> ((R+G+B)/3.0). Let white be white.

Equal weighting of the primaries is NOT the right thing to do, unless the viewers of your images are members of some species that has uniform response across the visible spectrum, unlike homo sapiens. Humans see 1 watt of green light energy as being somewhat brighter than 1 watt of red, and very much brighter than 1 watt of blue.

The science of colourimetry began to flourish in 1931, when the CIE standardized a statistical entity called the "standard observer". This includes a standard spectral luminance response defined numerically as a function of wavelength. It is from this data that the factors which are used in colour television are derived: .587 for green, .299 for red, and .114 for blue.

The particular factors depend on the wavelengths or chromaticities that you call red, green, and blue: there is wide disparity in these choices. For computer graphics and television, the luminance factors depend on the chromaticity coordinates of the phosphors of your CRT. There are compromises in the choice of phosphor primaries, but it turns out that the NTSC did a spectacularly good job of selecting primaries.
The luminance coefficients 0.299 for red, 0.114 for blue, and 0.587 for green are unquestionably the best values to use, unless you know your phosphors intimately.

The second article continues,

> The formula is from
> the (1954) NTSC standard for compatible color TV, and it has built
> into it a lot of compromises to accommodate old technology and
> problems inherent in the analog transmission of composite color
> television.

Contrary to this assertion, the ONLY compromise in NTSC which impacts the luminance equation is the choice of reference phosphor chromaticities, and a choice of phosphors MUST be made for any system which transmits colour in RGB. Just because it's old (1954) doesn't mean we should throw it away.

Charles Poynton            "No quote.
poynton@sun.com             No disclaimer."
(415)336-7846
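The coefficients Poynton recommends can be written as a one-line function. This is a minimal sketch for illustration, not code from the posts; the function name is my own, and the inputs are assumed to be linear R, G, B in [0,1].

```c
#include <assert.h>

/* Luminance from linear R, G, B components in [0,1], using the NTSC
   coefficients discussed above: .299 red, .587 green, .114 blue.
   (Illustrative sketch; not code from the original posts.) */
double luminance(double r, double g, double b)
{
    return 0.299 * r + 0.587 * g + 0.114 * b;
}
```

Because the three coefficients sum to 1, R=G=B=1 yields a luminance of 1.0, while pure green alone contributes 0.587 -- more than red and blue combined, which is the whole argument against equal weighting.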
poynton%vector@Sun.COM (Charles Poynton) (11/09/88)
This follows Dick St Peters' comment <8811080030.AA14681@EXPO.LCS.MIT.EDU> of Comp.windows.x that (R+G+B)/3 is an appropriate weighting function for luminance.

Summary

Luminance weighting coefficients are a function ONLY of the human luminance sensitivity functions, the selected phosphor chromaticities, and the chromaticity of the reference illuminant (white point). The equations can be found on page 2.28 and onward in K. Blair Benson's "Television Engineering Handbook", or on page 20-10 of Donald G. Fink's "Electronic Engineers' Handbook, Second Edition".

A Simple Thought Experiment

Think of three monochromatic primaries, for example, tuneable dye lasers. If (R+G+B)/3 is white, what happens when I move the wavelength of the blue primary? If I move it towards the ultraviolet, the eye responds less to it and luminance drops. If I move it towards green, the eye responds more (in which channel(s) we don't care) and its luminance increases. Hence, the equation for luminance must depend in some manner on what wavelength you call "blue".

White

CIE Illuminant D65 refers to a standard spectral distribution which is chosen to approximate daylight. This is television's standard white reference. So-called "zero chroma" in NTSC occurs on this colour. People may or may not respond to this as "white": if the observer is adapted to a pink room, he'll think it's yellowish-green. The reason that the CIE decided to standardize white, in 1931, was to allow work in colourimetry to be done objectively.

To achieve accurate colour reproduction requires specification of reference white. This is a lurking problem for accurate colour reproduction in computer graphics, because most workstation monitors are adjusted for a white point (of about 9300 K) that's quite a bit more blue than the television standard. Software people tend to say "R=G=B=1=white. Simple." But it's not that simple.

Blue Shirts

Sorry to air the laundry in public.
Fabric whiteners work by adding materials which fluoresce to convert ultraviolet light [to which the human eye is not sensitive] into visible light in the blue part of the spectrum [to which it is]. The shirt not only looks brighter than white, it IS brighter than white. But this has nothing to do with the luminance coefficients.

With modern phosphors the blue luminance coefficient is actually tending to go DOWN. To produce accurate reproduction with European standard phosphors requires a blue luminance contribution of 0.071. This is a big colour problem for Europe, because they use the same luminance function as we do, and it's not exactly matched to their phosphors. They tweak their camera spectral sensitivities as a first-order fix to improve the colour reproduction accuracy.

Orthogonality

Luminance is luminance; chrominance is chrominance; and NTSC can be thought of as conveying Y, U, and V independently. This is true regardless of your interpretation of what these quantities represent. They're "orthogonal" provided that one can't be derived from the other two. Although there are some subtle signal-to-noise ratio considerations in television coding, this issue is independent of (or should I say orthogonal to) the choice of luminance coefficients.

NTSC RF Modulation

I stand corrected by Mr St Peters on an RF modulation point: there IS one compromise made in NTSC transmission via RF. White modulates the RF carrier to 12.5%, and black modulates it to 70.3125%. When NTSC modulates an RF carrier, chroma excursions near highly saturated yellow and near highly saturated cyan are clipped to 120 IEEE units prior to the modulator, to avoid UNDER-modulating the transmitter. Two small regions of colour space are lost in this case. No practical television camera has sufficient colour separation capability to generate signals in these regions, but electronically-synthesized colour bars have perfect colour saturation and would undermodulate the transmitter if left alone.
Studio colour bars are generated at 75% saturation to avoid these two regions of colour space. This issue does not bear on the choice of luminance coefficients, and is relevant to broadcast only. VHS machines and NTSC monitors reproduce all of the NTSC colour space even at 100% saturation.

I would have put this first, but then you wouldn't have read all the colour stuff, would you?

Charles
andru@rhialto.SGI.COM (Andrew Myers) (11/09/88)
In article <76353@sun.uucp>, poynton%vector@Sun.COM (Charles Poynton) writes:
> In Comp.windows.x article <8811011523.AA02242@LYRE.MIT.EDU>, Ralph R.
> Swick <swick@ATHENA.MIT.EDU> comments:
>
> > I'd like to suggest that (.39R + .5G + .11B) is not a good choice for
> > "intensity" in the realm of computer graphics. ...
> >
> > A better choice in computer graphics is to equally weight the colors:
> > ((R+G+B)/3.0). Let white be white.
>
> Equal weighting of the primaries is NOT the right thing to do, unless the
> viewers of your images are members of some species that has uniform
> response across the visible spectrum, unlike homo sapiens.

From Foley and Van Dam, page 613, we have the relationship between the YIQ and RGB color systems:

	Y   0.30  0.59  0.11   R
	I = 0.60 -0.28 -0.32 . G
	Q   0.21 -0.52  0.31   B

From this, we can see that the luminous response of the eye (Y) is exactly that described by Mr. Poynton, albeit with less precision. What the other poster was being confused by was the *inverse* of this process. That is, when I,Q=0 (white light), we have R=G=B=Y. This can easily be seen from the inverse matrix (again, with 2 digits of precision):

	R   1.00  0.95  0.62   Y
	G = 1.00 -0.28 -0.64 . I
	B   1.00 -1.11  1.73   Q

The fact that Y = 0.30R + 0.59G + 0.11B has little to do with white being made up of equal components of red, green, and blue. Or at least the connection is through a matrix inverse, unintuitive at best.

Hope this clarifies more than it confuses.

Andrew
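The forward matrix from Foley and Van Dam can be written out directly. This is an illustrative sketch, not code from the post; the function name is my own, and the coefficients are the quoted two-digit values.

```c
#include <assert.h>

/* RGB -> YIQ using the 2-digit matrix quoted from Foley and Van Dam.
   (Illustrative sketch; not code from the original post.) */
void rgb_to_yiq(double r, double g, double b,
                double *y, double *i, double *q)
{
    *y = 0.30 * r + 0.59 * g + 0.11 * b;   /* luminance row sums to 1 */
    *i = 0.60 * r - 0.28 * g - 0.32 * b;   /* chrominance rows sum to 0 */
    *q = 0.21 * r - 0.52 * g + 0.31 * b;
}
```

Feeding R=G=B=1 through this gives Y=1 and I=Q=0, which makes Andrew's point concrete: white has zero chrominance because each chrominance row sums to zero, not because the luminance row has equal weights.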
awpaeth@watcgl.waterloo.edu (Alan Wm Paeth) (11/10/88)
In article <76649@sun.uucp> poynton%vector@Sun.COM (Charles Poynton) writes:
>This follows Dick St Peters' comment <8811080030.AA14681@EXPO.LCS.MIT.EDU>
>of Comp.windows.x that (R+G+B)/3 is an appropriate weighting function for
>luminance.

Thank you, Charles, for a very informative article. I (for one) am getting tired of hearing misinformation such as the above comment, or related statements such as "[CMY] = 1-[RGB]". It's nice to see the record being set straight. (This is not a criticism of the original poster.)

These color "facts" have a feeling of common sense (as 0th order approximations to reality), which explains why they are deduced from first principles and then readily requoted. A bit like teaching Newtonian physics without being reminded that Relativity exists. Fortunately, the latest computer graphics texts are beginning to give color the more precise treatment it deserves. Some additional fine points which might be of interest to people:

>White
>
>To achieve accurate colour reproduction requires specification of
>reference white. This is a lurking problem for accurate colour
>reproduction in computer graphics, because most workstation monitors are
>adjusted for a white point (of about 9300 K) that's quite a bit more blue
>than the television standard. Software people tend to say
>"R=G=B=1=white. Simple." But it's not that simple.

When people say "RGB color" my first impressions are (1) color used in the context of a digital computer and (2) (all too often) an associated sense of vagueness about the exactness of the specification ("well, it's 24 bit RGB?!"). For instance, R might relate to the phosphor chromaticity of a specific monitor, a spectral line, or the NTSC defined value (I've yet to see an NTSC monitor that reproduces this).
In our lab I've encountered many "R"'s:

	Chromaticity Coords     Comments
	-------------------------------------------------
	x' = .62, y' = .21      Electrohome monitor red phosphor
	x' = .65, y' = .30      Aydin monitor red phosphor
	x' = .63, y' = .30      Spectral line at 6000 Angstroms (CIE tables)
	x' = .67, y' = .21      NTSC defined "Red"

The use of the NTSC standard (with which the CIE tables for Y and a matrix inversion give the familiar Y = .299R + .587G + .114B) is far better than using Y = 1/3(R+G+B), but even then, I've yet to see an NTSC monitor. The first color TVs tried to live up to the standard, but the required red spectral purity is so high (that's the x' = .67 in the table) that luminance suffers. So the industry pushes brighter, less pure red phosphors (remember the "rare earth phosphor" ads of the early 70's?) and it's anyone's guess where the R coordinates are. Unless you have a lot of money, studio monitors tend to migrate in the direction of their commercial counterparts (lower spectral purity in the phosphors).

>Orthogonality
>
>Luminance is luminance; chrominance is chrominance; and NTSC can be
>thought of as conveying Y, U, and V independently. This is true
>regardless of your interpretation of what these quantities represent.
>They're "orthogonal" provided that one can't be derived from the other
>two. Although there are some subtle signal-to-noise ratio considerations
>in television coding, this issue is independent of (or should I say
>orthogonal to) the choice of luminance coefficients.

Well, that's really a definition for independence, not complete orthogonality. The YIQ television signal is similar to the CIE defined YUV (or XYZ) color spaces in that the Y's (luminance) are the same. The I and Q chrominance signals pick up the remaining two degrees of freedom. The matrix that defines the coordinate change was chosen out of bandwidth considerations. In fact, I and Q stand for "In phase" and "Quadrature" signal.
They are encoded for broadcast on color subcarriers that are 90 degrees out of phase. It is these signals that are orthogonal (in the sin(x), cos(x) sense), but not the independent values which they encode.

>NTSC RF Modulation
>
>I stand corrected by Mr St Peters on an RF modulation point: there IS one
>compromise made in NTSC transmission via RF...When NTSC modulates
>an RF carrier, chroma excursions near highly saturated yellow and near
>highly saturated cyan are clipped to 120 IEEE units prior to the
>modulator, to avoid UNDER-modulating the transmitter. Two small regions
>of colour space are lost in this case. No practical television camera has
>sufficient colour separation capability to generate signals in these
>regions, but electronically-synthesized colour bars have perfect colour
>saturation and would undermodulate the transmitter if left alone.

Almost. There are stories of independent stations with substandard power supplies that got fratzed when trying to run _Sesame Street_ clips -- Big Bird is both big and yellow (the AM video signal draws power as a function of the modulation).

As for the generation of "synthetic" colors as with color bar tests, one has to be careful. I worked on Shoup's "Superpaint" system when with Xerox (one of the first color "paint" systems; it provided NTSC output). It featured a menu item to test for "hot" broadcast colors such as yellow to avoid modulation problems -- it would blink the colors. In that case one had either to reduce the intensity or desaturate. A tool like this is useful and necessary for readying computer graphics images for commercial broadcast. Those images *invariably* have highly saturated colors (a generation raised on Big Bird?). If there is enough interest I can post a production tuned C program which flags "hot" colors.

/Alan Paeth
Computer Graphics Laboratory
University of Waterloo
mab@pixar.UUCP (Malcolm Blanchard) (11/12/88)
The discussion of luminance computations and the subsequent discussion of the meaning of white reminds me of an experience I had a few years ago when Pixar was a division of Lucasfilm and we were working on an effect for "Young Sherlock Holmes".

Aesthetic decisions were being made by people sitting in front of color monitors. The digital images were transferred to film using three color lasers. The film was printed and then projected in a screening room. I decided that this was a great place to implement a what-you-see-is-what-you-get color system. And so I delved into the murky depths of colorimetry in the hope of developing a color-correction program that would produce the same color in the screening room that was measured on the color monitors.

This is a difficult problem (in fact, in its strictest sense, an impossible one, since the color gamuts of the two systems have mutually exclusive regions). I took into account the CIE coordinates of the monitor's phosphors, its color balance, the sensitivity of the color film to each of the lasers, cross-talk between the film layers, effects of film processing, the spectral characteristics of the print film's dye layers, and the spectral characteristics of a standard projector bulb. Several steps in this process are extremely non-linear, but I was able to achieve some good results by using some piece-wise linear approximations. I felt a great sense of success when I used a colorimeter to confirm that the CIE coordinates on the silver screen did, indeed, closely match those on the tiny screen.

We color corrected a few shots and showed them to the effects director. His response was, "Why does this look so blue?" It turns out that when we look at a TV we're accustomed to a blue balance, and when we're sitting in a theater we expect a yellow balance. The digital color correction was abandoned and the production relied on the film lab to produce an aesthetic balance.
Thus proving to me that science may work, but computer graphics and film making are still largely a matter of art.
karlton@decwrl.dec.com (Philip Karlton) (12/06/88)
I found the entire discussion on luminance last month quite interesting. What I would like now from the experts is what they would do to convert an RGB value as specified in the X protocol into a gray level.

The particular problem I have in mind is that of a StaticGray display with N equally spaced intensities arranged in a ramp with black at 0 and white at N-1. I have access to hardware with N of 2, 4, 16, and 256.

Two different expressions for computing the appropriate pixel value come immediately to (my) mind:

For r, g, b in [0..1]

	floor((.299r + .587g + .114b)(n - 1) + 0.5)	(a)

or for r, g, b in [0..1)

	floor((.299r + .587g + .114b)(n))		(b)

(a) and (b) produce almost identical results for N=2. For N=256, the resulting differences are probably not detectable by the human eye, certainly not mine. For N=4, the differences are observable.

The correct thing is for the client to have done the appropriate dithering and present the pixmap to the server. For those clients that ignore the visual type of the root window, the server has to do the mapping of RGB (in X's terms) to some pixel value. Is either (a) or (b) the appropriate choice? Is some better function around that I should use?

For the numerically curious: r, g, and b above could be computed using

	r = ((float) screenRed) / maxColor;
	b = ((float) screenBlue) / maxColor;
	g = ((float) screenGreen) / maxColor;

where maxColor is dependent upon which of (a) or (b) is chosen:

	float maxColor = (float) (0xFFFF);	/* (a) */
or
	float maxColor = (float) (0x10000);	/* (b) */

PK
srneely@watcgl.waterloo.edu (Shawn Neely) (12/07/88)
In article <964@bacchus.dec.com> karlton@decwrl.dec.com (Philip Karlton) writes:
+I would like now from the experts is what they would do to convert an RGB
+value as specified in the X protocol into a gray level.
+
+The particular problem I have in mind is that of a StaticGray display with N
+equally spaced intensities arranged in a ramp with black at 0 and white at
+N-1. I have access to hardware with N of 2, 4, 16, and 256.
+
+Two different expressions for computing the appropriate pixel value come
+immediately to (my) mind:
+
+For r, g, b in [0..1]
+
+ floor((.299r + .587g + .114b)(n - 1) + 0.5) (a)
+
+or for r, g, b in [0..1)
+
+ floor((.299r + .587g + .114b)(n)) (b)
+
+ (stuff deleted)
+...Is either (a) or (b) the appropriate choice. Is
+some better function around that I should use?
+
+For the numerically curious: r, g, and b above could be computed using
+
+ r = ((float) screenRed) / maxColor;
+ b = ((float) screenBlue) / maxColor;
+ g = ((float) screenGreen) / maxColor;
+
+where maxColor is dependent upon which of (a) or (b) is chosen:
+
+ float maxColor = (float) (0xFFFF); /* (a) */
+or
+ float maxColor = (float) (0x10000); /* (b) */
+
+PK
The correct approach is (a). A strong argument for using
the closed interval [0..1] (and [0..N-1]) is given in
"Design and Experience with a Generalized Raster Toolkit"
by Paeth and Booth in Proc. Graphics Interface '86 (Vancouver).
The interval is consistent with the design of a number of colour spaces,
and is correct when data is taken to higher significance. The same is
not true of the wrong (often implicit using bit shifts) use of
the open interval [0..1).
For example, a one-bit image in the [0..1) model allows only
the intensity values 0.0 and 0.5, and not "full on".
The required multiplications and divisions for the correct approach
can often be performed by table lookup.
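The closed-interval rule above can be written as a single rounded rescaling. This is my own sketch, not code from the post; the function name is assumed.

```c
#include <assert.h>

/* Rescale an n-bit value v to m bits over the closed interval:
   v in [0 .. 2^n - 1] maps to round(v * (2^m - 1) / (2^n - 1)),
   so 0 stays black and full scale stays "full on".
   (Illustrative sketch; not code from the original post.) */
unsigned rescale(unsigned v, int n, int m)
{
    unsigned in_max  = (1u << n) - 1;
    unsigned out_max = (1u << m) - 1;
    return (v * out_max + in_max / 2) / in_max;   /* rounded divide */
}
```

In the one-bit example above, rescale(1, 1, 8) yields 255 ("full on"), where a plain left shift would give only 128 -- the 0.5 of the [0..1) model.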
--
(.I.) "The road of excess leads
).( to the palace of wisdom."
( Y ) -William Blake
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/10/88)
In article <7162@watcgl.waterloo.edu> srneely@watcgl.waterloo.edu (Shawn Neely) writes:
[...discussion of RGB->Grey conversion and "shifting" to higher precision...]
|
|The interval is consistent with the design of a number of colour spaces,
|and is correct when data is taken to higher significance. The same is
|not true of the wrong (often implicit using bit shifts) use of
|the open interval [0..1).
|
|For example, a one-bit image in the [0..1) model allows only
|the intensity values 0.0 and 0.5, and not "full on".
|
|The required multiplications and divisions for the correct approach
|can often be performed by table lookup.

Over the last couple of weeks I've been working with the PBM code posted to (I think) comp.sources.misc, expanding it to handle 8-bit greyscale and 24-bit color images. In the process, I've run into some of the problems being discussed here. Following are some of my solutions. I think they work as well as or better than what I've seen posted so far, but if I'm missing some glaring deficiency, I'd like to have it pointed out to me.

RGB to Greyscale: I use the formula

	GREY = ((76 * R) + (150 * G) + (29 * B)) >> 8;

where R, G and B are 24-bit color components [0..255] and the result is a greyscale value [0..255], with intermediate values in the formula not exceeding 16 bits (ie. int). This gives a good approximation of the 29.9% red, 58.7% green, 11.4% blue luminance contributions.

Extrapolation to more significance: I have only 3 bits per gun (RGB) on the Atari ST, and want to expand those color values to their 8-bit equivalents. As was mentioned, simply doing a left-shift is not sufficient. The method I use is to start at the MSB of the source and destination values, copy bits from the source proceeding toward the LSB, and if you reach the end of the source before filling the destination, start over at the beginning of the source. This works for both increasing and decreasing significance (equivalent to right-shift for decreasing).
Example: 101 --> 10110110, 000 --> 00000000, 111 --> 11111111, etc. It seems to work for all cases, even weird things like 7 bits -> 13 bits.

One problem I have yet to solve is analyzing a picture to choose N colors out of a (larger) palette of M colors that best represent a given image. For example, I can only use 16 colors out of a palette of 512, so which are the "best" 16 to use? I already have color dithering algorithms, but I need to decide which colors to dither WITH.
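Dale's integer greyscale formula above packages up as follows (a sketch; the function name is mine). Note that 76 + 150 + 29 = 255, so pure white (255,255,255) maps to 254 rather than 255; weights of 77/150/29, which sum to 256, would map white to 255 -- a possible refinement, not something claimed in the post.

```c
#include <assert.h>

/* Integer grey from 8-bit R, G, B, exactly as in the formula above:
   76/256, 150/256 and 29/256 approximate the .299/.587/.114 luminance
   weights, and the intermediate sum (at most 255*255 = 65025) stays
   within 16 bits. */
unsigned grey(unsigned r, unsigned g, unsigned b)
{
    return ((76 * r) + (150 * g) + (29 * b)) >> 8;
}
```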
ph@miro.Berkeley.EDU (Paul Heckbert) (12/12/88)
Dale Schumacher (dal@midgard.Midgard.MN.ORG) wrote:
> ...I have only 3-bits per gun (RGB) on the Atari-ST, and want to expand those
> color values to their 8-bit equivalents... The method I use is to start at
> the MSB of the source and destination values, copy bits from the source
> proceeding toward the LSB, if you reach the end of the source before filling
> the destination, start over at the beginning of the source.
> This works for both increasing and decreasing significance (equivalent
> to right-shift for decreasing). Example: 101 --> 10110110,
> 000->00000000, 111->11111111, etc. It seems to work for all cases,
> even weird things like 7-bits -> 13-bits.

Paraphrasing, Dale is converting the 3-bit number abc, where each of a, b, and c is 0 or 1, into the 8-bit number abcabcab.

This is very close to the "correct" formula, but you've found a somewhat roundabout way to compute it. The formula you want will map black (000) to black (00000000) and white (111) to white (11111111) and map everything in between linearly. In other words, you want to multiply by 255/7. Your formula actually multiplies by 255.9375/7.

You can prove this to yourself by thinking of the 3-bit bit string x=abc as a representation for the binary fraction x'=.abc (e.g. bit string 010 represents the number .010) and the 8-bit bit string y=abcabcab as a code for binary y'=.abcabcab . But replicating the bits is equivalent to a multiplication: y'=x'*1.001001001. Putting our formulas together, we have x'=x/8, y'=y/256, and 1.001001001=4095/3584, so

	y/x = (1/8)*(4095/(512*7))*256 = 4095/(7*16) = 255.9375/7 .

It's good to step back from the low-level bits once in a while and think about what these pixel values mean in the real world.

----

Dale also asked about algorithms for selecting the 16 colors out of a palette of 512 that best represent an image. This is called "color image quantization". I wrote about it in a paper:

	Paul S. Heckbert, "Color Image Quantization for Frame Buffer Display",
	Computer Graphics (SIGGRAPH '82 Proceedings), vol. 16, no. 3,
	July 1982, pp. 297-307

see also the improved algorithm in:

	S. J. Wan, K. M. Wong, P. Prusinkiewicz, "An Algorithm for
	Multidimensional Data Clustering", ACM Trans. on Mathematical
	Software, vol. 14, no. 2, June 1988, pp. 153-162

Paul Heckbert, CS grad student
508-7 Evans Hall, UC Berkeley	UUCP: ucbvax!miro.berkeley.edu!ph
Berkeley, CA 94720		ARPA: ph@miro.berkeley.edu
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (12/14/88)
In article <8241@pasteur.Berkeley.EDU> ph@miro.Berkeley.EDU (Paul Heckbert) writes:
|
|Paraphrasing, Dale is converting the 3 bit number abc, where each of a, b,
|and c are 0 or 1, into the 8 bit number abcabcab.
|
|This is very close to the "correct" formula, but you've found a somewhat
|roundabout way to compute it. The formula you want will map black (000)
|to black (00000000) and white (111) to white (11111111) and map everything
|inbetween linearly. In other words, you want to multiply by 255/7.
|Your formula actually multiplies by 255.9375/7.

The point of my "round-about" method is performance. It's much easier to replicate bits and do shifts than to divide by 7. I believe that this approximation will yield the "correct" value (to 8-bit int precision) for all cases, right?

|Dale also asked about algorithms for selecting the 16 colors out of a
|palette of 512 that best represent an image. This is called "color image
|quantization". I wrote about it in a paper:

Thank you for the references. I'll check into them.
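As it happens, Dale's closing question can be settled by enumeration: for the 3-to-8-bit case, bit replication agrees with rounded multiplication by 255/7 at every one of the eight inputs. A small check (my own sketch; neither function appears in the posts):

```c
#include <assert.h>

/* Bit replication abc -> abcabcab, as Dale describes it. */
unsigned expand3to8(unsigned v)
{
    return (v << 5) | (v << 2) | (v >> 1);
}

/* The "exact" linear map Heckbert describes: round(v * 255 / 7). */
unsigned scale3to8(unsigned v)
{
    return (v * 255 + 3) / 7;   /* +3 rounds: 7/2 truncated is 3 */
}
```

Checking v = 0..7 shows the two agree everywhere (e.g. 101 gives 10110110 = 182 both ways), so the 255.9375/7 factor never changes the truncated 8-bit result in this particular case.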
dave@onfcanim.UUCP (Dave Martindale) (12/17/88)
In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>
>The point of my "round-about" method is performance. It's much easier to
>replicate bits and do shifts than to divide by 7.

But it's faster to *implement* almost any function using table lookup. Even the naive and inaccurate shift-n-bits is faster when performed using table lookup than the hardware shift instruction on some hardware.

Once you are using table lookup to do the pixel-by-pixel "computations", it really doesn't matter how expensive the code that initializes the table is - you only do it once. So you might as well use multiply and divide, and do the calculations in a way that someone else can read, and can see by inspection is correct.

You can even use non-linear pixel encodings to avoid losing shadow detail when the output pixel is narrow. For example, my standard way of storing 12-bit linear data from a scanner into 8 bits is:

	outpix = 255 * ((inpix/4095.0) ** (1/2.2))

using floating point where needed, and rounding the result to an integer. (The magic number 2.2 happens to be the standard value of "gamma correction" that the NTSC television standard uses, so this can be sent to a frame buffer and turned into NTSC without further gamma correction, but the technique is worthwhile on its own even if the image will never appear on video.)
raveling@vaxb.isi.edu (Paul Raveling) (12/20/88)
In article <16929@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>In article <518@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
>>
>>The point of my "round-about" method is performance. It's much easier to
>>replicate bits and do shifts than to divide by 7.
>
>But it's faster to *implement* almost any function using table lookup.

This is true for relatively complex functions, but not usually for those that break down easily to simple operations such as shifts and adds. I've measured speed improvements up to a factor of 14 over ordinary C code in the most extreme case by moving a critical algorithm to assembly language and using this sort of shifty logic.

Both techniques are valuable. For example, real time software such as that used in Central Air Data Computers uses shift/subtract logic wherever possible for functions such as the simplest digital filters [something like filtered_value = (7*old_value + new_value)/8]; it uses table lookup with linear interpolation between entries for other functions. The other functions need not be very complex to make the table lookup useful -- sqrt, for example. Where table lookup REALLY shines is evaluating relatively complex functions. Software such as the B-1B's CADC uses it profusely to keep adequate real time margins.

>Once you are using table lookup to do the pixel-by-pixel "computations",
>it really doesn't matter how expensive the code that initializes the table
>is - you only do it once. So you might as well use multiply and divide,
>and do the calculations in a way that someone else can read, and can see
>by inspection is correct.

Good commenting serves the latter purpose. It's just as easy to supply an actual equation as a comment as it is to use it as code. It's a matter of software engineering discipline to be sure the comments match the code -- we all do that, religiously, don't we?

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
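The digital filter mentioned above can indeed be done with no multiply or divide instruction at all, computing 7*x as (x<<3) - x. A minimal sketch (the function name is my own):

```c
#include <assert.h>

/* One step of the first-order smoothing filter quoted above:
   filtered = (7*old + new)/8, with 7*old computed as (old<<3) - old
   so only shifts, an add, and a subtract are needed. */
int filter_step(int old_value, int new_value)
{
    return (((old_value << 3) - old_value) + new_value) >> 3;
}
```

A steady input passes through unchanged, and a step change moves the output only 1/8 of the way per sample. (For negative values the final >>3 rounds toward minus infinity rather than truncating, which is usually acceptable in a smoothing filter.)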
dave@onfcanim.UUCP (Dave Martindale) (12/21/88)
In article <7086@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>>But it's faster to *implement* almost any function using table lookup.
>
>This is true for relatively complex functions, but not usually
>for those that break down easily to simple operations such as
>shifts and adds. I've measured speed improvements up to a
>factor of 14 over ordinary C code in the most extreme case
>by moving a critical algorithm to assembly language and using
>this sort of shifty logic.

Is that a factor of 14 speed improvement over table lookup? Or a factor of 14 improvement over code that contained multiply or divide in the inner loop? A comparison against table lookup is what matters. What processor was this on?

For shift/add to be faster than table lookup, the time required to do the shifts and adds must be less than that required to do the table addressing calculations and the extra memory fetch. (Note that the table word will usually be in the cache on machines that have a cache.) On machines like the VAX, where the table addressing calculations can be done as part of a "move" instruction but the shifts and adds are done as separate instructions (and shift is very slow on some models), table lookup is going to be faster. On another machine where the table addressing requires several instructions but a barrel shifter is available, the shift method will likely be faster. You have to try both, or have a good knowledge of the particular model of the particular architecture of processor you are using, to determine which is faster.

However, table lookup has some additional benefits. It is simple enough that carefully-written C code is likely to generate the best possible assembler code, so there is no need to use assembler. A single copy of the lookup code functions for all possible input and output widths, as long as the pixels are always stored in words of a constant width.
In contrast, the shift/add code requires different sequences of instructions for different input and output pixel bit widths, since the output may require adding 1 or 2 or 3 or more copies of the input, shifted by various amounts. Either you must pre-compile all possible variations that you could ever need, or compile code during execution, or just have a general-purpose algorithm that needs tests and branches within the pixel lookup loop (bye-bye performance). How do you deal with this?
ksbooth@watcgl.waterloo.edu (Kelly Booth) (12/21/88)
In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes: >For shift/add to be faster than table lookup... Actually, some solutions combine BOTH techniques. If the table has two or more indices (not the case here), then the lookup requires combining those indices into a single index into the table. Some compilers do this with adds and multiplies. If machine code is written (or if the compiler is smart) shifts and adds can be substituted. This may require that the table size(s) be adjusted to the next larger powers of two. Or, recursively, the composite indices can themselves be computed using a table lookup scheme.
raveling@vaxb.isi.edu (Paul Raveling) (12/22/88)
BTW, in case it wasn't clear, my preceding response was suggesting that
the "shifty logic" and table-lookup-with-interpolation alternatives are
more appropriate when the function being implemented has a large
domain.  Direct table lookup is best when the domain is relatively
small, and I agree that it's the best technique for sheer speed.

A case where direct table lookup isn't practical is any of a few image
processing utilities we have.  They examine a 5x5 square of pixels with
24 bits of color per pixel to determine a new color for the center
pixel.  I'd love to use direct table lookup for this, but 25*2**24
bytes is a tough table to handle.

In article <16960@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>
>Is that a factor of 14 speed improvement over table lookup?  Or a
>factor of 14 improvement over code that contained multiply or divide in
>the inner loop?  A comparison against table lookup is what matters.
>What processor was this on?

This was several years ago; I believe the algorithm was an integer
square root function, and the processor definitely was either an 8088
or an 80286 in an IBM PC.  Since the function's domain was unsigned
16-bit integers and the PC's memory was quite limited, direct table
lookup would have been impractical, even though it certainly would be
much faster.

I must admit that the factor of 14 is exceptional because of the 80x86
processor architecture; going into assembly language allowed carefully
recoding some branching logic that produced excessive queue flushes,
which are a major time waster in this family of processors.  Of course
a direct table lookup would eliminate this problem.

Note also that there are two different types of table lookup to
consider: direct table lookup and searching.  The variant I mentioned
in connection with real-time avionics software used tables containing
both x and y values (as in y = f(x) -- not screen coordinates) to
handle continuous functions of real numbers.  It did a binary search to
find the nearest x values to the argument, then used linear
interpolation to derive the corresponding y value.  In order to support
the required accuracy most tables contained around 6-10 entries; the
shortest I can recall was 2 entries, the longest about 50.  The version
of the B-1B CADC that I worked on used sets of these tables to
implement up to 5-dimensional linear interpolation.

This approach is VERY fast for a large class of functions; with a mean
search requiring something like 3 lookups and only simple math, it's
faster than computing for lots of things with large domains or complex
[not simple] equations.  It's also good for supporting empirically
derived functions, such as error correction for the static pressure
source on military aircraft; this tends to be a bit bizarre in the
transonic regime near Mach 1.

BTW, a necessary aid for building that sort of table is a utility to do
curve fits, then extract a set of (x,y) points which keep error less
than a given accuracy requirement when using linear interpolation
between these points.

Bottom line: By all means use direct table lookup where it's feasible;
but don't forget there are a couple of other approaches that can still
save time.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
turk@Apple.COM (Ken "Turk" Turkowski) (12/31/88)
A fast but crude method is: Y = (R + 2G + B) / 4 -- Ken Turkowski @ Apple Computer, Inc., Cupertino, CA Internet: turk@apple.com Applelink: Turkowski1
jbm@eos.UUCP (Jeffrey Mulligan) (01/04/89)
From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski): > A fast but crude method is: > > Y = (R + 2G + B) / 4 > -- > Ken Turkowski @ Apple Computer, Inc., Cupertino, CA > Internet: turk@apple.com > Applelink: Turkowski1 Equally fast and crude but probably more accurate: Y = 2R + 5G + B -- Jeff Mulligan (jbm@aurora.arc.nasa.gov) NASA/Ames Research Ctr., Mail Stop 239-3, Moffet Field CA, 94035 (415) 694-6290
falk@sun.uucp (Ed Falk) (01/04/89)
In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
> > Y = (R + 2G + B) / 4
> Y = 2R + 5G + B

Yeesh, you people.  Why do you need to simplify it?  Are you going to
be doing these calculations by hand?  Get a calculator or something.

If you *must* do it in integer for speed reasons, do it this way:

	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */

The results are correct to four decimal places and the divide is
replaced by a right-shift in a decent compiler and a byte-move in a
good compiler.
--
-ed falk, sun microsystems
sun!falk, falk@sun.com
card-carrying ACLU member.
raveling@vaxb.isi.edu (Paul Raveling) (01/05/89)
In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
>In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
>> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
>> > Y = (R + 2G + B) / 4
>> Y = 2R + 5G + B

[The latter should really be Y = (2R + 5G + B) / 8, right?]

>
>Yeesh, you people.  Why do you need to simplify it?  Are you going to be
>doing these calculations by hand?  Get a calculator or something.
>
>If you *must* do it in integer for speed reasons, do it this way:
>
>	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */
>
>The results are correct to four decimal places and the divide is replaced
>by a right-shift in a decent compiler and a byte-move in a good compiler.

... because sometimes speed is lots more important than 4 significant
digits of accuracy, and multiplies are slow.  Consider a 68020 running
code for these operations:

	a)  Y = (R + G+G + B) >> 2
	b)  Y = (R+R + (G<<2)+G + B) >> 3
	c)  Y = (77*r + 151*g + 28*b) >> 8

For a rough cut at comparing timings, adding up the number of clocks
for each instruction for each of the 3 cases given in the gospel
according to Motorola, assuming the work's done in registers and the
result is stored with an (An)+ reference, gives:

		Best Case	Cache Case	Worst Case
		---------	----------	----------
	a)	    5		    14		    18
	b)	    6		    20		    25
	c)	   83		    93		    99

Which makes the accurate variant a whale of a lot slower than the
others.  This sort of thing gets fairly noticeable if you're massaging
a megapixel image.

BTW, this is an example of a function that couldn't easily be
accelerated by table lookup unless R, G, and B have very few bits.
Even then 3D subscript computation puts the table lookup in the same
speed range as the faster 2 of these alternatives.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
ksbooth@watcgl.waterloo.edu (Kelly Booth) (01/05/89)
In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
> ... because sometimes speed is lots more important than 4
> significant digits of accuracy, and multiplies are slow.

	. . . (stuff deleted) . . .

> BTW, this is an example of a function that couldn't easily
> be accelerated by table lookup unless R, G, and B have very
> few bits.  Even then 3D subscript computation puts the table
> lookup in the same speed range as the faster 2 of these
> alternatives.

Huh?  Table lookup can be used to replace each of the three multiplies
in aR+bG+cB so that the code becomes something like

	a[R] + b[G] + c[B]

if R, G, and B are bytes (the usual case, and what most of the previous
postings have assumed -- for up to 12 bits table lookup is still
reasonable).  This leaves just the adds and the divide (not shown at
the end), which for the posting that suggested this was still a power
of two (in fact 256), so the byte swap/move or shift tricks all still
work.  There is no need to tabulate the entire function.

[See previous postings on table lookup in this newsgroup about 1-2
weeks ago.]
raveling@vaxb.isi.edu (Paul Raveling) (01/07/89)
In article <7187@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Consider a 68020 running code for these operations:
>
>	a)  Y = (R + G+G + B) >> 2
>	b)  Y = (R+R + (G<<2)+G + B) >> 3
>	c)  Y = (77*r + 151*g + 28*b) >> 8
>
>	BTW, this is an example of a function that couldn't easily
>	be accelerated by table lookup unless R, G, and B have very
>	few bits.  Even then 3D subscript computation puts the table
>	lookup in the same speed range as the faster 2 of these
>	alternatives.

It's time to eat some of my own words...  I admit to taking my brain
out of gear too soon on this one.  As Bob Webber pointed out in an
email message, a good candidate for the best approach of all is likely
to be:

	d)  Y = (times77[R] + times151[G] + times28[B]) >> 8

If R, G, and B are 8 bits this only requires 768 table entries (the
products need 16-bit words, so 1.5K bytes of table space) and it should
be about as fast as alternative b.  This is easily worth it for having
both good speed and good accuracy.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
dal@midgard.Midgard.MN.ORG (Dale Schumacher) (01/10/89)
In article <83604@sun.uucp> falk@sun.uucp (Ed Falk) writes:
|In article <2263@eos.UUCP>, jbm@eos.UUCP (Jeffrey Mulligan) writes:
|> From article <23105@apple.Apple.COM>, by turk@Apple.COM (Ken "Turk" Turkowski):
|> > Y = (R + 2G + B) / 4
|> Y = 2R + 5G + B
|
|	out = (77*r + 151*g + 28*b)/256 ;	/* NTSC weights (.3,.59,.11) */
|
|The results are correct to four decimal places and the divide is replaced
|by a right-shift in a decent compiler and a byte-move in a good compiler.

I don't know where you got your numbers.  The values I have for the Y
component of YIQ (luminance) from RGB are:

	R=.299  G=.587  B=.114

The following formula is the best approximation with 8-bit values:

	Y = (R*77 + G*150 + B*29) / 256

which gives the weights R=.3008, G=.5859, B=.1133, total error=.0036.
Your values give the weights R=.3008, G=.5898, B=.1094, total
error=.0092.  Even MY numbers don't have 4 places of accuracy, but they
are a better approximation to the 3-place target values I have.
Someone mentioned that the NTSC weights may have been changed recently;
is that so?

PS.  I fully agree with the idea that more accurate values should be
used if you're going to use integer and do 3 multiplies and a 'divide'
(which can be optimized if it's a power of 2) anyway.
pokey@well.UUCP (Jef Poskanzer) (01/13/89)
I wrote a quick test program to try out various approximations. It runs five million conversions. On a Sun 3/260, the timings are: float: 223.0 int: 35.4 table: 31.6 I have appended the program, in case anyone wants to run it on a different architecture or try different approximations. --- Jef Jef Poskanzer jef@rtsg.ee.lbl.gov ...well!pokey "Thank God, I have done my duty." -- Admiral Horatio Nelson /* t.c ** ** To use: ** cc -O -DFLOAT t.c -s -o t1 ; time ./t1 ** cc -O -DINT t.c -s -o t2 ; time ./t2 ** cc -O -DTABLE t.c -s -o t3 ; time ./t3 */ #include <stdio.h> main( ) { int i, j, r, g, b; #ifdef TABLE static int times77[256] = { 0, 77, 154, 231, 308, 385, 462, 539, 616, 693, 770, 847, 924, 1001, 1078, 1155, 1232, 1309, 1386, 1463, 1540, 1617, 1694, 1771, 1848, 1925, 2002, 2079, 2156, 2233, 2310, 2387, 2464, 2541, 2618, 2695, 2772, 2849, 2926, 3003, 3080, 3157, 3234, 3311, 3388, 3465, 3542, 3619, 3696, 3773, 3850, 3927, 4004, 4081, 4158, 4235, 4312, 4389, 4466, 4543, 4620, 4697, 4774, 4851, 4928, 5005, 5082, 5159, 5236, 5313, 5390, 5467, 5544, 5621, 5698, 5775, 5852, 5929, 6006, 6083, 6160, 6237, 6314, 6391, 6468, 6545, 6622, 6699, 6776, 6853, 6930, 7007, 7084, 7161, 7238, 7315, 7392, 7469, 7546, 7623, 7700, 7777, 7854, 7931, 8008, 8085, 8162, 8239, 8316, 8393, 8470, 8547, 8624, 8701, 8778, 8855, 8932, 9009, 9086, 9163, 9240, 9317, 9394, 9471, 9548, 9625, 9702, 9779, 9856, 9933, 10010, 10087, 10164, 10241, 10318, 10395, 10472, 10549, 10626, 10703, 10780, 10857, 10934, 11011, 11088, 11165, 11242, 11319, 11396, 11473, 11550, 11627, 11704, 11781, 11858, 11935, 12012, 12089, 12166, 12243, 12320, 12397, 12474, 12551, 12628, 12705, 12782, 12859, 12936, 13013, 13090, 13167, 13244, 13321, 13398, 13475, 13552, 13629, 13706, 13783, 13860, 13937, 14014, 14091, 14168, 14245, 14322, 14399, 14476, 14553, 14630, 14707, 14784, 14861, 14938, 15015, 15092, 15169, 15246, 15323, 15400, 15477, 15554, 15631, 15708, 15785, 15862, 15939, 16016, 16093, 16170, 16247, 16324, 16401, 16478, 
16555, 16632, 16709, 16786, 16863, 16940, 17017, 17094, 17171, 17248, 17325, 17402, 17479, 17556, 17633, 17710, 17787, 17864, 17941, 18018, 18095, 18172, 18249, 18326, 18403, 18480, 18557, 18634, 18711, 18788, 18865, 18942, 19019, 19096, 19173, 19250, 19327, 19404, 19481, 19558, 19635 }; static int times150[256] = { 0, 150, 300, 450, 600, 750, 900, 1050, 1200, 1350, 1500, 1650, 1800, 1950, 2100, 2250, 2400, 2550, 2700, 2850, 3000, 3150, 3300, 3450, 3600, 3750, 3900, 4050, 4200, 4350, 4500, 4650, 4800, 4950, 5100, 5250, 5400, 5550, 5700, 5850, 6000, 6150, 6300, 6450, 6600, 6750, 6900, 7050, 7200, 7350, 7500, 7650, 7800, 7950, 8100, 8250, 8400, 8550, 8700, 8850, 9000, 9150, 9300, 9450, 9600, 9750, 9900, 10050, 10200, 10350, 10500, 10650, 10800, 10950, 11100, 11250, 11400, 11550, 11700, 11850, 12000, 12150, 12300, 12450, 12600, 12750, 12900, 13050, 13200, 13350, 13500, 13650, 13800, 13950, 14100, 14250, 14400, 14550, 14700, 14850, 15000, 15150, 15300, 15450, 15600, 15750, 15900, 16050, 16200, 16350, 16500, 16650, 16800, 16950, 17100, 17250, 17400, 17550, 17700, 17850, 18000, 18150, 18300, 18450, 18600, 18750, 18900, 19050, 19200, 19350, 19500, 19650, 19800, 19950, 20100, 20250, 20400, 20550, 20700, 20850, 21000, 21150, 21300, 21450, 21600, 21750, 21900, 22050, 22200, 22350, 22500, 22650, 22800, 22950, 23100, 23250, 23400, 23550, 23700, 23850, 24000, 24150, 24300, 24450, 24600, 24750, 24900, 25050, 25200, 25350, 25500, 25650, 25800, 25950, 26100, 26250, 26400, 26550, 26700, 26850, 27000, 27150, 27300, 27450, 27600, 27750, 27900, 28050, 28200, 28350, 28500, 28650, 28800, 28950, 29100, 29250, 29400, 29550, 29700, 29850, 30000, 30150, 30300, 30450, 30600, 30750, 30900, 31050, 31200, 31350, 31500, 31650, 31800, 31950, 32100, 32250, 32400, 32550, 32700, 32850, 33000, 33150, 33300, 33450, 33600, 33750, 33900, 34050, 34200, 34350, 34500, 34650, 34800, 34950, 35100, 35250, 35400, 35550, 35700, 35850, 36000, 36150, 36300, 36450, 36600, 36750, 36900, 37050, 37200, 37350, 37500, 
37650, 37800, 37950, 38100, 38250 }; static int times29[256] = { 0, 29, 58, 87, 116, 145, 174, 203, 232, 261, 290, 319, 348, 377, 406, 435, 464, 493, 522, 551, 580, 609, 638, 667, 696, 725, 754, 783, 812, 841, 870, 899, 928, 957, 986, 1015, 1044, 1073, 1102, 1131, 1160, 1189, 1218, 1247, 1276, 1305, 1334, 1363, 1392, 1421, 1450, 1479, 1508, 1537, 1566, 1595, 1624, 1653, 1682, 1711, 1740, 1769, 1798, 1827, 1856, 1885, 1914, 1943, 1972, 2001, 2030, 2059, 2088, 2117, 2146, 2175, 2204, 2233, 2262, 2291, 2320, 2349, 2378, 2407, 2436, 2465, 2494, 2523, 2552, 2581, 2610, 2639, 2668, 2697, 2726, 2755, 2784, 2813, 2842, 2871, 2900, 2929, 2958, 2987, 3016, 3045, 3074, 3103, 3132, 3161, 3190, 3219, 3248, 3277, 3306, 3335, 3364, 3393, 3422, 3451, 3480, 3509, 3538, 3567, 3596, 3625, 3654, 3683, 3712, 3741, 3770, 3799, 3828, 3857, 3886, 3915, 3944, 3973, 4002, 4031, 4060, 4089, 4118, 4147, 4176, 4205, 4234, 4263, 4292, 4321, 4350, 4379, 4408, 4437, 4466, 4495, 4524, 4553, 4582, 4611, 4640, 4669, 4698, 4727, 4756, 4785, 4814, 4843, 4872, 4901, 4930, 4959, 4988, 5017, 5046, 5075, 5104, 5133, 5162, 5191, 5220, 5249, 5278, 5307, 5336, 5365, 5394, 5423, 5452, 5481, 5510, 5539, 5568, 5597, 5626, 5655, 5684, 5713, 5742, 5771, 5800, 5829, 5858, 5887, 5916, 5945, 5974, 6003, 6032, 6061, 6090, 6119, 6148, 6177, 6206, 6235, 6264, 6293, 6322, 6351, 6380, 6409, 6438, 6467, 6496, 6525, 6554, 6583, 6612, 6641, 6670, 6699, 6728, 6757, 6786, 6815, 6844, 6873, 6902, 6931, 6960, 6989, 7018, 7047, 7076, 7105, 7134, 7163, 7192, 7221, 7250, 7279, 7308, 7337, 7366, 7395 }; #endif TABLE r = g = b = 0; for ( i = 0; i < 5000000; i++ ) { #ifdef FLOAT j = (int) ( 0.299 * r + 0.587 * g + 0.114 * b + 0.5 ); #endif FLOAT #ifdef INT j = ( r * 77 + g * 150 + b * 29 ) >> 8; #endif INT #ifdef TABLE j = ( times77[r] + times150[g] + times29[b] ) >> 8; #endif TABLE r = ( r + 3 ) & 0xff; g = ( g + 5 ) & 0xff; b = ( b + 7 ) & 0xff; } exit( 0 ); }
jef@ace.ee.lbl.gov (Jef Poskanzer) (01/14/89)
In the referenced message, I wrote: } float: 223.0 } int: 35.4 } table: 31.6 Oh yeah, almost forgot the null case - no luminance calculation, just the wrapper program: null: 16.2
raveling@vaxb.isi.edu (Paul Raveling) (01/14/89)
In article <10322@well.UUCP> Jef Poskanzer <jef@rtsg.ee.lbl.gov> writes:
>I wrote a quick test program to try out various approximations.  It runs
>five million conversions.  On a Sun 3/260, the timings are:
>
>	float:	223.0
>	int:	 35.4
>	table:	 31.6
>
>I have appended the program, in case anyone wants to run it on a different
>architecture or try different approximations.

Just below are some results from an HP 9000/350.  I added two runs.
One was with "shifty" logic defined by:

	#ifdef SHIFTY
		j = ( r+r + (g<<2)+g + b ) >> 3;
	#endif

The other was a "no logic" run, with nothing defined, to get an
overhead calibration (how much time the loop logic and rgb updating
used).  The "Less Overhead" column below subtracts this to get a direct
comparison of timing for the math only.

	Test		Raw Timing	Less Overhead
	----		----------	-------------
	float		  220.0		    201.2
	int		   37.6		     18.8
	table		   34.9		     16.1
	shifty		   28.9		     10.1
	overhead	   18.8		      0

This isn't entirely what I anticipated.  The "int" version,
(j = ( r * 77 + g * 150 + b * 29 ) >> 8;), appeared to be faster than
expected.  I checked further and found that on this one the compiler
decomposed all three multiplies into shifts, adds, and a subtract.

Also, the table version seemed too slow.  It turned out that the
compiler generated some remarkably crummy code.  ALL data except the
tables were kept on the stack -- none in registers -- and the subscript
address computations appeared to be distinctly suboptimal.

Next, maybe tomorrow, I'll try the same stuff with some hand-coded
assembly language.  It should be easy to beat the compiler by LOTS.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
falk@sun.uucp (Ed Falk) (01/15/89)
> > BTW, this is an example of a function that couldn't easily
> > be accelerated by table lookup unless R, G, and B have very
> > few bits.  Even then 3D subscript computation puts the table
> > lookup in the same speed range as the faster 2 of these
> > alternatives.
>
> Huh?  Table look up can be used to replace each of the three multiplies
> in aR+bG+cB so that the code becomes something like
>
>	a[R]+b[G]+c[B]

I'm embarrassed.  I had been doing (r*77 + g*150 + 29*b)/256 all along,
thinking that all multiplies took the same amount of time (the
compiler, it turns out, optimizes constant multiplies in interesting
ways).  I've switched all my code to use look-up tables now.  I've
gained a new respect for look-up tables.
--
-ed falk, sun microsystems
sun!falk, falk@sun.com
card-carrying ACLU member.
raveling@vaxb.isi.edu (Paul Raveling) (01/17/89)
In article <7266@venera.isi.edu> raveling@vaxb.isi.edu (Paul Raveling) writes:
>
>	Next, maybe tomorrow, I'll try the same stuff with some
>	hand coded assembly language.  It should be easy to beat
>	the compiler by LOTS.

Here's the result of using hand-coded assembly language for the "table"
algorithm:

	Test			Raw Timing	Less Overhead
	----			----------	-------------
	C version		   34.9		    16.1
	Assembly version	   10.2		     6.4

Anyone have a better C compiler?

If anyone would like to check this hacked assembly version on other
systems, let me know.  I can either email it or post it, but it should
take a few minutes to clean the source up a little and stick in a
warning, since very few 68K assemblers use identically the same source
syntax.

---------------------
Paul Raveling
Raveling@vaxb.isi.edu
jonathan@jvc.UUCP (Jonathan Hue) (01/18/89)
I'm slightly puzzled by these calculations of luminance from RGB.
Doesn't the formula

	Y = .299R + .587G + .114B

only apply when RGB represents intensity, rather than pixel values?  If
your pixel values are completely gamma-corrected through look-up tables
in your frame buffer hardware so they represent intensities, this would
work, but if you use linear look-up tables (or don't have any), you
would need to convert pixel values to intensity, calculate luminance,
then convert them back into pixel values (voltages).

Also, considering how far the green of the typical color monitor is
from NTSC green, it may be worth deriving new coefficients for the
monitor you are using.

Jonathan Hue
uunet!jvc!jonathan