[comp.graphics] Color quantization: flesh tones

mlm@nl.cs.cmu.edu (Michael L. Mauldin) (09/07/89)

Has anyone experimented with psychological color preferences in
quantizing using Heckbert's median cut?  Here's an example problem:

    In some images containing people's faces, where the face is
    only a small part of the image, very few colors are assigned
    to "flesh" color.  The result is banding/loss of resolution in
    an area of the image that is interesting to the viewer out of
    proportion to its relative size.  The problem is most severe
    when quantizing to 32 or fewer colors.

I tried the following experiment, with mixed results.  Choose a color
that is "flesh" (I used <192,96,80>), and after the image has been
histogrammed, but before the median cut color assignment is done,
multiply each cell by a "bonus" between 1 and 2 if it is within some
minimum distance from this point.  On an image of "Our God of Free
Software, RMS", where the face filled about 8% of the screen, using 32
colors, the number of "flesh" colors was increased from 3 to 6, and
significant detail was added to the facial region.  On another image,
a baby picture with significantly "whiter" skin, the method didn't
affect the image much, and when quantizing the RMS image with 16
colors, the whole image tended to look like a sepia tone print, rather
than a color image.
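
For concreteness, the weighting pass looks something like this (an
untested sketch; the 5-bit histogram, the cutoff, and the bonus are just
the knobs I played with, not anything canonical):

	/* Boost histogram cells near a chosen "flesh" point, after
	 * histogramming but before the median cut runs. */
	#define HBITS	5
	#define HSIZE	(1 << HBITS)

	void
	boost_flesh(hist)
	long hist[HSIZE][HSIZE][HSIZE];
	{
	    int r, g, b, dr, dg, db;
	    int tr = 192 >> 3, tg = 96 >> 3, tb = 80 >> 3;  /* <192,96,80> */

	    for (r = 0; r < HSIZE; r++)
		for (g = 0; g < HSIZE; g++)
		    for (b = 0; b < HSIZE; b++) {
			dr = r - tr; dg = g - tg; db = b - tb;
			if (dr*dr + dg*dg + db*db < 8*8)    /* cutoff */
			    hist[r][g][b] += hist[r][g][b] / 2;  /* x1.5 */
		    }
	}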

I can think of several modifications:

	1. Get a better definition of "flesh" (racially unbiased :-)
	2. Tweak the bonus function
	3. [Actually used in some Amiga software] extract a subimage
	   containing mostly the feature(s) of interest, and build the
	   colormap using statistics from this region (rough sketch below).
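
For (3), the only change is that the histogram pass is restricted to
the marked region before the usual median cut runs.  A sketch (the
names and window bounds are hypothetical):

	/* Histogram only a marked subwindow; feed the result to the
	 * normal median cut.  "pix" is packed RGB triples, row-major. */
	void
	hist_window(pix, width, x0, y0, x1, y1, hist)
	unsigned char (*pix)[3];
	int width, x0, y0, x1, y1;
	long hist[32][32][32];
	{
	    int x, y;
	    unsigned char *p;

	    for (y = y0; y <= y1; y++)
		for (x = x0; x <= x1; x++) {
		    p = pix[y*width + x];
		    hist[p[0] >> 3][p[1] >> 3][p[2] >> 3]++;
		}
	}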

Anybody else have any good ideas?  Has anyone else experimented with
this?  Is there a reference I don't know about?

Michael L. Mauldin (Fuzzy)		School of Computer Science
ARPA: Michael.Mauldin@NL.CS.CMU.EDU	Carnegie Mellon University
Phone: (412) 268-3065			Pittsburgh, PA  15213-3890

spencer@eecs.umich.edu (Spencer W. Thomas) (09/08/89)

An interesting idea...

> multiply each cell by a "bonus" between 1 and 2 if it is within some
> minimum distance from [flesh color].

Another way to make the color reproduction look better is to dither,
but that was already pointed out in Heckbert's original paper.
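
For anyone who hasn't tried it, one scanline of Floyd-Steinberg error
diffusion is only a few lines per channel.  A sketch (one channel;
nearest() stands in for whatever colormap lookup you already have, and
clamping is omitted):

	/* One channel of one scanline.  row[] is the current line
	 * (int, so pushed error can leave [0..255]); next[] collects
	 * error for the line below: 7/16 right, 3/16 below-left,
	 * 5/16 below, 1/16 below-right. */
	void
	dither_line(row, next, out, w)
	int *row, *next;
	unsigned char *out;
	int w;
	{
	    int x, want, got, e;

	    for (x = 0; x < w; x++) {
		want = row[x];			/* includes pushed error */
		got  = nearest(want);		/* hypothetical lookup */
		e    = want - got;

		out[x] = got;
		if (x+1 < w)  row[x+1]  += e*7/16;
		if (x > 0)    next[x-1] += e*3/16;
		next[x] += e*5/16;
		if (x+1 < w)  next[x+1] += e*1/16;
	    }
	}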

--
=Spencer (spencer@eecs.umich.edu)

pepke@loligo (Eric Pepke) (09/09/89)

In article <6087@pt.cs.cmu.edu> mlm@nl.cs.cmu.edu (Michael L. Mauldin) writes:
>I can think of several modifications:
>
>	1. Get a better definition of "flesh" (racially unbiased :-)
>	2. Tweak the bonus function
>	3. [Actually used in some Amiga software] extract a subimage
>	   containing mostly the feature(s) of interest, and build the
>	   colormap using statistics from this region.
>
>Anybody else have any good ideas?  Has anyone else experimented with
>this?  Is there a reference I don't know about?

I don't know whether the Amiga software to which you refer does this, but
one of the things I have not gotten around to trying is a detail brush that
you rub on areas you want to be more detailed.  This could also be a lasso,
of course.  This would enable people to make decisions like "I need a little 
bit more detail over here" or "I need a lot more detail over here."  The
effect on the histogram could be cumulative.
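
The cumulative part could be as cheap as a per-pixel weight plane that
each brush stroke increments, with the histogram pass counting every
pixel (1 + weight) times.  Untested, and all the names are made up:

	/* weight[] starts at zero; the brush bumps it wherever the
	 * user rubs.  More rubbing, more histogram influence. */
	for (i = 0; i < width*height; i++)
	    hist[pix[i][0] >> 3][pix[i][1] >> 3][pix[i][2] >> 3]
		+= 1 + weight[i];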

Eric Pepke                                     INTERNET: pepke@gw.scri.fsu.edu
Supercomputer Computations Research Institute  MFENET:   pepke@fsu
Florida State University                       SPAN:     scri::pepke
Tallahassee, FL 32306-4052                     BITNET:   pepke@fsu

Disclaimer: My employers seldom even LISTEN to my opinions.
Meta-disclaimer: Any society that needs disclaimers has too many lawyers.

falk@sun.Eng.Sun.COM (Ed Falk) (09/15/89)

> Has anyone experimented with psychological color preferences in
> quantizing using Heckbert's median cut?  Here's an example problem:
> 
>     In some images containing people's faces, where the face is
>     only a small part of the image, very few colors are assigned
>     to "flesh" color.  The result is banding/loss of resolution in
>     an area of the image that is interesting to the viewer out of
>     proportion to its relative size.  The problem is most severe
>     when quantizing to 32 or fewer colors.
> 

Here's a thought; try converting RGB to the NTSC IYQ coordinates
and quantize in IYQ space.  I suggest this because NTSC chose
the Y axis to be biased towards flesh tones and TV pictures transmit
more power along that axis than along the Q axis (I is intensity).

I'm sorry, but I don't have the transformation matrix from RGB to IYQ handy.

-- 
		-ed falk, sun microsystems, sun!falk, falk@sun.com

  "If you wrapped yourself in the flag like George Bush does, you'd
  be worried about flag-burning too"

dal@midgard.Midgard.MN.ORG (Dale Schumacher) (09/19/89)

In article <124742@sun.Eng.Sun.COM> falk@sun.Eng.Sun.COM (Ed Falk) writes:
|
|Here's a thought; try converting RGB to the NTSC IYQ coordinates
|and quantize in IYQ space.  I suggest this because NTSC chose
|the Y axis to be biased towards flesh tones and TV pictures transmit
|more power along that axis than along the Q axis (I is intensity).
|
|I'm sorry, but I don't have the transformation matrix from RGB to IYQ handy.
|

I thought Y was the intensity (luminance) component... and the most
bandwidth is used for the luminance, with less for the color components.
Here are the integer [0..255] pixel value formulae that I use:

	Y = (((77 * R) + (150 * G) + (29 * B)) / 256);
	I = (((153 * R) + (-70 * G) + (-82 * B)) / 256);
	Q = (((54 * R) + (-134 * G) + (80 * B)) / 256);

	R = (((256 * Y) + (245 * I) + (159 * Q)) / 256);
	G = (((256 * Y) + (-70 * I) + (-167 * Q)) / 256);
	B = (((256 * Y) + (-283 * I) + (436 * Q)) / 256);

The above forms the heart of a utility I wrote to set the luminance (Y)
of a color image from a monochrome image.  I use this in convolutions,
particularly for edge sharpening, such that I do the convolution only
on the luminance component, then recombine the output with the original
color image.
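
In outline, the utility does this per pixel, using the same scaled
matrices as above (clamping to [0..255] omitted):

	/* Replace the luminance of a color pixel with a mono value;
	 * the chromaticity (I,Q) passes through untouched. */
	i = ((153*r) - (70*g) - (82*b)) / 256;
	q = ((54*r) - (134*g) + (80*b)) / 256;
	y = mono;			/* new luminance from the
					 * monochrome image */
	r = y + ((245*i) + (159*q)) / 256;
	g = y - ((70*i) + (167*q)) / 256;
	b = y - ((283*i) - (436*q)) / 256;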

jlg@hpfcdq.HP.COM (Jeff Gerckens) (09/19/89)

> > Has anyone experimented with psychological color preferences in
> > quantizing using Heckbert's median cut?  Here's an example problem:
> > 
> >     In some images containing people's faces, where the face is
> >     only a small part of the image, very few colors are assigned
> >     to "flesh" color.  The result is banding/loss of resolution in
> >     an area of the image that is interesting to the viewer out of
> >     proportion to its relative size.  The problem is most severe
> >     when quantizing to 32 or fewer colors.
> > 
> 
> Here's a thought; try converting RGB to the NTSC IYQ coordinates
> and quantize in IYQ space.  I suggest this because NTSC chose
> the Y axis to be biased towards flesh tones and TV pictures transmit
> more power along that axis than along the Q axis (I is intensity).
> 
> I'm sorry, but I don't have the transformation matrix from RGB to IYQ handy.
> 
> -- 
> 		-ed falk, sun microsystems, sun!falk, falk@sun.com

Almost....

The Y axis in the NTSC encoding (IYQ) is the intensity, which was selected to
match the CIE-1931 XYZ intensity for the NTSC standard phosphors.  The I and Q
axes are named after the signal encoding technique used, In-phase and
Quadrature respectively.  They carry the chromaticity information, and were
chosen so that more information is encoded on the I axis and less on the Q.

Use of this space will only affect your results if you weight the different
axes differently, since the linear transform between YIQ and RGB yields the
same results for any linear interpolation regardless of which space the
interpolation takes place in.
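
That is, the transform only buys you something if the quantizer's
distance metric treats the axes unequally, e.g. something like the
following (the 4/2/1 weights are purely illustrative, not measured):

	/* Weighted squared distance in YIQ.  Weighting Y and I more
	 * heavily than Q mimics where NTSC spends its bandwidth. */
	double
	yiq_dist2(y1, i1, q1, y2, i2, q2)
	double y1, i1, q1, y2, i2, q2;
	{
	    double dy = y1 - y2, di = i1 - i2, dq = q1 - q2;

	    return 4.0*dy*dy + 2.0*di*di + dq*dq;
	}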

- Jeff Gerckens, Graphics Technology Division, Hewlett-Packard Company.
  ...!hplabs!hpfcla!jlg
  "What color is a white horse in a dark room?" 

hutch@fps.com (Jim Hutchison) (09/19/89)

In <1212@midgard.Midgard.MN.ORG> dal@midgard.Midgard.MN.ORG (Dale Schumacher):
[... In NTSC ...]
>I thought Y was the intensity (luminance) component... and the most
>bandwidth is used for the luminance, with less for the color components.
>Here are the integer [0..255] pixel value formulae that I use:

>	Y = (((77 * R) + (150 * G) + (29 * B)) / 256);
>	I = (((153 * R) + (-70 * G) + (-82 * B)) / 256);
>	Q = (((54 * R) + (-134 * G) + (80 * B)) / 256);

>	R = (((256 * Y) + (245 * I) + (159 * Q)) / 256);
>	G = (((256 * Y) + (-70 * I) + (-167 * Q)) / 256);
>	B = (((256 * Y) + (-283 * I) + (436 * Q)) / 256);

>The above forms the heart of a utility I wrote to set the luminance (Y)
>of a color image from a monochrome image.  I use this in convolutions,
>particularly for edge sharpening, such that I do the convolution only
>on the luminance component, then recombine the output with the original
>color image.

It would seem that by using these equations, you might end up with a fair
amount of error in the process of remapping the colors.  At the least you
will want to round to nearest, rather than truncate, to halve the
worst-case error.  E.g.

	Y = ((77 * R) + (150 * G) + (29 * B) + 128) / 256;

Or, you could use the scaled numbers for the convolution and only rescale
them when you reconvert to RGB.  That might make your convolution algorithm
too messy; perhaps just save the error from the original RGB->YIQ conversion
and add it back into the RGB at the end.  Have you noticed the error in your
output?  Is it significant enough to cause darkening of the image or loss of
shadow detail?
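
Concretely, the save-the-error variant might look like this, with
er/eg/eb stored per pixel (using Dale's scaled matrices; a sketch only,
and truncation of negative intermediates is glossed over):

	/* Per pixel, going in: convert and remember what the
	 * RGB->YIQ->RGB round trip would lose. */
	y = ((77*r) + (150*g) + (29*b) + 128) / 256;	/* rounded */
	i = ((153*r) - (70*g) - (82*b)) / 256;
	q = ((54*r) - (134*g) + (80*b)) / 256;
	er = r - (y + ((245*i) + (159*q)) / 256);
	eg = g - (y - ((70*i) + (167*q)) / 256);
	eb = b - (y - ((283*i) - (436*q)) / 256);

	/* ... the sharpening convolution runs over the whole Y
	 * plane here; shown per-pixel for brevity ... */

	/* Per pixel, coming out: reconvert and restore the residual. */
	r = y + ((245*i) + (159*q)) / 256 + er;
	g = y - ((70*i) + (167*q)) / 256 + eg;
	b = y - ((283*i) - (436*q)) / 256 + eb;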

/*    Jim Hutchison   		{dcdwest,ucbvax}!ucsd!celerity!hutch  */
/*    Disclaimer:  I am not an official spokesman for FPS computing   */

falk@sun.Eng.Sun.COM (Ed Falk) (09/22/89)

In article <390037@hpfcdq.HP.COM>, jlg@hpfcdq.HP.COM (Jeff Gerckens) writes:
> > > Has anyone experimented with psychological color preferences in
> > > quantizing using Heckbert's median cut?  Here's an example problem:
> > > 
> > >     In some images containing people's faces, where the face is
> > >     only a small part of the image, very few colors are assigned
> > >     to "flesh" color.  The result is banding/loss of resolution in
> > >     an area of the image that is interesting to the viewer out of
> > >     proportion to its relative size.  The problem is most severe
> > >     when quantizing to 32 or fewer colors.
> > > 
> > 
> > Here's a thought; try converting RGB to the NTSC IYQ coordinates
> > and quantize in IYQ space.  I suggest this because NTSC chose
> > the Y axis to be biased towards flesh tones and TV pictures transmit
> > more power along that axis than along the Q axis (I is intensity).
> > 
> > I'm sorry, but I don't have the transformation matrix from RGB to IYQ handy.
> > 
> > -- 
> > 		-ed falk, sun microsystems, sun!falk, falk@sun.com
> 
> Almost....
> 
> The Y axis in the NTSC encoding (IYQ) is the intensity, which was selected to
> match the CIE-1931 XYZ intensity for the NTSC standard phosphors.  The I and Q
> axes are named after the signal encoding technique used, In-phase and
> Quadrature respectively.  They carry the chromaticity information, and were
> chosen so that more information is encoded on the I axis and less on the Q.
> 
> Use of this space will only affect your results if you weight the different
> axes differently, since the linear transform between YIQ and RGB yields the
> same results for any linear interpolation regardless of which space the
> interpolation takes place in.

All true.  Mea Culpa.  Here are the equations from the FCC regs:

	Y = .30R + .59G + .11B
	I = -.27(B-Y) + .74(R-Y)
	Q =  .41(B-Y) + .48(R-Y)

In matrix form, this is:

	|Y|   | .300  .590  .110 | |R|
	|I| = | .599 -.277 -.322 | |G|
	|Q|   | .213 -.525  .312 | |B|

	|R|   | 1.   .947  .624 | |Y|
	|G| = | 1.  -.275 -.636 | |I|
	|B|   | 1. -1.108 1.709 | |Q|

The I axis was chosen to be the fleshtone axis, and about three times as
much power is transmitted on this axis as on the Q axis.  This way, a weak
signal will not degrade flesh tones as much as other colors.

-- 
		-ed falk, sun microsystems, sun!falk, falk@sun.com

  "If you wrapped yourself in the flag like George Bush does, you'd
  be worried about flag-burning too"