[comp.compression] Request for JPEG specification.

greg@garnet.berkeley.edu (Greg Kuperberg) (04/03/91)

I would like to see a 1 page or shorter specification of JPEG with an
eye towards a conceptual understanding of the underlying data
compression technique.  I'm sorry to say that I'm too impatient
to order the authoritative treatise on JPEG in the mail, and our
library doesn't have the book on image compression by Rabbani and Jones.

Can someone who knows post or e-mail the information?  Thanks in
advance for any information you can provide.

I should mention that it would be good if there were an archive
available on the net which lists concise specifications of the data
formats of various compression programs and standards, such as
compress(1), PKZIP, Group 3 FAX, JPEG, etc.
----
Greg Kuperberg
greg@math.berkeley.edu

eddins@uicbert.eecs.uic.edu (Steve Eddins) (04/03/91)

greg@garnet.berkeley.edu (Greg Kuperberg) writes:

>I'm sorry to say that I'm too impatient
>to order the authoritative treatise on JPEG in the mail [...]

Does such a thing exist yet?  If so, can someone in net.land provide
ordering information?

I have seen postings that mention the JPEG image compression standard,
and I have seen postings asking for information about JPEG, but I've
seen nothing in the way of concrete information.
-- 
Steve Eddins	
eddins@uicbert.eecs.uic.edu 	(312) 996-5771 		FAX: (312) 413-0024
University of Illinois at Chicago, EECS Dept., M/C 154, 1120 SEO Bldg,
Box 4348, Chicago, IL  60680

tgl@g.gp.cs.cmu.edu (Tom Lane) (04/03/91)

In article <1991Apr2.185934.6675@agate.berkeley.edu>, greg@garnet.berkeley.edu (Greg Kuperberg) writes:
> I would like to see a 1 page or shorter specification of JPEG with an
> eye towards a conceptual understanding of the underlying data
> compression technique.

OK, I'll bite...

Some background:

JPEG works on either full-color or gray-scale images; it does not handle
bilevel (black and white) images, at least not efficiently.  It doesn't
handle colormapped images either; you have to pre-expand those into an
unmapped full-color representation.  (Nonetheless, JPEG comes out a lot
smaller than, say, GIF.)

JPEG is a lossy compression method, meaning that it doesn't necessarily
reconstruct the exact pixel-for-pixel input image.  The technique exploits
known properties of the human vision system.  With proper parameter choices,
it is usually hard to tell a decompressed image from the original *by eye*,
but 'diff' will show lots of differences :-).

There are a lot of parameters to the JPEG compression process.  You can
trade off compressed image size against reconstructed image quality over a
*very* wide range by adjusting the parameters.  You can get image quality
ranging from op-art (at 100x smaller than the original image) to quite
indistinguishable from the source (at about 3x smaller).  My experience is
that the threshold of visible difference from the source image is somewhere
around 10x to 20x smaller than the original.  (Note: the reference image
size here is an uncompressed 24-bit-color representation.  For comparison,
GIF is usually about 3x to 4x smaller than this form.  Bear in mind that as
a colormapped representation, GIF also loses data compared to full color RGB.)


OK, now the outline of the compression algorithm:

1. Transform the image into a suitable color space.  This is a no-op for
grayscale, but for color images you generally want to transform RGB into
a luminance/chrominance color space (YUV, YCbCr, etc).  The luminance
component is gray scale and the other two axes are color information.
The reason for doing this is that you can afford to lose a lot more
information in the chrominance components than you can in the luminance
component; the human eye is not as sensitive to high-frequency color info
as it is to high-frequency luminance.  (See any TV system for precedents.)
You don't have to change the color space if you don't want to.  The
remainder of the algorithm works on each color component independently,
and doesn't really care what the component is.  However, to get
optimal results you must choose the compression parameters for a
component with an eye to its interpretation.

2. (Optional) Subsample each component by averaging together groups of
pixels.  A typical choice is to leave the luminance component at full
resolution, but to subsample the color components 2:1.  (You could not get
away with this in an RGB color space...)
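
As a rough illustration, here is 2:1 subsampling of a component in
both directions by averaging 2x2 pixel groups, written in Python
purely for explanation (the function name and the even-dimensions
assumption are mine, not anything from the spec):

# Illustrative sketch only: average each 2x2 group of a component.
# 'comp' is a list of rows of pixel values with even width and height.
def subsample_2to1(comp):
    out = []
    for y in range(0, len(comp), 2):
        row = []
        for x in range(0, len(comp[0]), 2):
            avg = (comp[y][x] + comp[y][x + 1] +
                   comp[y + 1][x] + comp[y + 1][x + 1]) / 4.0
            row.append(avg)
        out.append(row)
    return out

(Horizontal-only subsampling, as in 4:2:2, works the same way but
averages only along each row.)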

3. Group the pixel values for each component into 8x8 blocks.  Transform
each 8x8 block through a discrete cosine transform (DCT); this is a relative
of the Fourier transform and likewise gives a frequency map (with 8x8
components).  Thus you now have numbers representing the average value in
each block and successively higher-frequency changes within the block.  The
motivation for doing this is that you can now throw away high-frequency
information without affecting low-frequency information.  (The DCT transform
itself is reversible.)
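
For the curious, here is a direct (and slow) Python rendering of the
8x8 forward DCT, written from the usual DCT-II definition rather than
copied out of the spec; real codecs use fast factorizations of this:

# Illustrative sketch only: direct two-dimensional 8x8 DCT-II.
import math

def dct_8x8(block):                  # block: 8x8 list of pixel rows
    def c(k):                        # DC terms get an extra 1/sqrt(2)
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):               # horizontal frequency
        for v in range(8):           # vertical frequency
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[y][x] *
                          math.cos((2 * x + 1) * u * math.pi / 16.0) *
                          math.cos((2 * y + 1) * v * math.pi / 16.0))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

Here out[0][0] is the block average (times 8), and out[v][u] for larger
u and v are the successively higher-frequency terms.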

4. In each block, divide each of the 64 frequency components by a separate
"quantization coefficient", and round the results to integers.  This is the
fundamental information-losing step.  A Q.C. of 1 loses no information,
larger Q.C.s lose successively more info.  The higher frequencies are
normally reduced more than the lower.  (All 64 Q.C.s are parameters to the
compression process; tuning them for best results is a black art.  It seems
likely that the best values are yet to be discovered.)
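
In code, step 4 amounts to something like this sketch (Python,
illustrative only; the 8x8 'qtable' of quantization coefficients is
exactly the set of parameters whose tuning is the black art, and no
particular values are suggested here):

# Illustrative sketch only: quantize/dequantize an 8x8 coefficient block.
def quantize(coeffs, qtable):
    return [[int(round(coeffs[i][j] / qtable[i][j])) for j in range(8)]
            for i in range(8)]

def dequantize(quant, qtable):       # what the decompressor does
    return [[quant[i][j] * qtable[i][j] for j in range(8)]
            for i in range(8)]

The rounding in quantize() is where the information is actually lost;
dequantize() can only recover multiples of each Q.C.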

5. Encode the reduced coefficients using either Huffman or arithmetic
coding.  The JPEG spec defines a complex multi-state statistical model for
this coding (see my previous post about the difference between modeling and
coding).  Also, a particular variant of arithmetic coding is specified.

6. Tack on appropriate headers, etc, and output the result.  In an
"interchange" JPEG file, all of the compression parameters are included
in the headers so that the decompressor can reverse the process.  For
specialized applications, the spec permits the parameters to be omitted
from the file; this saves several hundred bytes of overhead, but means that
the decompressor must know what parameters the compressor used.


The decompression algorithm reverses this process, and typically adds some
smoothing steps to reduce pixel-to-pixel discontinuities.


As far as I know, there is no detailed specification of JPEG available
on the net.  For more information try:

Gregory Wallace, "Overview of the JPEG (ISO/CCITT) Still Image
Compression Standard", Proceedings of the SPIE Symposium on
Electronic Imaging Science and Technology, Santa Clara, CA, Feb. 1990.
In SPIE Vol 1244, Image Processing Algorithms and Techniques (1990).

``Digital Image Compression Techniques'' by M. Rabbani and P. W. Jones
(Tutorial Text Series, Vol. TT7, SPIE Press, Bellingham, WA, 1991).

I have not seen either of these...

-- 
			tom lane
Internet: tgl@cs.cmu.edu	BITNET: tgl%cs.cmu.edu@cmuccvma

leavitt@mordor.hw.stratus.com (Will Leavitt) (04/04/91)

(reprinted from a C-Cube Microsystems information packet)


JPEG ALGORITHM OVERVIEW

Background of the JPEG Algorithm

The obvious advantages of image compression led to the formation of an
international standards group: the Joint Photographic Experts Group
(JPEG).  JPEG's objective has been to create a common CCITT/ISO
standard for continuous-tone image compression for both color and
greyscale images.  With the proposed JPEG standard, a continuous range
of compression can be achieved.  Twenty-four bits per pixel can be
reduced to one, or less, while image quality is maintained.

The emerging standard defined by JPEG is designed to cover a broad
range of applications. The scope and variety of target products
resulted in a three-part JPEG algorithm definition: a baseline system,
an extended system, and a special function that provides support
for lossless compression.

The algorithm for the baseline system is a lossy technique, based on
the discrete cosine transform followed by variable word length
encoding.  The baseline system is the mandatory part of the system,
and every JPEG codec must incorporate this function.

The extended system adds features such as more sophisticated coding,
lossless transmission, and progressive buildup.

The compression achieved by lossy methods is much higher than with
lossless techniques.  Image compression factors depend on the image
material and, with lossy coding, can be selected by the user in a
continuous manner.  For CCIR 601 type images, the JPEG algorithm
typically achieves compression ratios of 24:1, yielding picture
quality nearly indistinguishable from the original.  Higher
compression ratios can be achieved while still maintaining good image
quality.


Operation of the JPEG Algorithm

The operation of the JPEG algorithm can be divided into three basic stages:

  1) The removal of data redundancy by means of the Discrete Cosine
     Transform (DCT).

  2) The quantization of the DCT coefficients using weighting functions
     optimized for the human visual system.

  3) The encoding of the data to minimize the entropy of the quantized
     DCT coefficients.  The entropy encoding is done with a Huffman
     variable-word-length encoder.

Although color conversion is a part of the redundancy removal process,
it is not part of the JPEG algorithm.  It is the goal of JPEG to be
independent of the color space.  JPEG handles colors as separate
components.  Therefore, it can be used to compress data from different
color spaces, such as RGB, YUV and CMYK.

However, the best compression results are achieved if the color
components are independent (uncorrelated), such as YUV, where most of
the information is concentrated in the luminance and less in the
chrominance.  RGB color components can be converted via a linear
transformation into YUV components as shown:

   | Y |       | 0.299  0.587  0.114 |   | R |
   | U |   =   |-0.169 -0.332  0.500 | * | G |
   | V |       | 0.500 -0.418 -0.082 |   | B |
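
In code form, applying the matrix above to a single pixel looks like
this (Python, illustrative only; any offsets added to U and V in a
particular implementation are omitted):

# Illustrative sketch only: RGB to YUV for one pixel, per the matrix above.
def rgb_to_yuv(r, g, b):
    y =  0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.332 * g + 0.500 * b
    v =  0.500 * r - 0.418 * g - 0.082 * b
    return y, u, v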

Another advantage of using the YUV color space comes from reducing the
spatial resolution of the U and V chrominance components.  Because
chrominance does not need to be specified as frequently as luminance,
every other U element and every other V element can be discarded.  As
a consequence, the six RGB samples for every two pixels become four
YUV samples (two Y, one U, one V): a data reduction of 3 to 2,
obtained by transforming RGB into YUV 4:2:2.  The conversion in color
space is a first step toward compressing the image.


Discrete Cosine Transform

For each separate color component, the image is broken into 8x8 blocks
that cover the entire image.  These blocks form the input to the DCT.

In the 8x8 blocks, typically the pixel values vary slowly.  Therefore,
the energy is of low spatial frequency.  A transform that can be used
to concentrate the energy into a few coefficients is the
two-dimensional 8x8 DCT.  The transform, studied extensively for image
compression, is extremely efficient for highly correlated data.

Conceptually, a one-dimensional DCT can be thought of as taking the
Fourier Transform and retaining only the real (the cosine) part.  The
two-dimensional DCT can be obtained by performing a one-dimensional
DCT on the columns and then a one-dimensional DCT on the rows.  The
transformed output from the two-dimensional DCT is ordered such that
the mean value, the DC coefficient, is in the upper left corner of the
8x8 coefficient block and the higher-frequency coefficients progress
by distance from the DC coefficient.  Higher vertical frequencies are
represented by higher row numbers, and higher horizontal frequencies
are represented by higher column numbers.
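
Here is an illustrative Python sketch of that column-then-row
construction, using an orthonormal 1-D DCT-II (the normalization shown
is one common choice, not necessarily the one used in any particular
implementation):

# Illustrative sketch only: separable 8x8 DCT built from a 1-D DCT.
import math

def dct_1d(vec):                         # N-point DCT-II, orthonormal
    n = len(vec)
    out = []
    for u in range(n):
        c = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        s = sum(vec[x] * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for x in range(n))
        out.append(c * s)
    return out

def dct_2d(block):                       # block[row][col], 8x8
    cols = [dct_1d([block[y][x] for y in range(8)]) for x in range(8)]
    # cols[x][v] holds the column transforms; now transform each row
    return [dct_1d([cols[x][v] for x in range(8)]) for v in range(8)]

The result is indexed as dct_2d(block)[row][col], with the DC
coefficient at [0][0], matching the ordering described above.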


Quantization

The next step is the quantization of the frequency coefficients.  The
coefficients are quantized to reduce their magnitude and increase the
number of zero-value coefficients.  A uniform quantizer was selected
for the JPEG baseline method.  The step size is varied according to
the coefficient location and tuned for each color component.

The coding model rearranges the quantized frequency coefficients into
a zigzag pattern, with the lowest frequencies first and the highest
frequencies last.  The zigzag pattern is used to increase the
run-length of zero coefficients found in the block.  The assumption is
that the lower frequencies tend to have large coefficients and the
high frequencies are, by the nature of most pictures, predominantly
zero.  The first coefficient (0,0) is called the DC coefficient, and
the remaining coefficients are the AC coefficients.  The AC
coefficients are traversed by the zigzag pattern from the (0,1)
location to the (7,7) location.
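
An illustrative way to generate that zigzag order (Python; it simply
walks the anti-diagonals of the 8x8 block, and the helper names are
mine):

# Illustrative sketch only: zigzag scan order for an 8x8 block of
# (row, col) coefficient positions, DC coefficient (0,0) first, (7,7) last.
def zigzag_order():
    order = []
    for d in range(15):                  # anti-diagonals where row+col = d
        diag = [(d - c, c) for c in range(8) if 0 <= d - c < 8]
        order.extend(reversed(diag) if d % 2 else diag)
    return order

def zigzag_scan(block):                  # block[row][col], 8x8
    return [block[r][c] for r, c in zigzag_order()]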

The DC coefficients of subsequent blocks often vary only slightly.
Therefore, differences between successive DC coefficients are small.
The coding of the DC coefficient exploits this property through
Differential Pulse Code Modulation (DPCM).  This technique codes the
difference (Delta) between the quantized DC coefficient of the current
block and the DC coefficient of the previous block.  The formula for
the encoding of the DC code is:

   Delta_k  =  DC_k(0,0)  -  DC_(k-1)(0,0)

The inverse calculation takes place at the decoder.
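
A minimal sketch of the DPCM step and its inverse (Python, illustrative
only; the predictor starts at zero, which is the usual convention at
the beginning of a scan):

# Illustrative sketch only: DPCM coding of the per-block DC coefficients.
def dpcm_encode(dc_values):              # quantized DC of each block, in order
    prev, deltas = 0, []
    for dc in dc_values:
        deltas.append(dc - prev)
        prev = dc
    return deltas

def dpcm_decode(deltas):                 # the inverse, as done by the decoder
    prev, dcs = 0, []
    for d in deltas:
        prev += d
        dcs.append(prev)
    return dcs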


Zero Run-Length Coding

The quantized AC coefficients usually contain runs of consecutive
zeros.  Therefore, a coding advantage can be obtained by using a
run-length technique, where the upper four bits of the code symbol
indicate the number of consecutive zeros before the next coefficient
and the lower four bits indicate the number of significant bits in the
next coefficient.

Following the code symbol are the significant bits of the coefficient,
the length of which can be determined by the lower four bits of the
code.  The inverse run-length coder translates the input coded stream
into an output array of AC coefficients.  It takes the current code
and appends to the output array the number of zero coefficients given
by the upper four bits of the run-length code.  The coefficient then
placed in the output array has the number of bits determined by the
lower four bits of the run-length code and a value determined by the
trailing bits that follow the code symbol.
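
Sketched in Python (illustrative only; JPEG's special symbols for runs
longer than 15 zeros and for end-of-block are left out to keep it
short), the symbol generation looks roughly like this:

# Illustrative sketch only: turn the 63 zigzag-ordered AC coefficients of
# one block into (code symbol, coefficient) pairs as described above.
def ac_symbols(ac_coeffs):
    symbols = []
    run = 0
    for coeff in ac_coeffs:
        if coeff == 0:
            run += 1                     # count consecutive zeros
            continue
        size = abs(coeff).bit_length()   # significant bits of the coefficient
        symbols.append(((run << 4) | size, coeff))
        run = 0
    return symbols                       # any trailing zeros would become EOB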


Entropy Encoding

The block codes from the DPCM and run-length models can be further
compressed using entropy encoding.  For the baseline JPEG method, the
Huffman coder is used to reduce entropy.  One reason for using the
Huffman coder is that it is easy to implement by means of a lookup
table in hardware.  To compress data symbols, the Huffman coder
creates shorter codes for frequently occurring symbols and longer
codes for occasionally occurring symbols.  Many applications may use
pre-defined Huffman tables.  Therefore, the baseline encoder can
operate as a one-pass or two-pass system.  In a one-pass system,
predetermined Huffman tables are used, whereas in the two-pass system,
Huffman tables are created that are specific to the image to be
encoded.

The first step in creating the Huffman codes is to create a table
assigning a frequency count to each symbol.  Symbols with a higher
probability are assigned shorter codes than the less frequently
occurring symbols.
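
For illustration, a classical Huffman code can be built from such a
frequency table as follows (Python; baseline JPEG additionally
constrains its tables, for example to a 16-bit maximum code length,
which this sketch ignores):

# Illustrative sketch only: classical Huffman code construction.
import heapq

def huffman_codes(freq):                 # freq: dict of symbol -> count
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)    # two least frequent subtrees
        c2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in codes1.items()}
        merged.update({s: "1" + code for s, code in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2] if heap else {}    # dict of symbol -> bit string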


Summary of Baseline JPEG

The baseline system provides efficient lossy sequential image
compression.  It supports four color components simultaneously, with
a maximum number of eight input bits for each color pixel component.

The basic data entity is a block of 8x8 pixels.  However, this block
can represent a larger subsampled image area (for example, the
decimated chrominance signals).  The blocks of the different color
components are sent interleaved, thereby allowing the decoder to
create the decompressed image and translate back to the original color
space on-the-fly.

--
...............................................................................
this content-free posting has been brought to you by:
                                           Will Leavitt     508-490-6231
                                           leavitt@mordor.hw.stratus.com