greg@garnet.berkeley.edu (Greg Kuperberg) (04/03/91)
I would like to see a 1-page or shorter specification of JPEG, with an eye towards a conceptual understanding of the underlying data compression technique.  I'm sorry to say that I'm too impatient to order the authoritative treatise on JPEG in the mail, and our library doesn't have the book on image compression by Rabbani and Jones.  Can someone who knows post or e-mail the information?  Thanks in advance for any information you can provide.

I should mention that it would be good if there were an archive available on the net which lists concise specifications of the data formats of various compression programs and standards, such as compress(1), PKZIP, Group 3 FAX, JPEG, etc.

----
Greg Kuperberg
greg@math.berkeley.edu
eddins@uicbert.eecs.uic.edu (Steve Eddins) (04/03/91)
greg@garnet.berkeley.edu (Greg Kuperberg) writes:

>I'm sorry to say that I'm too impatient
>to order the authoritative treatise on JPEG in the mail [...]

Does such a thing exist yet?  If so, can someone in net.land provide ordering information?  I have seen postings that mention the JPEG image compression standard, and I have seen postings asking for information about JPEG, but I've seen nothing in the way of concrete information.
-- 
Steve Eddins
eddins@uicbert.eecs.uic.edu 	(312) 996-5771 	FAX: (312) 413-0024
University of Illinois at Chicago, EECS Dept., M/C 154,
1120 SEO Bldg, Box 4348, Chicago, IL 60680
tgl@g.gp.cs.cmu.edu (Tom Lane) (04/03/91)
In article <1991Apr2.185934.6675@agate.berkeley.edu>, greg@garnet.berkeley.edu (Greg Kuperberg) writes:
> I would like to see a 1 page or shorter specification of JPEG with an
> eye towards a conceptual understanding of the underlying data
> compression technique.

OK, I'll bite...

Some background: JPEG works on either full-color or gray-scale images; it does not handle bilevel (black and white) images, at least not efficiently.  It doesn't handle colormapped images either; you have to pre-expand those into an unmapped full-color representation.  (Nonetheless, JPEG output comes out a lot smaller than, say, GIF.)

JPEG is a lossy compression method, meaning that it doesn't necessarily reconstruct the exact pixel-for-pixel input image.  The technique exploits known properties of the human visual system.  With proper parameter choices, it is usually hard to tell a decompressed image from the original *by eye*, but 'diff' will show lots of differences :-).

There are a lot of parameters to the JPEG compression process.  You can trade off compressed image size against reconstructed image quality over a *very* wide range by adjusting the parameters.  You can get image quality ranging from op-art (at 100x smaller than the original image) to quite indistinguishable from the source (at about 3x smaller).  My experience is that the threshold of visible difference from the source image is somewhere around 10x to 20x smaller than the original.  (Note: the reference image size here is an uncompressed 24-bit-color representation.  For comparison, GIF is usually about 3x to 4x smaller than this form.  Bear in mind that as a colormapped representation, GIF also loses data compared to full-color RGB.)

OK, now the outline of the compression algorithm:

1. Transform the image into a suitable color space.  This is a no-op for grayscale, but for color images you generally want to transform RGB into a luminance/chrominance color space (YUV, YCbCr, etc).  The luminance component is gray scale and the other two axes are color information.  The reason for doing this is that you can afford to lose a lot more information in the chrominance components than in the luminance component: the human eye is not as sensitive to high-frequency color info as it is to high-frequency luminance.  (See any TV system for precedents.)  You don't have to change the color space if you don't want to.  The remainder of the algorithm works on each color component independently, and doesn't really care what the component is; however, to get optimal results you must choose the compression parameters for a component with an eye to its interpretation.

2. (Optional) Subsample each component by averaging together groups of pixels.  A typical choice is to leave the luminance component at full resolution, but to subsample the color components 2:1.  (You could not get away with this in an RGB color space...)

3. Group the pixel values for each component into 8x8 blocks.  Transform each 8x8 block through a discrete cosine transform (DCT); this is a relative of the Fourier transform and likewise gives a frequency map (with 8x8 components).  Thus you now have numbers representing the average value in each block and successively higher-frequency changes within the block.  The motivation for doing this is that you can now throw away high-frequency information without affecting low-frequency information.  (The DCT transform itself is reversible.)

4. In each block, divide each of the 64 frequency components by a separate "quantization coefficient", and round the results to integers.  This is the fundamental information-losing step.  A Q.C. of 1 loses no information; larger Q.C.s lose successively more info.  The higher frequencies are normally reduced more than the lower.  (All 64 Q.C.s are parameters to the compression process; tuning them for best results is a black art.  It seems likely that the best values are yet to be discovered.)

5. Encode the reduced coefficients using either Huffman or arithmetic coding.  The JPEG spec defines a complex multi-state statistical model for this coding (see my previous post about the difference between modeling and coding).  Also, a particular variant of arithmetic coding is specified.

6. Tack on appropriate headers, etc, and output the result.  In an "interchange" JPEG file, all of the compression parameters are included in the headers so that the decompressor can reverse the process.  For specialized applications, the spec permits the parameters to be omitted from the file; this saves several hundred bytes of overhead, but means that the decompressor must know what parameters the compressor used.

The decompression algorithm reverses this process, and typically adds some smoothing steps to reduce pixel-to-pixel discontinuities.

As far as I know, there is no detailed specification of JPEG available on the net.  For more information try:

Gregory Wallace, "Overview of the JPEG (ISO/CCITT) Still Image Compression Standard", Proceedings of the SPIE Symposium on Electronic Imaging Science and Technology, Santa Clara, CA, Feb. 1990.  In SPIE Vol. 1244, Image Processing Algorithms and Techniques (1990).

"Digital Image Compression Techniques" by M. Rabbani and P. W. Jones (Tutorial Text Series, Vol. TT7, SPIE Press, Bellingham, WA, 1991).

I have not seen either of these...
-- 
				tom lane
Internet: tgl@cs.cmu.edu	BITNET: tgl%cs.cmu.edu@cmuccvma
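[Editorial note: a minimal sketch of steps 3 and 4 of the outline above, using NumPy.  The orthonormal DCT basis construction and the all-ones quantization table are illustrative choices, not values from the JPEG spec.]

```python
# Steps 3 and 4 sketched: an 8x8 DCT followed by quantization.
# The DCT is built from an orthonormal DCT-II basis matrix; the
# quantization table is up to the encoder (all ones = lossless step).
import numpy as np

# Orthonormal 8x8 DCT-II basis: T[k, n] = a_k * cos(pi * (2n+1) * k / 16)
k = np.arange(8).reshape(-1, 1)
n = np.arange(8).reshape(1, -1)
T = np.cos(np.pi * (2 * n + 1) * k / 16)
T[0, :] *= np.sqrt(1.0 / 8.0)
T[1:, :] *= np.sqrt(2.0 / 8.0)

def dct2(block):
    """2-D DCT of an 8x8 block: 1-D DCT on rows, then on columns."""
    return T @ block @ T.T

def idct2(coefs):
    """Inverse 2-D DCT -- the transform itself loses nothing."""
    return T.T @ coefs @ T

def quantize(coefs, q):
    """Divide each frequency component by its quantization
    coefficient and round: the fundamental information-losing step."""
    return np.round(coefs / q).astype(int)
```

A uniform 8x8 block comes out as a single DC coefficient with zero AC terms, which is why slowly varying image blocks compress so well after this step.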
leavitt@mordor.hw.stratus.com (Will Leavitt) (04/04/91)
(reprinted from a C-Cube Microsystems information packet)

JPEG ALGORITHM OVERVIEW

Background of the JPEG Algorithm

The obvious advantages of image compression led to the formation of an international standards group: the Joint Photographic Experts Group (JPEG).  JPEG's objective has been to create a common CCITT/ISO standard for continuous-tone image compression for both color and greyscale images.

With the proposed JPEG standard, a continuous range of compression can be achieved.  Twenty-four bits per pixel can be reduced to one, or less, while image quality is maintained.

The emerging standard defined by JPEG is designed to cover a broad range of applications.  The scope and variety of target products resulted in a three-part JPEG algorithm definition: a baseline system, an extended system, and a special function used to provide support for lossless compression.  The algorithm for the baseline system is a lossy technique, based on the discrete cosine transform followed by variable-word-length encoding.  The baseline system is the mandatory part, and every JPEG codec must incorporate this function.  The extended system adds features such as more sophisticated coding, lossless transmission, and progressive buildup.

The compression achieved by lossy methods is much higher than with lossless techniques.  Image compression factors depend on the image material and, with lossy coding, can be selected by the user in a continuous manner.  For CCIR 601 type images, the JPEG algorithm typically achieves compression ratios of 24:1, yielding picture quality nearly indistinguishable from the original.  Deeper compression ratios can be achieved while maintaining good image quality.

Operation of the JPEG Algorithm

The operation of the JPEG algorithm can be divided into three basic stages:

1) the removal of data redundancy by means of the Discrete Cosine Transform (DCT);
2) the quantization of the DCT coefficients using weighting functions optimized for the human visual system; and

3) the encoding of the data to minimize the entropy of the quantized DCT coefficients.  The entropy encoding is done with a Huffman variable-word-length encoder.

Although color conversion is part of the redundancy-removal process, it is not part of the JPEG algorithm.  It is a goal of JPEG to be independent of the color space.  JPEG handles colors as separate components, so it can be used to compress data from different color spaces, such as RGB, YUV, and CMYK.  However, the best compression results are achieved if the color components are independent (uncorrelated), as in YUV, where most of the information is concentrated in the luminance and less in the chrominance.  RGB color components can be converted via a linear transformation into YUV components as shown:

    | Y |   |  0.299   0.587   0.114 |   | R |
    | U | = | -0.169  -0.332   0.500 | * | G |
    | V |   |  0.500  -0.418  -0.082 |   | B |

Another advantage of using the YUV color space comes from reducing the spatial resolution of the U and V chrominance components.  Because chrominance does not need to be specified as frequently as luminance, every other U element and every other V element can be discarded.  As a consequence, a data reduction of 3 to 2 is obtained by transforming RGB into YUV 4:2:2.  The conversion in color space is a first step toward compressing the image.

Discrete Cosine Transform

For each separate color component, the image is broken into 8x8 blocks that cover the entire image.  These blocks form the input to the DCT.  In the 8x8 blocks, the pixel values typically vary slowly; therefore, the energy is of low spatial frequency.  A transform that can be used to concentrate the energy into a few coefficients is the two-dimensional 8x8 DCT.  The transform, studied extensively for image compression, is extremely efficient for highly correlated data.
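[Editorial note: the RGB-to-YUV matrix quoted above can be applied per pixel as in this NumPy sketch; the coefficients are the ones given in the post.]

```python
# Sketch of the RGB-to-YUV linear transformation shown above.
import numpy as np

RGB_TO_YUV = np.array([
    [ 0.299,  0.587,  0.114],   # Y: luminance
    [-0.169, -0.332,  0.500],   # U: chrominance
    [ 0.500, -0.418, -0.082],   # V: chrominance
])

def rgb_to_yuv(rgb):
    """Convert an (..., 3) array of RGB values to YUV."""
    return rgb @ RGB_TO_YUV.T
```

For a neutral gray pixel (R = G = B) the two chrominance components come out essentially zero, which is what makes the 2:1 subsampling of U and V so cheap.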
Conceptually, a one-dimensional DCT can be thought of as taking the Fourier transform and retaining only the real (cosine) part.  The two-dimensional DCT can be obtained by performing a one-dimensional DCT on the columns and then a one-dimensional DCT on the rows.  The transformed output from the two-dimensional DCT is ordered such that the mean value, the DC coefficient, is in the upper left corner of the 8x8 coefficient block, and the higher-frequency coefficients progress by distance from the DC coefficient.  Higher vertical frequencies are represented by higher row numbers, and higher horizontal frequencies are represented by higher column numbers.

Quantization

The next step is the quantization of the frequency coefficients.  The coefficients are quantized to reduce their magnitude and increase the number of zero-value coefficients.  A uniform quantizer was selected for the JPEG baseline method.  The step size is varied according to the coefficient location and tuned for each color component.

The coding model rearranges the quantized frequency coefficients into a zigzag pattern, with the lowest frequencies first and the highest frequencies last.  The zigzag pattern is used to increase the run-length of zero coefficients found in the block.  The assumption is that the lower frequencies tend to have large coefficients while the high frequencies are, by the nature of most pictures, predominantly zero.

The first coefficient (0,0) is called the DC coefficient, and the remaining coefficients are the AC coefficients.  The AC coefficients are traversed by the zigzag pattern from the (0,1) location to the (7,7) location.  The DC coefficients of subsequent blocks often vary only slightly; therefore, the differences between successive DC coefficients are small.  The coding of the DC coefficient exploits this property through Differential Pulse Code Modulation (DPCM).
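[Editorial note: a minimal sketch of the zigzag traversal described above, in plain Python.  Coefficients are read along anti-diagonals of the block, alternating direction.]

```python
# Sketch of the zigzag scan: low frequencies come first, so runs of
# zero-valued high-frequency coefficients cluster at the end.

def zigzag_order(n=8):
    """Return the (row, col) visiting order for an n x n block."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Sort by anti-diagonal; within a diagonal, alternate direction.
    return sorted(cells, key=lambda c: (c[0] + c[1],
                                        c[0] if (c[0] + c[1]) % 2 else -c[0]))

def zigzag(block):
    """Flatten a square block (list of lists) into zigzag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]
```

The first few positions visited are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), matching the traversal from the DC coefficient out toward (7,7).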
This technique codes the difference (Delta) between the quantized DC coefficient of the current block and the DC coefficient of the previous block.  The formula for the encoding of the DC code is:

	Delta_k = DC_k(0,0) - DC_(k-1)(0,0)

The inverse calculation takes place at the decoder.

Zero Run-Length Coding

The quantized AC coefficients usually contain runs of consecutive zeros.  Therefore, a coding advantage can be obtained by using a run-length technique, where the upper four bits of the code symbol indicate the number of consecutive zeros before the next coefficient and the lower four bits indicate the number of significant bits in the next coefficient.  Following the code symbol are the significant bits of the coefficient, the length of which can be determined from the lower four bits of the code.

The inverse run-length coder translates the input coded stream into an output array of AC coefficients.  It takes the current code and appends to the output array the number of zeros given by the upper four (run-length) bits.  The coefficient then placed in the output array has the number of bits determined by the lower four bits of the run-length code and a value determined by the trailing bits that follow.

Entropy Encoding

The block codes from the DPCM and run-length models can be further compressed using entropy encoding.  For the baseline JPEG method, a Huffman coder is used to reduce the entropy.  One reason for using the Huffman coder is that it is easy to implement by means of a lookup table in hardware.  To compress data symbols, the Huffman coder creates shorter codes for frequently occurring symbols and longer codes for occasionally occurring symbols.

Many applications may use predefined Huffman tables, so the baseline encoder can operate as a one-pass or two-pass system.  In a one-pass system, predetermined Huffman tables are used, whereas in a two-pass system, Huffman tables are created that are specific to the image to be encoded.
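[Editorial note: a toy sketch of the (run, size) pairing described above.  This illustrates the idea only; the actual JPEG bit layout (16-zero ZRL codes, the end-of-block symbol, and the coding of negative values) is omitted.]

```python
# Toy zero run-length model: each nonzero AC coefficient becomes a
# (run, size, value) triple -- run = number of preceding zeros,
# size = number of significant bits -- with trailing zeros implied.

def runlength_encode(ac):
    """Encode a list of quantized AC coefficients into
    (run, size, value) triples; trailing zeros are dropped
    (an end-of-block marker would stand in for them)."""
    symbols = []
    run = 0
    for coef in ac:
        if coef == 0:
            run += 1
        else:
            size = abs(coef).bit_length()    # significant bits
            symbols.append((run, size, coef))
            run = 0
    return symbols

def runlength_decode(symbols, length=63):
    """Inverse run-length coder: expand back to an AC array."""
    ac = []
    for run, size, coef in symbols:
        ac.extend([0] * run)                 # zeros from the run bits
        ac.append(coef)
    ac.extend([0] * (length - len(ac)))      # implied trailing zeros
    return ac
```

Because most blocks end in a long tail of zero high-frequency coefficients after quantization, dropping that tail alone is a large win before any entropy coding happens.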
The first step in creating the Huffman codes is to create a table assigning a frequency count to each symbol.  Symbols with a higher probability are assigned shorter codes than the less frequently occurring symbols.

Summary of Baseline JPEG

The baseline system provides efficient lossy sequential image compression.  It supports four color components simultaneously, with a maximum of eight input bits for each color pixel component.  The basic data entity is a block of 8x8 pixels.  However, this block can represent a larger subsampled image area (for example, decimated chrominance signals).  The blocks of the different color components are sent interleaved, thereby allowing the decoder to create the decompressed image and translate back to the original color space on the fly.
-- 
...............................................................................
this content-free posting has been brought to you by:
    Will Leavitt          508-490-6231
    leavitt@mordor.hw.stratus.com
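[Editorial note: the frequency-count step described above can be sketched with the textbook Huffman construction, as in this plain-Python example.  This builds an arbitrary Huffman code, not the canonical code-length table format that JPEG actually transmits.]

```python
# Toy Huffman construction for the two-pass case: count symbol
# frequencies, then repeatedly merge the two least-frequent subtrees.
import heapq
from itertools import count

def huffman_codes(freqs):
    """Map each symbol to a bit string; frequent symbols get
    shorter codes than rare ones."""
    tiebreak = count()                       # keeps heap entries orderable
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

The resulting code is prefix-free, which is what lets a hardware decoder walk it with a simple lookup table, one code word at a time.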