pierce@radius.com (Pierce T. Wetter III) (11/15/90)
Where to I get JPEG info? To get JPEG spec send $8.00 to: Attn: JPEG Draft Technical Specification X3 Secretartiat: Computer and Business Equipment Manufactureers Association 311 First Street NW, Suite 500 Washington, DC 20001-2178 Say somewhere in your letter that you want the document for review and comment. What does JPEG mean? Joint Photographic Experts Group What is the JPEG compression standard? The JPEG compression standard is a method of compressing images at very high compression ratios. Because you can't get something for nothing, the JPEG compression method lose information. How does it work? Background: According to information theory, the entropy of a message is a function of the symbols that make up that message, and the relative probablity of each symbol in that message occuring. Imagine a message made up of 52 different symbols corresponding to the alphabet in both upper and lower case. Suppose we want to send a message to someone. The first thing we do is find the probability of each letter occuring. As "Wheel of Fortune" fans know, not all letters are equally likely. The vowels, and certain consonants are much more likely. Capital letters are less likely still, but an upper case 'S' is more common than a lower case 'z'. If we count all the letters in the message, and put them in an array named 'counts' we can use the following C code to compute the entropy: double BigH=0; long N= Number of symbols in message; for(i='a';i<='z';i++) H -= counts[i]/N * log2(counts[i]/N); That is, the entropy equals the sum of the negative logorithm (base 2) of the probability of a symbol times the probability of that symbol. H is the entropy and equals the number of minimum # of bits/symbol needed that message. The efficiency of a coding method is equal to the entropy over the average number of bits used to transmit that message. Lets get back to our example message. The brute force method of transmission would require 6 bits/symbol. 
Each symbol would have its very own bit pattern, with 12 patterns left over. This would be wasteful, even though we could transmit 64 different symbols. Since lower case is more common than upper case, it would be better to use 5 bits. 26 of the 32 entries determine which lower case letter to use; the remaining 6 are divided equally among the upper case letters. Immediately following one of these 6 patterns is a 3 bit code indicating which upper case letter is being used. The transmission of a lower case letter would require 5 bits; the transmission of an upper case letter, 8 bits. Since lower case letters are many times more common than upper case letters, the single bit that we save EACH TIME we send a lower case letter more than makes up for the fact that we have to send 8 bits for an upper case letter.

As a further optimization, we could decide to send only one bit for the letter 'e' and 6 bits for all the other letters. (One bit to indicate 'NOT e' and 5 bits to indicate which of the other symbols is correct.) Compared to the plain 5-bit code, we save 4 bits every time we send an 'e' and waste a bit every time we send any other letter. If 'e' makes up more than a fifth of the letters in our message, we will save bits.

The end result of this is Huffman coding. Each character is counted, and a variable number of bits is assigned to its representation. Huffman, Lempel-Ziv, etc. all work more or less like this.

Gee, someone says, what if I want to send a message like 'aeaeaeaeaeaeae'? That would compress to one bit a letter, '01010101010101', and is 100% efficient. But if I take the message two letters at a time, I could send it as '1111111' (1='ae'), so I can get 200% efficiency. Nope. If you look at the message two letters at a time, you are changing your symbols. You now have (52x52=2704) different symbols. In the message above you only have one possible symbol, 'ae', so the entropy of the message is zero, and since you still have to send at least one bit per symbol, you can never be 100% efficient.
If you look at your message in different sizes, the entropies will be different and you may get better compression. For example, one character at a time, the message 'abaeabaeab' will compress to '010011010011010' (0='a', 10='b', 11='e'). The entropy of that message is (-0.5*log2(0.5) - 0.3*log2(0.3) - 0.2*log2(0.2)) = 1.485 bits/symbol. The efficiency of the encoded message is E = 1.485/1.5 = 99%. On the other hand, at two characters per symbol the entropy drops to about 1 bit per symbol, and the entire message can be sent in only 5 bits. That is the difference between 8-bit, 16-bit, and 32-bit encoding: the choice of symbols. Note that the above efficiency assumes that the translation table between the encoded and decoded forms is sent separately or built into the decoding method; otherwise it will have to be sent along with the encoded message, which will reduce the actual efficiency.

Lossy compression methods:

The entropy of a message tells us the absolute best we can do for a message. To do any better than that, we will have to throw away information. For images we can rely on certain psycho-physical aspects of the way our eyes see things. The JPEG process is two-fold. The first pass is to break up the image into planes. Generally, it is a good idea to separate the brightness information from the color information by converting the image to YUV or Lab, but this is not required. The next step is to break each plane into 8x8 or 16x16 chunks. These chunks are converted from position and value information into frequency information. For example, an alternating black and white pattern would be recorded as a 50% duty cycle rather than as 10101010101... . The overall color of the chunk is recorded, then the relative values of the frequency components. Then, instead of the frequency components having continuous values, they are mapped to the closest of a few available values. For example, 5.4 would map to 5, and 5.6 would map to 6.
Information is lost because 5.4-5 is .4 and 5.6-6 is -.4. The image can no longer have a pixel 'wave' (we're in frequency now) with the value 5.4 or 5.6, so information is lost. When the picture is regenerated, noise approximately .4 in magnitude will be present in the picture. Because the eye is relatively forgiving of high frequency noise, the encoded image may not look any different to the eye. Additionally, other components in the reproduction cycle (printing presses, NTSC broadcast, video tape dropouts, etc.) may have much larger errors.

Won't JPEG compression make TV images look terrible?

The broadcast standard for NTSC only has 1/2 the bandwidth for the color I & Q channels that it does for the luminance Y channel. Therefore there is only enough color information for every other pixel. When JPEG loses information in frequencies above the 50% duty cycle, the resulting image loss will not be detectable by the end receiver. Additionally, the I & Q channels are only allowed to be within a certain range of values when broadcast. Any loss of signal beyond that range due to JPEG is no different from that caused by the transmission stage anyway.

Won't JPEG compression make my photographs print wrong?

Printing a color picture on a four color press requires the arrangement of a multitude of different size dots. The arrangement of these dots will, in general, introduce more noise than compression by JPEG will produce. Also, there are generally a limited number of configurations of those dots, so any rounding errors caused by JPEG may be masked by even bigger errors in the printing process.

What's the most important thing to remember?

Generally, only you know what the image looked like originally, so only you will know if it has changed. If you can't tell it has been compressed with both images next to each other, someone who has never seen either of them before won't be able to tell either.
Pierce -- My postings are my opinions, and my opinions are my own, not those of my employer. You can get me at radius!pierce@apple.com. (Wha'ja want? Some cute signature file? Hah! I have real work to do.)