muttiah@stable.ecn.purdue.edu (Ranjan S Muttiah) (03/16/91)
I don't know too much about information theory, so please bear with me
if this question is vague.
--------
Is there any change in information in reformulating the representation
for data input (to a receiver, for example)?

Ex.,

The sequence: 145 8 40 44
 vs
The sequence: 10010001 00001000 00101000 00101100
--------
My off-the-cuff answer is no.  However, if error were to be introduced
by the channel, entropy seems higher with the second representation.
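As a sanity check (a Python sketch, not part of the original post), the two sequences are the same four byte values, written once in decimal and once as 8-bit binary:

```python
# The same four values in the two notations from the post.
decimal_form = [145, 8, 40, 44]
binary_form = ["10010001", "00001000", "00101000", "00101100"]

# Convert each decimal value to an 8-bit binary string.
converted = [format(n, "08b") for n in decimal_form]
print(converted)                  # ['10010001', '00001000', '00101000', '00101100']
print(converted == binary_form)   # True
```

So any difference in "information" must come from the representation, not the underlying values.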
mcmahan@netcom.COM (Dave Mc Mahan) (03/18/91)
In a previous article, muttiah@stable.ecn.purdue.edu (Ranjan S Muttiah) writes:

>Is there any change in information in reformulating the representation
>for data input (to a receiver, for example) ?
>
>Ex.,
>
>The sequence: 145 8 40 44
> vs
>The sequence: 10010001 00001000 00101000 00101100

That depends on how you represent things.  Are you talking about
converting one sequence of ASCII characters to another set of ASCII
characters, or are you thinking in terms of raw bits, or what?  It all
depends on how you are viewing your sequences of information.
Obviously, a person can play with the proper calculator or program and
convert one ASCII string sequence listed above to the other.  Is that
what you want to do, or are you trying to examine how these things are
different when represented inside a computer?

It's kind of like four guys looking at a transportation method.  Each
person comes up with a different answer:

   1. It's a car
   2. It's a Ford
   3. It's a Mustang
   4. It's a Shelby 5-liter

Everybody is correct, and they are all talking about the same thing;
it's just described differently.  Which one is more accurate?  That
depends on the context you want.  Getting back to your original
question, are we talking about ASCII strings or actual numeric
representations inside a CPU?

>My off the cuff answer is no. However, if error were to be introduced by
>the channel entropy seems higher with the second representation.

Well, if we consider your two strings as ASCII characters, we see that
there is much more redundant data in the second one.  Who needs all
those leading zeros when we have a space to delimit a character string?
Thus, we see that there is less information per symbol (ASCII character
= 1 symbol) for the second string, because so much of it is redundant.
According to Hamming, "The entropy function measures the amount of
uncertainty, surprise, or information we get from the outcome of some
situation."
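The "less information per symbol" point can be checked numerically with the empirical Shannon entropy of each string (a Python sketch, not from the original post; it treats each ASCII character as one symbol and uses observed character frequencies):

```python
from collections import Counter
from math import log2

def entropy_per_symbol(s: str) -> float:
    """Empirical Shannon entropy of s, in bits per symbol."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * log2(c / n) for c in counts.values())

decimal_str = "145 8 40 44"
binary_str = "10010001 00001000 00101000 00101100"

print(entropy_per_symbol(decimal_str))  # higher bits/symbol (varied digits)
print(entropy_per_symbol(binary_str))   # lower bits/symbol (mostly 0s, 1s, spaces)
```

The binary-ASCII string draws on only three distinct symbols, heavily skewed toward '0', so its per-symbol entropy comes out lower, which matches the argument above.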
Since we get the same information from both messages, I would say that
the average entropy for the second expression is less, since we get
less info per symbol.  I always look at these things and ask myself,
"Can I think of a method to compress this data?"  If I can, I know that
the entropy of what I am looking at is less than what it could be.
This may be OK, but for storage and transmittal, where the total number
of bits counts, it is less than optimal.

I don't know if this answers your question, because I don't know if my
interpretation of your symbol set is correct (two lists of ASCII
characters that could conceivably contain the same info, depending on
viewpoint).  I guess my answer would have to be, "The total information
content of the two strings is the same, but there is less
information/symbol in the second string."

   -dave

--
Dave McMahan                     mcmahan@netcom.com
                                 {apple,amdahl,claris}!netcom!mcmahan
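The compression rule of thumb above can be made concrete (a Python sketch, not from the original post, assuming each space-delimited 8-character group is one byte): the binary-ASCII string packs losslessly into 4 raw bytes, so its entropy per symbol must be well below the maximum.

```python
binary_ascii = "10010001 00001000 00101000 00101100"

# "Compress" by packing each 8-character binary group into one raw byte.
packed = bytes(int(group, 2) for group in binary_ascii.split())
print(len(binary_ascii), "ASCII symbols ->", len(packed), "bytes")  # 35 -> 4

# The compression is lossless: the original string is fully recoverable.
restored = " ".join(format(b, "08b") for b in packed)
print(restored == binary_ascii)  # True
```

Going from 35 ASCII symbols to 4 bytes with no information lost is exactly the sense in which the second representation carries less information per symbol.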