[comp.dsp] Change in information content

muttiah@stable.ecn.purdue.edu (Ranjan S Muttiah) (03/16/91)

I don't know too much about information theory, so please bear
with me if this question is vague.

--------
Is there any change in information when the representation of input
data (to a receiver, for example) is reformulated?

Example:

The sequence: 145 8 40 44
    vs
The sequence: 10010001 00001000 00101000 00101100

--------

My off-the-cuff answer is no.  However, if errors were introduced by
the channel, the entropy seems higher with the second representation.
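
To make the comparison concrete, here is a small Python sketch (my own
illustration; it assumes the decimal and binary sequences are both meant
as spellings of the same four byte values):

    # Assumption: both sequences spell the same four byte values.
    decimal_seq = [145, 8, 40, 44]
    binary_seq = ["10010001", "00001000", "00101000", "00101100"]

    # Converting each binary string back to an integer recovers the
    # decimal sequence exactly, so no information is gained or lost
    # by the change of representation.
    assert [int(b, 2) for b in binary_seq] == decimal_seq
    print("same data, different spelling")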

mcmahan@netcom.COM (Dave Mc Mahan) (03/18/91)

 In a previous article, muttiah@stable.ecn.purdue.edu (Ranjan S Muttiah) writes:
>Is there any change in information when the representation of input
>data (to a receiver, for example) is reformulated?
>
>Example:
>
>The sequence: 145 8 40 44
>    vs
>The sequence: 10010001 00001000 00101000 00101100

That depends on how you represent things.  Are you talking about converting
one sequence of ASCII characters to another sequence of ASCII characters, or
are you thinking in terms of raw bits, or what?  It all depends on how you
view your sequences of information.  Obviously, a person can play with the
proper calculator or program and convert one ASCII string listed above into
the other.  Is that what you want to do, or are you trying to examine how
these things differ when represented inside a computer?  It's kind of like
four guys looking at the same vehicle.  Each person comes up with a
different answer:

1.  It's a car
2.  It's a Ford
3.  It's a Mustang
4.  It's a Shelby 5-liter

Everybody is correct, and they are all talking about the same thing; each
answer just sits at a different level of detail.  Which one is more accurate?
That depends on the context you want.


Getting back to your original question, are we talking about ASCII strings or
actual numeric representations inside a CPU?
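
The distinction matters for the byte count.  A quick Python sketch
(assuming 8-bit values; the variable names are mine):

    values = [145, 8, 40, 44]

    raw_bytes = bytes(values)                              # CPU view: 4 bytes
    dec_text = " ".join(str(v) for v in values)            # "145 8 40 44": 11 chars
    bin_text = " ".join(format(v, "08b") for v in values)  # binary spelling: 35 chars

    print(len(raw_bytes), len(dec_text), len(bin_text))    # -> 4 11 35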


>My off-the-cuff answer is no.  However, if errors were introduced by
>the channel, the entropy seems higher with the second representation.

Well, if we consider your two strings as ASCII characters, we see that there
is much more redundant data in the second one.  Who needs all those leading
zeros when we have a space to delimit a character string?  Thus, there is
less information per symbol (one ASCII character = one symbol) in the second
string, because so much of it is redundant.  According to Hamming, "The entropy
function measures the amount of uncertainty, surprise, or information we get
from the outcome of some situation."  Since we get the same information from
both messages, I would say that the average entropy per symbol for the second
expression is lower.  I always look at these things and ask myself, "Can I
think of a method to compress this data?"  If I can, I know that the entropy
of what I am looking at is less than it could be.  That may be fine, but for
storage and transmission, where the total number of bits counts, it is less
than optimal.
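
To put a rough number on "less information per symbol", here is a Python
sketch (assumptions mine: one ASCII character = one symbol, and symbol
probabilities are estimated from character frequencies within each string):

    from collections import Counter
    from math import log2

    def entropy_per_symbol(s):
        # Zeroth-order empirical entropy: H = -sum(p * log2(p)) over the
        # character frequencies observed in the string itself.
        n = len(s)
        return -sum((c / n) * log2(c / n) for c in Counter(s).values())

    dec = "145 8 40 44"
    bin_ = "10010001 00001000 00101000 00101100"

    print(entropy_per_symbol(dec))   # ~2.3 bits/symbol: 6 distinct characters
    print(entropy_per_symbol(bin_))  # ~1.2 bits/symbol: just '0', '1', and ' '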

I don't know if this answers your question, because I'm not sure my reading
of your symbol set is right (two lists of ASCII characters that could
conceivably contain the same info, depending on viewpoint).  I guess my
answer would have to be, "The total information content of the two strings
is the same, but there is less information per symbol in the second string."
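
And to illustrate the compression test itself, a sketch along the same lines
(again Python, again my own assumptions: random bytes stand in for the data,
and zlib stands in for "a method to compress this data"):

    import random
    import zlib

    random.seed(0)
    data = bytes(random.randrange(256) for _ in range(1000))

    dec_text = " ".join(str(b) for b in data).encode("ascii")
    bin_text = " ".join(format(b, "08b") for b in data).encode("ascii")

    # The raw lengths differ by roughly 2.5x, but the compressed lengths
    # come out much closer together: both spellings carry the same
    # underlying information (1000 random bytes' worth).
    print(len(dec_text), len(zlib.compress(dec_text, 9)))
    print(len(bin_text), len(zlib.compress(bin_text, 9)))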


   -dave


-- 
Dave McMahan                            mcmahan@netcom.com
					{apple,amdahl,claris}!netcom!mcmahan