[net.audio] Floating Point CD's

schooler@inmet.UUCP (10/25/84)

	What About Floating Point Representation for CD's?

Currently, a CD waveform sample is represented as a 16-bit integer,
giving a dynamic range of approx. 2^16 = 10^4.8 = 96 dB (roughly).
This integer representation has extremely high (relative) precision
at the upper end of the scale, and low precision at the lower end
of the scale.
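
As a rough check of the figure above, the dynamic range of an N-bit integer can be taken as 20*log10(2^N). The Python sketch below also shows the size of one integer step relative to the sample value, which is what "high precision at the top, low at the bottom" amounts to. The function names are illustrative and not from the post.

```python
import math

def int_dynamic_range_db(bits):
    """Ratio of full scale to one step, in dB (voltage ratio, hence 20*log10)."""
    return 20 * math.log10(2 ** bits)

def relative_step(sample_value):
    """Size of one integer step relative to the sample magnitude."""
    return 1 / abs(sample_value)

print(round(int_dynamic_range_db(16), 1))  # about 96.3
print(relative_step(32767))                # very fine steps near full scale
print(relative_step(4))                    # 0.25, i.e. 25% steps near the bottom
```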

Consider a 16-bit floating point representation: say 6 bits of exponent
(base 2) and 10 bits of fraction.  Using the normal implicit-first-bit-
is-1 representation for the fraction, the smallest representable number
is .5, and the largest is 2^63.  The dynamic range is thus 2^64 =
10^19 = 385 dB (roughly).  The precision is .1% of a "dynamic octave".
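
The numbers in this paragraph can be reproduced with a small sketch. It assumes an unsigned, unbiased 6-bit exponent and a fraction of the form 0.1fff... (binary) with the leading 1 implied; the post does not fix these details, so treat the layout and the names as assumptions.

```python
import math

EXP_BITS = 6
FRAC_BITS = 10  # stored bits; the leading 1 is implicit

def decode(exponent, fraction):
    """Value of (0.1fff...f in binary) * 2**exponent."""
    significand = 0.5 + fraction / 2 ** (FRAC_BITS + 1)
    return significand * 2.0 ** exponent

smallest = decode(0, 0)                                  # 0.5
largest = decode(2 ** EXP_BITS - 1, 2 ** FRAC_BITS - 1)  # just under 2**63
range_db = 20 * math.log10(largest / smallest)           # close to 385 dB
step = 1 / 2 ** FRAC_BITS                                # about 0.1% of an octave

print(smallest, round(range_db, 1), step)
```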

This sounds like a big win.  Is there something wrong with my math?  Are
there grave difficulties with floating point (or equivalently, logarithmic)
A-D/D-A devices?

			-- Richard Schooler

lutton@inmet.UUCP (10/28/84)

<>
There were some experiments with floating-point representation of
digitized sound back in the 70's.  The results indicated that you
gained something with the exponent bits but lost something by having
only 12 or fewer mantissa bits; net result was that 16 bit floating
point and 16 bit fixed point sounded just about the same.  Fixed
point is easier to work with, so floating point was forgotten about.

     HOWEVER:  the test signals were computer-generated, i.e. known
beforehand not to exceed a certain fixed level.

   Perhaps it's time for a fresh look at floating-point.

schooler@inmet.UUCP (10/28/84)

As someone pointed out, a 16-bit floating-point representation is overkill.
(Who needs > 150 dB?!) Consider, however, an 8-bit logarithmic
representation, as proposed by Edgar and Lee in "FOCUS Microcomputer Number
System", Comm. ACM, Vol. 22, Num. 3, March 1979.  This representation has
one sign bit and a seven-bit exponent in fixed-point format with three
fraction bits.  The sign of the exponent is encoded by an offset, i.e. 0
1000.000 = +2^0 = 1.  The authors claim a range of 96 dB, an absolute S/N of
93 dB, and an instantaneous S/N (precision, roughly) of 32 dB.  I quote, "In
audio applications the noise level of even the 8-bit FOCUS compares
favorably to the highest quality cassette recordings as a means of signal
handling."  Furthermore, they suggest a simple circuit for logarithmic A-D's
and exponential D-A's.  While addition and subtraction are to be avoided,
multiplication, division and exponentiation are fast and exact.  If all this
is really valid, we can cram twice as much sound per bit onto our favorite
digital medium.
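
A minimal sketch of such a logarithmic code, built only from the description in this post (sign bit, 7-bit exponent with three fraction bits, offset so that 0 1000.000 means 2^0). It is not taken from the Edgar and Lee paper; the function names, the clamping behaviour, and the lack of a code for zero are all assumptions made for illustration.

```python
import math

FRAC_BITS = 3  # fraction bits of the fixed-point exponent
OFFSET = 64    # exponent code 1000.000 (binary) stands for 2**0

def decode(code):
    """8-bit code -> real value: sign bit, then 7-bit offset exponent."""
    sign = -1.0 if code & 0x80 else 1.0
    exponent = ((code & 0x7F) - OFFSET) / 2 ** FRAC_BITS
    return sign * 2.0 ** exponent

def encode(x):
    """Nearest code for nonzero x (zero has no code in this simplified sketch)."""
    sign = 0x80 if x < 0 else 0
    e = round(math.log2(abs(x)) * 2 ** FRAC_BITS) + OFFSET
    return sign | max(0, min(127, e))

def multiply(a, b):
    """Multiply two codes by adding exponents; exact unless the result clamps."""
    sign = (a ^ b) & 0x80
    e = (a & 0x7F) + (b & 0x7F) - OFFSET
    return sign | max(0, min(127, e))

range_db = 20 * math.log10(decode(127) / decode(0))
print(decode(0b01000000), round(range_db, 1))
```

Under these assumptions the span from the smallest to the largest magnitude comes out a little under 96 dB, and multiplication reduces to an integer add, which is the property the post points to.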

		-- Richard Schooler

herbie@watdcsu.UUCP (Herb Chong, Computing Services) (10/28/84)

I think the main problem is the extra hardware that has to be added to
do these things.  Aside from that, I can see no major problem.  However,
logarithmic compression has almost exactly the same effect and has been
used in most of the 14 bit digital systems.  The other problem is that with
the exponent and mantissa representation of the number, at least 16 bits have
to be used for the mantissa and about 8 bits for the exponent.  Unless you
use a hexadecimal coding scheme for floating point numbers as IBM does for
their computers, you may even need more than that.  The net result is at
least 50% higher data rate.  When the original specs for CD's were
drawn up, 16 bits was considered barely achievable on a commercial scale.

Herb...

I'm user-friendly -- I don't byte, I nybble....

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdcsu!herbie
CSNET: herbie%watdcsu@waterloo.csnet
ARPA:  herbie%watdcsu%waterloo.csnet@csnet-relay.arpa
BITNET: herbie at watdcs,herbie at watdcsu

dmmartindale@watcgl.UUCP (Dave Martindale) (10/29/84)

Arghh!  Using hexadecimal-base floating point, as in the IBM S/360 and
its successors, is the least efficient way to use the bits in a word.
The reason is that, on the average, two of the mantissa bits will be
zero and carrying no useful information at all.  Binary-base floating
point, particularly if the always-1 leading bit is not stored, is most
efficient.  If this seems unclear, talk to any numerical analyst.
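
One way to see where the wasted bits come from: a hex-normalized mantissa only guarantees that its first hex digit is nonzero, so up to three of its leading bits can still be zero. The sketch below counts them for each possible leading digit. The averaging step assumes values are spread evenly on a logarithmic scale, which is an assumption of this sketch and not something stated in the post.

```python
def wasted_leading_bits(leading_hex_digit):
    """Zero bits at the top of a normalized hex mantissa's first digit (1..15)."""
    return 4 - leading_hex_digit.bit_length()

for digit in range(1, 16):
    print(f"{digit:X}: {wasted_leading_bits(digit)}")

# Assumption: with values spread evenly on a log scale, the four binary
# alignments (first digit in 8-F, 4-7, 2-3, or 1) are equally likely.
average_wasted = sum(range(4)) / 4  # 0, 1, 2 or 3 wasted bits
print(average_wasted)
```

Under that assumption the average comes to 1.5 bits, and a binary format that does not store its leading 1 gains one more bit on top, which is in the neighbourhood of the two bits quoted above.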

And you need 8 bits of exponent only if you want to cover a fairly wide
"dynamic range" of numbers, about 10^77.  An audio dynamic range of 120dB
is a 10^6 range of voltage levels, which needs somewhere between 4 and 5
bits of exponent.  Four might be enough.
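
The exponent-width arithmetic can be sketched as follows. Each step of a binary exponent is a factor of two in voltage, or about 6.02 dB. The function names are illustrative.

```python
import math

DB_PER_OCTAVE = 20 * math.log10(2)  # about 6.02 dB per factor of two

def exponent_bits_for_db(range_db):
    """Bits of binary exponent needed to span a voltage range given in dB."""
    octaves = range_db / DB_PER_OCTAVE
    return math.log2(octaves)

def decades_for_exponent_bits(bits):
    """Powers of ten spanned by a binary exponent of the given width."""
    return (2 ** bits) * math.log10(2)

print(round(exponent_bits_for_db(120), 2))    # between 4 and 5
print(round(decades_for_exponent_bits(8), 1)) # about 77
```

A 4-bit exponent alone spans 16 factors of two, roughly 96 dB; the mantissa bits extend the usable range below that.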

mwm@ea.UUCP (10/30/84)

There is a floating point format known as FOCUS developed by a grad student
at the University of Oklahoma. It was designed for electronic sampling
instrumentation, and avoids the "hole around zero" problem with normal
floating point representations. Perhaps this would be a better choice than
either a normal floating point or integer representation?

I don't have the article anywhere near me, but will gladly dig it out for
anyone sending mail asking about it.

	<mike

jlg@lanl.ARPA (11/02/84)

I've been thinking about floating point music representation for some time.
The best format seems to be 1 sign-bit, 3 (or 4) exponent bits, and 13 (or
12) significand bits.  The 3 bit exponent with an IEEE floating point type
of gradual underflow and a hidden normalization bit gives a dynamic range
of about 120 dB.  The signal to noise ratio is always about 78 dB - that
is 78 dB below present signal (if I'm listening to a sound at 90 dB, I
can't hear noises at 12 dB - I don't think anyone else can either).

The 4 bit exponent format has a dynamic range of over 160 dB, with a signal
to noise ratio of about 70 dB.  This is beyond the dynamic range of human
hearing - should be sufficient.  It is also beyond the dynamic range of current
recording techniques: the widest A/D I've seen that can run at the
speeds required for audio recording was 18 bits (several thousand $).
The four bit exponent format is suitable for compression of 28 bit data;
I don't think we'll ever see that!
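
The figures for the two formats can be approximated with the sketch below. It takes the significand widths as the quoted 13 and 12 bits, uses exponent code 0 for gradual underflow, reserves no exponent codes for infinities or NaNs, and estimates signal to noise as about 6.02 dB per significand bit. All of these are assumptions made for the sketch; counting the hidden bit differently shifts the results by about 6 dB.

```python
import math

def format_stats(exp_bits, sig_bits):
    """Return (dynamic_range_db, snr_db) under the assumptions stated above."""
    top_exponent = 2 ** exp_bits - 1  # no codes reserved for Inf/NaN
    # Largest value: significand just under 2 at the top exponent.
    largest = (2 - 2.0 ** -sig_bits) * 2.0 ** top_exponent
    # Smallest nonzero value: one least significant bit at exponent 1 (denormal).
    smallest = 2.0 ** -sig_bits * 2.0 ** 1
    dynamic_range_db = 20 * math.log10(largest / smallest)
    snr_db = 20 * math.log10(2 ** sig_bits)
    return dynamic_range_db, snr_db

for exp_bits, sig_bits in [(3, 13), (4, 12)]:
    dr, snr = format_stats(exp_bits, sig_bits)
    print(exp_bits, sig_bits, round(dr, 1), round(snr, 1))
```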

Both the 3 bit and 4 bit exponent formats given here require 16 bits of
space on the recording medium.  That makes them compatible with the
current data encoding and error reduction schemes.  The only thing required
is circuitry to convert such floating point numbers into integers for
the D/A conversion.
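
A minimal sketch of that conversion step, in Python rather than circuitry. It assumes a 16-bit word laid out as 1 sign bit, 3 exponent bits and 12 stored fraction bits, with a hidden leading 1 for nonzero exponents and gradual underflow at exponent 0. The bit layout and the function name are hypothetical; the post does not specify them.

```python
EXP_BITS = 3
FRAC_BITS = 12  # stored bits; 1 + 3 + 12 = 16

def float16_to_int(word):
    """Expand a 16-bit sign/exponent/fraction word into a signed integer."""
    sign = -1 if (word >> 15) & 1 else 1
    exponent = (word >> FRAC_BITS) & (2 ** EXP_BITS - 1)
    fraction = word & (2 ** FRAC_BITS - 1)
    if exponent == 0:
        # Gradual underflow: no hidden bit, fraction is used as-is.
        magnitude = fraction
    else:
        # Restore the hidden 1, then shift by the exponent.
        magnitude = ((1 << FRAC_BITS) | fraction) << (exponent - 1)
    return sign * magnitude

print(float16_to_int(0x0FFF), float16_to_int(0x1000))  # 4095 4096
print(float16_to_int(0x7FFF))                          # 524224
```

With this layout the largest magnitude is just under 2^19, so the output would feed a 20-bit signed D/A, and the mapping is continuous across the underflow boundary (4095 is followed by 4096).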

jlg@lanl.ARPA (11/07/84)

> There is a floating point format known as FOCUS developed by a grad student
> at the University of Oklahoma. It was designed for electronic sampling
> instrumentation, and avoids the "hole around zero" problem with normal
> floating point representations. Perhaps this would be a better choice than
> either a normal floating point or integer representation?
> 
> I don't have the article anywhere near me, but will gladly dig it out for
> anyone sending mail asking about it.
> 
> 	<mike

If I remember the FOCUS design correctly, it used a scheme for varying the
size of the exponent field to prevent underflow (or overflow for that matter).
Unfortunately, this required the use of more exponent bits to begin with, 
thus making the significand smaller.  The 'hole around zero' is not much
of a problem for audio purposes since a signal that small would be
120 to 150 dB below the maximum representable signal (maybe more, depending
on which floating point format you chose to use).  This would probably
be below the threshold of hearing for any reasonable volume level.