[comp.compression] Sound compression

n277cb@tamuts.tamu.edu (Todd Gruben) (05/14/91)

Does anyone know of a good "lossy" sound compression algorithm which I can
integrate into a "C" program?  I am reading in sound files and transmitting
them (in packets) over a low speed network.  I need to compress these files for
greater throughput.  So far, pkzip compresses the files to about 1/5th of their
original size.  I have heard of JPEG being used on sound and read articles on
fractal compression.  Is source code available for either of these routines?
If so, where?  Are there any others?  What about the routines used for fax
(group 3)?  Are these available anywhere?  

             Thanks in advance -- Todd

davidsen@rdsunx.crd.GE.COM (William E Davidsen) (05/15/91)

In article <16198@helios.TAMU.EDU>, n277cb@tamuts.tamu.edu (Todd Gruben)
writes:
|> 
|> Does anyone know of a good "lossy" sound compression algorithm which I can
|> integrate into a "C" program? 

  I did some 14 => 8 bit compression which worked pretty well: the top
three bits of the byte hold a shift count used to bring the highest 1
bit into the high-order position, the high one bit itself is dropped
(it's implied), and the next five bits go in the low part of the byte.
That gives full dynamic range and six-bit accuracy (well, almost), and
the output was as compressible as the original. Lossy for sure, but it
sounded pretty good when uncompressed. 

  Sorry I don't have source any more, but the way to quickly find the
high one bit (on a 2's complement machine) is:

  h1bit = n & (-n)
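
  From memory, the packing itself looked something like the sketch
below. A reconstruction, not the original source -- and note the field
layout is a guess: with an implied leading one, a 3-bit shift plus
5-bit mantissa as written here only covers a 12-bit magnitude, so the
real code must have differed in some detail (perhaps that's the "well,
almost"):

	/* 3-bit shift count in the high bits of the byte, 5-bit
	 * mantissa with the leading one bit dropped (implied) in the
	 * low bits.  Shift field 0 means "small value, stored as-is",
	 * like an IEEE denormal. */
	unsigned char pack(unsigned int v)	/* v: magnitude, 0..4095 */
	{
		unsigned int e, m;

		if (v < 32)
			return (unsigned char)v;	/* shift field 0 */
		for (e = 1, m = v; m >= 64; e++)	/* top bit -> bit 5 */
			m >>= 1;
		return (unsigned char)((e << 5) | (m & 0x1f));
	}

	unsigned int unpack(unsigned char b)
	{
		unsigned int e = b >> 5, m = b & 0x1f;

		return e ? (32 | m) << (e - 1) : m;
	}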

rpw3@rigden.wpd.sgi.com (Rob Warnock) (05/15/91)

In article <19524@crdgw1.crd.ge.com> davidsen@crdos1.UUCP
(bill davidsen) writes:
+---------------
|   Sorry I don't have source any more, but the way to quickly find the
| high one bit (on a 2's complement machine) is:
| 
|   h1bit = n & (-n)
+---------------

Sorry, this is quite incorrect! This will find the *lowest*-order one bit set!

To find the *highest*-order bit set (a.k.a. a "priority encoder") you need a
find-highest-one function, or a loop, or a "comb". The find-highest function
exists in some CPU instruction sets (e.g., the PDP-10's "JFFO", the Am29000's
"CLZ") but not in others (e.g., the VAX's "FFS" finds the *lowest* bit).

Many software implementations use some version of a binary tree search:

	#define NBITSPERWORD 32		/* assuming 32-bit ints here */

	int bit_num_of_highest_one(unsigned int x)
	{
		int answer, i;
		unsigned int mask;

		if (x == 0)
			return -1;
		for (answer = 0, i = NBITSPERWORD/2; i; i >>= 1) {
			mask = ~0u << i;
			if (x & mask) {
				x >>= i;
				answer += i;
			}
		}
		return answer;
	}

Often the loop is unrolled, since the wordlength is usually fixed for a given
machine. A co-worker (Brendan Eich) first showed me the "comb" approach:

	int bit_num_of_highest_one(unsigned int x)
	{
		int answer = 0;

		if (x == 0)
			return -1;
		if (x & 0xffff0000)
			answer += 16, x &= 0xffff0000;
		if (x & 0xff00ff00)
			answer += 8, x &= 0xff00ff00;
		if (x & 0xf0f0f0f0)
			answer += 4, x &= 0xf0f0f0f0;
		if (x & 0xcccccccc)
			answer += 2, x &= 0xcccccccc;
		if (x & 0xaaaaaaaa)
			answer += 1;
		return answer;
	}

While this is cute, it's not necessarily any faster than an unrolled version
of the "search" code above, since the 32-bit constants needed for the "comb"
version tend to take two instructions to generate on current RISCs.

By the way, there was a paper in CACM (??? sorry, forgot the ref) a number
of years ago which proved that there is no closed functional form without
conditionals which will compute "the highest-order one bit".


-Rob

-----
Rob Warnock, MS-1L/515		rpw3@sgi.com		rpw3@pei.com
Silicon Graphics, Inc.		(415)335-1673		Protocol Engines, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA  94039-7311

radford@ai.toronto.edu (Radford Neal) (05/16/91)

In article <104045@sgi.sgi.com> rpw3@sgi.com (Rob Warnock) writes:

>By the way, there was a paper in CACM (??? sorry, forgot the ref) a number
>of years ago which proved that there is no closed functional form without
>conditionals which will compute "the highest-order one bit".

Actually, you should be able to quickly find the position of the 
highest-order one bit of an integer on most machines with floating-point
instructions by jamming the integer into the mantissa part of a floating-
point number, along with some appropriate exponent, and then performing
some operation that will induce the floating-point processor to do a
normalization. The number of the highest-order one bit in the original
integer should then be readily obtainable by examining the exponent part
of the normalized floating-point result.
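
In C, frexp() packages exactly this operation, so a portable sketch of
the idea looks like the following (a fast version would poke at the bits
of the double directly instead of calling the library):

	#include <math.h>

	/* let the floating-point hardware normalize for us, then read
	 * the exponent back; any 32-bit integer fits exactly in a double */
	int bit_num_of_highest_one(unsigned int x)
	{
		int exp;

		if (x == 0)
			return -1;
		(void) frexp((double)x, &exp);	/* x == f * 2^exp, 0.5 <= f < 1 */
		return exp - 1;			/* highest one bit is bit exp-1 */
	}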

   Radford Neal

lance@motcsd.csd.mot.com (lance.norskog) (05/16/91)

Back to sound compression...

Examination of the data always helps.  If you look at a sound sample
(I use MixView, a marvelous package from two guys at Columbia, contact
doug@woof.columbia.edu) you will notice that it's usually a regular
waveform.  I have this theory that you can do compression via 
delta modulation by seeking forward X samples and hunting for a
nearby sample.  With regular waveforms, X will tend to be constant
for interesting periods of time.
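
Something like this, perhaps (an untested sketch of my idea, with the
period X fixed and the deltas clamped to a byte; a real version would
re-estimate X as the pitch drifts):

	/* predict each sample from the one X samples back (one "period")
	 * and store only the clamped difference; predicting from the
	 * reconstructed signal keeps encoder and decoder in step */
	void encode_deltas(const short *in, signed char *out,
			   short *recon, int n, int x)
	{
		int i, d, pred;

		for (i = 0; i < n; i++) {
			pred = (i >= x) ? recon[i - x] : 0;
			d = in[i] - pred;
			if (d > 127) d = 127;		/* lossy where the */
			if (d < -128) d = -128;		/* prediction fails */
			out[i] = (signed char)d;
			recon[i] = (short)(pred + d);
		}
	}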

I like the 14->8 trick very much.  If I didn't have a stinking Sound
Blaster I'd try it.

Lance

gwyn@smoke.brl.mil (Doug Gwyn) (05/16/91)

In article <104045@sgi.sgi.com> rpw3@sgi.com (Rob Warnock) writes:
>By the way, there was a paper in CACM (??? sorry, forgot the ref) a number
>of years ago which proved that there is no closed functional form without
>conditionals which will compute "the highest-order one bit".

And of course that "theorem" is wrong:  using the operand to index
into a table can immediately produce the appropriate answer.

For large word sizes or small machines, that will often be impractical,
but it can serve as a basis for an implementation using smaller tables
(which does require use of conditionals):

	int ffo( register unsigned long x )
		{
		static int	bitnum[256] =
			{      -1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
				4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
				5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
				5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
				6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
				6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
				6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
				6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
				7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
			};

		if ( (x & 0xFF000000) != 0 )
			return 24 + bitnum[x >> 24];
		else if ( (x & 0x00FF0000) != 0 )
			return 16 + bitnum[x >> 16];
		else if ( (x & 0x0000FF00) != 0 )
			return	8 + bitnum[x >>  8];
		else
			return	    bitnum[x      ];
		}

davidsen@rdsunx.crd.GE.COM (William E Davidsen) (05/17/91)

In article <104045@sgi.sgi.com>, rpw3@rigden.wpd.sgi.com (Rob Warnock) writes:
|> In article <19524@crdgw1.crd.ge.com> davidsen@crdos1.UUCP
|> (bill davidsen) writes:
|> +---------------
|> |   Sorry I don't have source any more, but the way to quickly find the
|> | high one bit (on a 2's complement machine) is:
|> | 
|> |   h1bit = n & (-n)
|> +---------------
|> 
|> Sorry, this is quite incorrect! This will find the *lowest*-order one
|> bit set!

  You obviously read what I said, not what I meant ;-)

  Yes, my note was woefully incorrect. It turned out that for the data I
had it was quicker to find the low bit and discard repeatedly than to
shift looking for the high. I should have said more or nothing.
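
  What I actually did was closer to this (a reconstruction from memory,
not the original source):

	/* strip low one bits until only the highest remains */
	unsigned int high_one_bit(unsigned int n)
	{
		while (n & (n - 1))	/* more than one bit set? */
			n &= n - 1;	/* clear the lowest one bit */
		return n;		/* 0 if n was 0 */
	}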

  There was a discussion of this in comp.lang.c some few years ago, and
for random data a binary search is faster. Sound data isn't random, at
least not in the programming sense.

rpw3@rigden.wpd.sgi.com (Rob Warnock) (05/22/91)

In article <91May15.140250edt.750@neuron.ai.toronto.edu>
radford@ai.toronto.edu (Radford Neal) writes:
+---------------
| In article <104045@sgi.sgi.com> rpw3@sgi.com (Rob Warnock) writes:
| >...paper... which proved that there is no closed functional form without
| >conditionals which will compute "the highest-order one bit".
| 
| Actually, you should be able to quickly find the position of the 
| highest-order one bit of an integer on most machines with floating-point
| instructions by jamming the integer into the mantissa part of a floating-
| point number, along with some appropriate exponent, and then performing
| some operation that will induce the floating-point processor to do a
| normalization.
+---------------

I wasn't arguing against there being practical methods, only noting that
the paper said there are no "closed functional forms without conditionals".
Floating-point normalization is either done iteratively [which requires a
termination test, i.e., a "conditional", albeit in the f.p. microcode] or is
done with a "flash" normalizer, i.e., a priority encoder [for which there
is no traditional closed-form mathematical operator!]. In either case, the
paper's results are not contradicted.

Your method is cute; I like it. Yet it seems useful only on machines which
have a "flash" f.p. normalizer *and* don't provide an integer priority-encoder
operation. In my experience, the integer priority-encoder (JFFO, FFS, or CLZ
instruction) is often provided on non-f.p. machines, but rarely the reverse
[since, given the "flash" f.p. normalizer in the datapath, the prio-encoder
operation is "free"].

Let's re-open the question to others: What *other* fast, practical methods
are there to find the "leftmost bit" a.k.a. "binary logarithm" of an integer,
besides:

1. Table-lookup ["small" values only]
2. Brute-force iteration [incl. doing log_2(wordsize) "tree" or "comb" steps]
3. Hardware priority encoder instruction [JFFO, FFS, CLZ, or equiv.]
4. Floating-point normalizer [Neal's method]

It seems that the binary log ought to be a useful element of many kinds of
lossy compression, not just audio...


-Rob

peltz@cerl.uiuc.edu (Steve Peltz) (05/23/91)

In article <105539@sgi.sgi.com> rpw3@sgi.com (Rob Warnock) writes:
>Let's re-open the question to others: What *other* fast, practical methods
>are there to find the "leftmost bit" a.k.a. "binary logarithm" of an integer,
>besides:

Here's what is used on our system (Cyber: 60-bit words, one's complement;
I'm translating to C).  It requires a fast population count and shifting.

#define blur(i,o) d = d | ((d = d | ((o) >> i)) >> (2 * i))
#define hibit(x)  bitcnt(blur(16, blur(4, blur(1, d = x))))

This also obviously depends on order of evaluation (each blur assigns to d
twice); let's see if I can unroll it a bit.

	/* blur(1, d=x) */
	d = x;
	d |= d >> 1;
	d |= d >> 2;
	/* blur(4, ...) */
	d |= d >> 4;
	d |= d >> 8;
	/* blur(16, ...) */
	d |= d >> 16;
	d |= d >> 32;
	return bitcnt(d);

d is a signed 60-bit value, so right shifting does shift in the sign bit, but
that doesn't matter for the algorithm.

I don't know if there is another origin for this algorithm, but it is listed
in our library as having been written by Mike Huybenz, Feb '77.

For a 32 bit value, the last shift isn't necessary.
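
The bitcnt() above is a single hardware instruction on the Cyber; on a
machine without one, a portable stand-in for 32-bit words might be (my
sketch, not from our library):

	int bitcnt(unsigned long d)
	{
		d = (d & 0x55555555L) + ((d >> 1) & 0x55555555L);
		d = (d & 0x33333333L) + ((d >> 2) & 0x33333333L);
		d = (d + (d >> 4)) & 0x0f0f0f0fL;
		d += d >> 8;			/* sum the byte counts */
		d += d >> 16;
		return (int)(d & 0x3f);
	}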
--
Steve Peltz
Internet: peltz@cerl.uiuc.edu	PLATO/NovaNET: peltz/s/cerl

jk87377@cc.tut.fi (Juhana Kouhia) (05/23/91)

If I have noticed correctly, people working on sound compression try to
design only real-time compression systems.
Isn't there any interest in developing a compression method which
compresses a large sound file, say one hour, all at once?
How many samples do current sound compression algorithms see at once?

Juhana Kouhia

madler@nntp-server.caltech.edu (Mark Adler) (05/24/91)

Just thought I'd throw into this thread that Sony has announced a 2.5 inch
CD format that puts 74 minutes of music in about 175M using 5:1 compression.
They are using psychoacoustic techniques, which means they are throwing
away subband information that is masked by other subband information, based
on experiments with human frequency masking.  It will be interesting to
see the audiophile reaction to this.

Sony says that this is their best shot at audio compression currently, and
that more than 5:1 was unacceptable.  They have 1 Mbit of RAM in the design
to buffer three seconds of sound, so the output rides out compression-rate
variation, tracking loss, and other glitches and comes out continuous and
uninterrupted.  (The player actually keeps going for three seconds after
you pull out the CD!)

This is not the same as the format they are proposing for MPEG sound
compression, implying that the MPEG proposal is older and not as good.

Mark Adler
madler@tybalt.caltech.edu

brad@looking.on.ca (Brad Templeton) (05/24/91)

You would think sound should be more redundant than that.  I had heard of
plans for 2.5 inch CDs with over 2 hours of sound in the past; I guess
those got scrapped.  At 2 hours, they would begin to have trouble
figuring out what to put on them.  Even today people are more reluctant to
buy 30 minute CDs when 70 minute CDs are abundant.  And of course long
double-albums get to charge double, when we all know the cost of the CD
is peanuts.

I have heard of some people doing 1 meg/minute for supposed hi-fi.
Do they lie?

On a tangential note, I do hope they get with it and package any new
CD format in a case like that for 3.5inch disks.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

madler@nntp-server.caltech.edu (Mark Adler) (05/24/91)

>> I have heard of some people doing 1 meg/minute for supposed hi-fi.
>> Do they lie?

One megabit or one megabyte?  If the former, they lie.  However, I
have heard of 192Kbps for stereo CD quality, with 128Kbps well in
sight.  128Kbps would come out to about 1 megabyte/minute.  The Sony
is doing about 300Kbps, which is still pretty respectable.  One thing
I forgot to mention is that these 2.5 inch mini-disks will be writable
too, so the compression stuff has to be cheap enough for commercial
gear.

Mark Adler
madler@tybalt.caltech.edu

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (05/24/91)

In article <1991May24.083409.20528@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>You would think sound should be more redundant than that. 

It appears that sound is not redundant at all.  All attempts to compress
digitized sounds have failed in the sense that critical ears can hear the
difference.  Even the 16 bits of present CDs is not completely adequate.
19 bits probably is, and the 44 kHz sampling rate probably is, given
really good pre- and post-conversion analog filtering.


It is true that you can do some really radical things to audio in the
analog domain -- like screwing up the phase-versus-frequency linearity --
without audible effect, but that does not reduce the information content
of the signal.

Of course this does not apply to synthetic, electronically generated
sounds, as of course you can compress the hell out of, for example, a
signal consisting of two sine waves.


>I had heard of
>plans for 2.5 inch CDs with over 2 hours of sound in the past; I guess
>those got scrapped.  At 2 hours, they would begin to have trouble
>figuring out what to put on them.  Even today people are more reluctant to
>buy 30 minute CDs when 70 minute CDs are abundant.  And of course long
>double-albums get to charge double, when we all know the cost of the CD
>is peanuts.
>
>I have heard of some people doing 1 meg/minute for supposed hi-fi.
>Do they lie?

Yes.


There is a LOT of sloppy pseudo-science crap in the audio world, and
the so-called golden ears often fail to hear certain non-information-losing
transformations, but the attempts to compress music using information-losing
methods have all been audible.


Doug McDonald

wilf@sce.carleton.ca (Wilf Leblanc) (05/25/91)

mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:

>It appears that sound is not redundant at all.  All attempts to compress
>digitized sounds have failed in the sense that critical ears can hear the
>difference.  Even the 16 bits of present CDs is not completely adequate.
>19 bits probably is, and the 44 kHz sampling rate probably is, given
>really good pre- and post-conversion analog filtering.

I agree, 16 bits wasn't enough considering the dynamic range of music.
Some people would argue that 44 kHz isn't enough either.  However,
CD's 16 bits / 44 kHz is still pretty good, with very little audible
distortion.

>It is true that you can do some really radical things to audio in the
>analog domain -- like screwing up the phase-versus-frequency linearity --
>without audible effect, but that does not reduce the information content
>of the signal.

True, but a good quantizer (and I use the word quantizer loosely)
would know all about this.

>[...]


>There is a LOT of sloppy pseudo-science crap in the audio world, and
>the so-called golden ears often fail to hear certain non-information-losing
>transformations, but the attempts to compress music using information-losing
>methods have all been audible.

Well, above you say that you can really screw up the "phase-versus-
frequency linearity" -- without audible effect.  Is this sloppy
pseudo-science crap as well??

BTW, have YOU heard all the compression techniques (e.g. Sony's
system), and can you say with confidence that the distortion is audible??

>Doug McDonald

--
Wilf LeBlanc, Carleton University, Systems & Comp. Eng. Ottawa, Canada, K1S 5B6
Internet: wilf@sce.carleton.ca   UUCP: ...!uunet!mitel!cunews!sce!wilf
                Oh, cruel fate! Why do you mock me so! (H. Simpson)

rdippold@cancun.qualcomm.com (Ron Dippold) (05/25/91)

In article <1991May23.215341.7836@nntp-server.caltech.edu> madler@nntp-server.caltech.edu (Mark Adler) writes:
>
>Just thought I'd throw into this thread that Sony has announced a 2.5 inch
>CD format that puts 74 minutes of music in about 175M using 5:1 compression.
>They are using psychoacoustic techniques, which means they are throwing
>away subband information that is masked by other subband information, based
>on experiments with human frequency masking.  It will be interesting to
>see the audiophile reaction to this.

I'm not an audiophile, but the general consensus seems to be that it's not
as good as CDs, but definitely at least tape quality.

Plus you can bang that thing around all you want without making it skip.

-- 
Standard disclaimer applies, you legalistic hacks.     |     Ron Dippold

d88-jwa@cyklop.nada.kth.se (Jon W{tte) (05/25/91)

wilf@sce.carleton.ca (Wilf Leblanc) writes:

> I agree, 16 bits wasn't enough considering the dynamic range of music.

Well, there are other methods - 16 bits _is_ enough given logarithmic
quantizing.

I believe that there may very well be schemes for lossy compression that
aren't detectable. After all, a vinyl record has its own imperfections.

I certainly believe that, given a quantizing method and bit rate
(say 16-bit linear at 44 kHz), there are methods to reduce the amount
of information being transferred without losing sound quality - since
the losses are below the threshold of the quantizing system itself.
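
For instance, a crude logarithmic quantizer looks something like this
(my sketch; real systems use mu-law/A-law style companding, done with
rather more care):

	#include <math.h>

	#define MAXVAL	32767.0		/* largest magnitude */
	#define LEVELS	255		/* codes per polarity */

	/* map a linear sample to a log-spaced code and back */
	int log_quant(double x)
	{
		int neg = (x < 0.0);
		double m = fabs(x);
		int code = (int)(LEVELS * log(1.0 + m) / log(1.0 + MAXVAL));

		return neg ? -code : code;
	}

	double log_dequant(int code)
	{
		int neg = (code < 0);
		double c = neg ? -code : code;
		double m = exp(c * log(1.0 + MAXVAL) / LEVELS) - 1.0;

		return neg ? -m : m;
	}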

--
						Jon W{tte
						h+@nada.kth.se
						- Power !

nbvs@cl.cam.ac.uk (Nicko van Someren) (05/25/91)

In article <1991May24.144519.4129@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
>
>It is true that you can do some really radical things to audio in the
>analog domain -- like screwing up the phase-versus-frequency linearity --
>without audible effect, but that does not reduce the information content
>of the signal.
>
If you can screw up the phase, you can halve the information content by
storing the amplitude but not the phase from the frequency domain.  You
will probably want to randomise the phase at the other end so as not to
get clicks on playback.

Nicko
+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
+-----------------------------------------------------------------------------+