[comp.dsp] A simple, practical sound board

ergo@netcom.UUCP (Isaac Rabinovitch) (10/31/90)

I'm no audio ham, but I occasionally need to record and dub various
people talking about various things.  There seem to be a lot of
PC-compatible sound boards all of a sudden, and I'm struck by how
convenient it would be to be able to record and edit on a winchester
disk.  But costs vary by a couple of orders of magnitude, and
presumably quality does too.  Recommendations?  I need to be able to
get an hour into a megabyte with a sound quality at least equal to a
cheap walkman.  Is this doable for less than a month's rent?
-- 

ergo@netcom.uucp			Isaac Rabinovitch
netcom!ergo@apple.com			Silicon Valley, CA
{apple,amdahl,claris}!netcom!ergo

WISE SAYING NEEDED.  Must reflect positive human values.  Gentle humor a
plus.  Cuties, pseudo-quotations, and jingoistic proverbs need not apply.

poser@csli.Stanford.EDU (Bill Poser) (10/31/90)

There is no way you can get an hour of speech into a megabyte
with reasonable quality if you just digitize waveforms. Suppose
you sample at a resolution of 8 bits, which is the resolution of
the cheapo ADDACs you can buy for PCs and Macs. For research
purposes and hi-fi people use higher resolution.  That means your
megabyte gets you 1024000 samples. At 3600 seconds in an hour,
that means 284.44 samples per second is the maximum sampling rate
you can use. The corresponding Nyquist frequency is 142.22 Hz,
meaning that you can only represent frequencies below this level.
This is WAY too low. For music people use sampling rates around
44K samples/sec in order to get frequencies up to over 20KHz. For
speech you don't need anything that high. For speech research we
typically sample at 20K samples/sec and low pass filter at 8KHz.
That covers everything significant for speech.  For some purposes
we sample at 10KHz with low-pass filtering at 4KHz.  Engineers
often sample at 8K samples/sec because they expect to be working
with telephone speech, which is limited to the region below about
3200 Hz. Already we're talking about degraded, though
intelligible, speech. So you can see that if you just want to
record and edit waveforms, there is no way you can cram an hour
of speech into a megabyte.  To store speech at around 2000
bits/second as you wish to do is possible but requires non-trivial
coding, and I'm not sure that you will like the quality that
results. I think you're going to need a bigger disk.
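
The arithmetic above can be checked with a short Python sketch. The function and constant names are illustrative assumptions, not from the post; the budget of 1,024,000 bytes is the figure the post itself uses for one megabyte.

```python
def max_sample_rate(budget_bytes, seconds, bytes_per_sample=1):
    """Highest uniform sampling rate (samples/s) that fits in the byte budget."""
    return budget_bytes / (bytes_per_sample * seconds)

BUDGET_BYTES = 1_024_000   # the post's figure for one megabyte
HOUR_SECONDS = 3600

rate = max_sample_rate(BUDGET_BYTES, HOUR_SECONDS)
nyquist = rate / 2                            # highest representable frequency
bit_rate = BUDGET_BYTES * 8 / HOUR_SECONDS    # bits/s available for any coder

print(f"max sampling rate: {rate:.2f} samples/s")
print(f"Nyquist frequency: {nyquist:.2f} Hz")
print(f"bit rate budget:   {bit_rate:.0f} bits/s")
```

The bit rate line is the budget any speech coder would have to meet, which is where the "around 2000 bits/second" figure comes from.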

am42+@andrew.cmu.edu (Alexander Paul Morris) (11/01/90)

If you want anything that sounds even like a cheap walkman (in mono)
you'll need way more than a megabyte for 1 hour.  1 meg may work for
about 1.5 to 2 minutes with almost cheap walkman sound.  If you can
prove me far wrong on this, please do, as I'd be interested in
discovering a way to do such a thing!

    Alexander Morris           ... Be excellent to each other,
    Carnegie Mellon                my excellent friends ...

ergo@netcom.UUCP (Isaac Rabinovitch) (11/02/90)

All right, my goal of an hour of sound in a megabyte betrayed my
ignorance of signal theory.  So let's bump up the space requirement by
an order of magnitude.  Are we into the real world yet?
-- 

ergo@netcom.uucp			Isaac Rabinovitch
netcom!ergo@apple.com			Silicon Valley, CA
{apple,amdahl,claris}!netcom!ergo

WISE SAYING NEEDED.  Must reflect positive human values.  Gentle humor a
...  
Hey, it was a JOKE!

poser@csli.Stanford.EDU (Bill Poser) (11/02/90)

In article <16002@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch) writes:
>All right, my goal of an hour of sound in a megabyte betrayed my
>ignorance of signal theory.  So let's bump up the space requirement by
>an order of magnitude.  Are we into the real world yet?

Well, if you just want to use simple sampling and not any kind of coding,
you'll need a bit more. Here's how to do the calculation. Suppose you're
going to use one byte per sample. That gives you passable resolution,
though not what you'd want for research or real hi-fi. It's also the
resolution of all of the cheap digitizers. To get good quality speech
you should sample at a minimum of 12K samples/second. This gives you
room for anti-aliasing filtering at about 5KHz (you need to allow
for the fact that the filter cutoff is not perfectly sharp). So,
you need 12K bytes per second. That is 1.2e4 * 3.6e3 = 4.32e7
bytes per hour, or 43.2MB per hour. So you need another half an order
of magnitude to get into the ballpark.
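
The same calculation as a minimal Python sketch. The names are hypothetical, and K is taken as 1000 and MB as 10^6 bytes, as in the figures above.

```python
import math

def bytes_per_hour(sample_rate_hz, bytes_per_sample=1):
    """Uncompressed storage needed for one hour of single-channel audio."""
    return sample_rate_hz * bytes_per_sample * 3600

total = bytes_per_hour(12_000)        # 12K samples/s, one byte per sample
ten_megabytes = 10e6                  # the revised goal of roughly 10 MB

# How far the requirement overshoots the 10 MB goal, in orders of magnitude
overshoot = math.log10(total / ten_megabytes)

print(f"{total / 1e6:.1f} MB per hour")
print(f"{overshoot:.2f} orders of magnitude above 10 MB")
```

Changing the rate or `bytes_per_sample` shows how quickly the total grows for higher-resolution recording.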

Bill

ergo@netcom.UUCP (Isaac Rabinovitch) (11/02/90)

In <16150@csli.Stanford.EDU> poser@csli.Stanford.EDU (Bill Poser) writes:

>
>.... Here's how to do the calculation. Suppose you're
>going to use one byte per sample. That gives you passable resolution,
>though not what you'd want for research or real hi-fi. It's also the
>resolution of all of the cheap digitizers. To get good quality speech
>you should sample at a minimum of 12K samples/second. This gives you
>room for anti-aliasing filtering at about 5KHz (you need to allow
>for the fact that the filter cutoff is not perfectly sharp). So,
>you need 12K bytes per second. That is 1.2e4 * 3.6e3 = 4.32e7
>bytes per hour, or 43.2MB per hour. So you need another half an order
>of magnitude to get into the ballpark.

Your figure is a *little* off -- a K is 1024, not 1,000.  (Computers
are so dumb they have to count on their fingers -- both of them.)  But
I take your point -- casual digital recording would seem to be beyond
the current generation of computer media, and probably the next as
well!  And by the time 43 meg mass storage has joined 64K RAM in the
netherworld of "used to think that was a *lot*", we'll have more
interesting problems -- or more drastic ones.

ddulmage@cdp.UUCP (11/02/90)

Hello. There are plenty of digital audio boards for most platforms,
though the good ones that do any tricks are not very cheap.
I sincerely doubt that you will find ANY that will allow you to get an
hour's worth onto 1 meg of disk. I have a very high quality digital CD
mastering system, and I average around 1.2 meg PER MINUTE! You can
extrapolate from there. Sorry I could not give you any more
positive suggestions.

Doug Dulmage

poser@csli.Stanford.EDU (Bill Poser) (11/02/90)

In article <16015@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch) writes:
>Your figure is a *little* off -- a K is 1024, not 1,000.

I know. But a 2.4% difference isn't much when doing back-of-the-envelope
calculations.
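
A quick check of the size of that difference, as a sketch with illustrative variable names, applied to the 12K samples/second estimate:

```python
SECONDS_PER_HOUR = 3600

decimal_k = 12 * 1000 * SECONDS_PER_HOUR   # K taken as 1000
binary_k = 12 * 1024 * SECONDS_PER_HOUR    # K taken as 1024

relative_diff = (binary_k - decimal_k) / decimal_k

print(f"K = 1000: {decimal_k:,} bytes/hour")
print(f"K = 1024: {binary_k:,} bytes/hour")
print(f"difference: {relative_diff:.1%}")
```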

paulr@syma.sussex.ac.uk (Paul T Russell) (11/03/90)

It's probably worth noting that the new Sound Manager which is part
of Macintosh System Software 6.0.7 onwards supports record and
playback to/from disk with real time 3:1 and 6:1 compression.
The two new models (Mac LC and Mac IIsi) even have built in
microphones and digitising hardware. Other models can have
the same capability by adding MacRecorder, a $150 box which
connects to the serial port. Sampling is 8 bit at
22 kHz, 11 kHz or 5.5 kHz - OK for low quality speech
etc.
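
For a rough sense of what these options mean in disk space, here is a small Python sketch. It assumes the nominal rates quoted above, one byte per sample, MB = 10^6 bytes, and that the stated compression ratio applies uniformly to the whole recording. Whether every ratio is available at every rate is not stated in the post, so the table is illustrative only.

```python
NOMINAL_RATES_HZ = [22_000, 11_000, 5_500]   # nominal figures from the post
COMPRESSION_RATIOS = [1, 3, 6]               # 1 means uncompressed

def mb_per_hour(rate_hz, ratio, bytes_per_sample=1):
    """Megabytes (10^6 bytes) for one hour at the given rate and ratio."""
    return rate_hz * bytes_per_sample * 3600 / ratio / 1e6

for rate in NOMINAL_RATES_HZ:
    cells = ", ".join(
        f"{ratio}:1 -> {mb_per_hour(rate, ratio):5.1f} MB"
        for ratio in COMPRESSION_RATIOS
    )
    print(f"{rate:>6} Hz: {cells}")
```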


-- 
           Paul Russell, Department of Experimental Psychology
         University of Sussex, Falmer, Brighton BN1 9QG, England
     Janet: paulr@uk.ac.sussex.syma  Nsfnet: paulr@syma.sussex.ac.uk
    Bitnet: paulr%sussex.syma@ukacrl.bitnet  Usenet: ...ukc!syma!paulr

cerberus@caen.engin.umich.edu (R Eric Bennett) (11/07/90)

I don't know why everybody is giving such gross figures, but I've used an 8-bit
codec sampler at 8Khz, and the results are great for speech. The man who asked
the question is obviously not doing research.
I was quite impressed, as somewhat of a layman, with NeXT's built in sampling.
Mac's 8-bit, 22Khz sampling, sounds awful.
Perhaps it has something to do with the speakers I'm listening to.
Anyway, my point is that NeXT does 8-bit sampling at 8Khz, meaning 29 Meg/hour.
And, it sounds darn good.

Eric Bennett
cerberus@caen.engin.umich.edu

rsnider@xrtll.uucp (Richard Snider) (11/08/90)

In article <16150@csli.Stanford.EDU> poser@csli.stanford.edu (Bill Poser) writes:
>In article <16002@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch) writes:
>>All right, my goal of an hour of sound in a megabyte betrayed my
>>ignorance of signal theory.  So let's bump up the space requirement by
>>an order of magnitude.  Are we into the real world yet?
>
>Well, if you just want to use simple sampling and not any kind of coding,
>you'll need a bit more. Here's how to do the calculation. Suppose you're
>going to use one byte per sample. That gives you passable resolution,
>though not what you'd want for research or real hi-fi. It's also the
>resolution of all of the cheap digitizers. To get good quality speech
>you should sample at a minimum of 12K samples/second. This gives you
>room for anti-aliasing filtering at about 5KHz (you need to allow
>for the fact that the filter cutoff is not perfectly sharp). So,
>you need 12K bytes per second. That is 1.2e4 * 3.6e3 = 4.32e7
>bytes per hour, or 43.2MB per hour. So you need another half an order
>of magnitude to get into the ballpark.

Well, to further draw this out, let me get my 2 cents' worth in here.
If you really wanted to be minimalist, you can get reasonable speech
sounds at about 6Khz to 8Khz.  (6 sounds a lot like a telephone, 8 is
more like a cheap AM radio).  Assuming you also want "numerically
perfect" (i.e. no error) reproduction, you can almost get 2 to 1
compression using some form of delta encoding (differences between
samples).  If you are digitizing pure speech (no background sound or
music) you also will find that about 17% of this can be stored as silence.
(Again, this is from one of my samples, your mileage may vary).  If you
care to go to approximate compression, you can likely get as much as 3
or 4 times compression.
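
A minimal Python sketch of the delta idea mentioned above: store the first sample, then only the differences between neighbours. The post does not say which delta scheme was used, so the function names and the sample values are assumptions for illustration. The saving would come from packing the small differences into fewer bits than a full sample, which this sketch does not attempt.

```python
def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples

# Made-up 8-bit samples of a slowly varying waveform
samples = [128, 130, 133, 135, 134, 131, 128, 126]
deltas = delta_encode(samples)

print(deltas)
print(delta_decode(deltas) == samples)   # lossless round trip
```

A lossless packer would still need an escape code for the occasional difference too large for the short field, which is one reason the gain stays a little under 2 to 1.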

Anyway, time for some math:

Say, a sampling rate of 8KHz, 2 to 1 compression, and silence is encoded
as a 3 byte command so its contribution to the overall size of the file
will be negligible if we assume that silence is no change in the input
sample for say 300 samples (.0375 seconds).

8000 samples/sec * 3600 sec = 28.8 x 10^6 samples
2 to 1 compression gives      14.4 x 10^6 compressed samples
17% of the above is roughly (very roughly) 2.4 x 10^6

So your total number of bytes is 12 x 10^6
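
The same estimate as a short Python sketch. The constant names are illustrative; the overhead figure assumes every silent stretch is exactly the minimum 300 samples long, which gives the largest possible number of 3-byte silence commands.

```python
SAMPLE_RATE_HZ = 8000
SECONDS = 3600
DELTA_RATIO = 2            # "2 to 1" lossless compression
SILENCE_FRACTION = 0.17    # share of the recording that is silence
SILENCE_RUN_SAMPLES = 300  # minimum run treated as silence
SILENCE_COMMAND_BYTES = 3

raw = SAMPLE_RATE_HZ * SECONDS                 # one-byte samples in an hour
compressed = raw / DELTA_RATIO                 # after delta encoding
silence_saved = compressed * SILENCE_FRACTION  # bytes replaced by commands
total = compressed - silence_saved

# Worst-case cost of the silence commands themselves
silent_runs = raw * SILENCE_FRACTION / SILENCE_RUN_SAMPLES
command_bytes = silent_runs * SILENCE_COMMAND_BYTES

print(f"raw:            {raw / 1e6:.1f} x 10^6 bytes")
print(f"delta encoded:  {compressed / 1e6:.1f} x 10^6 bytes")
print(f"silence saved:  {silence_saved / 1e6:.2f} x 10^6 bytes")
print(f"total:          {total / 1e6:.2f} x 10^6 bytes")
print(f"command bytes:  {command_bytes:,.0f}")
```

Under that assumption the silence commands add only a few tens of kilobytes, which supports treating their contribution as negligible.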

Is that any closer ???
                                        Richard Snider
------------------------------------------------------------------------
Where: ..uunet!mnetor!yunexus!xrtll!rsnider    Also:  rsnider@xrtll.UUCP
An unbreakable tool is useful for breaking other tools.

dave@imax.com (Dave Martindale) (11/08/90)

In article <1990Nov7.064409.17295@engin.umich.edu> cerberus@caen.engin.umich.edu (R Eric Bennett) writes:
>
>I was quite impressed, as somewhat of a layman, with NeXT's built in sampling.
>Mac's 8-bit, 22Khz sampling, sounds awful.
>Perhaps it has something to do with the speakers I'm listening to.

Doesn't the NeXT use a companding ADC and DAC intended for telephony?
That has a lot better dynamic range than a linear 8-bit system.
On the other hand, telephony-oriented parts may only work at 8 kHz.
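
To illustrate why companding helps, here is a hedged Python sketch using the continuous mu-law curve with mu = 255. This is an assumption for illustration: it is not the segmented 8-bit encoding that telephony codecs actually implement, and the post only asks whether the NeXT uses such a part. All function names are hypothetical.

```python
import math

MU = 255  # value conventionally used for mu-law telephony

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def quantize_8bit(y):
    """Round a value in [-1, 1] to one of 255 evenly spaced levels."""
    return round(y * 127) / 127

def companded_8bit(x):
    """Compress, quantize to 8 bits, then expand again."""
    return mu_law_expand(quantize_8bit(mu_law_compress(x)))

for x in (0.001, 0.01, 0.1, 1.0):
    print(f"x={x:<6} linear 8-bit -> {quantize_8bit(x):.5f}   "
          f"companded 8-bit -> {companded_8bit(x):.5f}")
```

Quiet signals that a linear 8-bit quantizer rounds to zero survive the companded path, at the cost of coarser steps near full scale.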

poser@csli.Stanford.EDU (Bill Poser) (11/08/90)

In article <1990Nov7.064409.17295@engin.umich.edu> cerberus@caen.engin.umich.edu (R Eric Bennett) writes:
>
>I don't know why everybody is giving such gross figures, but I've used an 8-bit
>codec sampler at 8Khz, and the results are great for speech. The man who asked
>the question is obviously not doing research.
>I was quite impressed, as somewhat of a layman, with NeXT's built in sampling.
>Mac's 8-bit, 22Khz sampling, sounds awful.
>Perhaps it has something to do with the speakers I'm listening to.
>Anyway, my point is that NeXT does 8-bit sampling at 8Khz, meaning 29 Meg/hour.

The estimate I made was 43MB/hour, which will give somewhat better
quality. Sampling at 8Khz gives you telephone quality speech, which is
not too bad but nonetheless clearly distorted. (Try listening to /s/ and /sh/.)
The central point is not whether he can shave 25% off the filesize
by reducing the sampling rate a bit, but the order of magnitude.
His original hope of getting an hour into a megabyte was off by over
an order of magnitude, and that is the crucial thing to understand.

								Bill