[comp.music] Digitized Voices

@seq1.keele.ac.uk (Rob Barth) (05/31/91)

Has anyone ever written a program that will convert text into speech?

I am thinking of writing something that will play digitized parts of
speech (i.e. the ~40 phonemes used in English), and follow the rules of
English so that the words sound genuine.

I have the program that will use the digitized data - 
I am really looking for someone who has figured out all the rules, with
uncommon/ambiguous/exceptional words etc.

Thanks in advance

-- Rob --
                   /-----------------------------------\
                   |  Rob Barth  -  csw22@uk.ac.keele  |
                   \-----------------------------------/

ogata@leviathan.cs.umd.edu (Jefferson Ogata) (05/31/91)

In article <21576.9105302008@uk.ac.keele.seq1> seq1!@seq1.keele.ac.uk (Rob Barth) writes:
|> Has anyone ever written a program that will convert text into speech ?
|> 
|> I am thinking of writing something that will play digitized parts of
|> speech (i.e. the ~40 phonemes used in English), and follow the rules of
                                                                  ^^^^^^^^
|> English so that the words sound genuine.
   ^^^^^^^

What rules??

|> I have the program that will use the digitized data - 
|> I am really looking for someone who has figured out all the rules, with
|> uncommon/ambiguous/exceptional words etc.

I've seen commercial software products for this...there was one for
the Mac a while back. It wasn't always right. In particular, vocal
pitch inflection is a difficult thing to model.

There is an old chipset for converting text to speech; Radio (S)Hack
used to sell it. There were a couple of commercial black boxes with
these chips inside.

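This is not a trivial problem. To get many words right, you need to
do some parsing: permit and refuse, for example, are stressed on the
first syllable as nouns and on the second as verbs, and the verb
refuse also ends in a "z" sound. To get vocal inflection remotely
right, you need to do a lot of parsing.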
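A minimal sketch of why parsing matters, in Python: a tiny lexicon of heteronyms keyed by part of speech, with a one-word lookbehind standing in for a real parser. The ARPAbet-style transcriptions, the cue-word list and the function names are illustrative assumptions, not taken from any product mentioned in this thread.

```python
# Illustrative pronunciations (ARPAbet-style; digit 1 marks primary stress).
HETERONYMS = {
    "permit": {"noun": "P ER1 M IH0 T", "verb": "P ER0 M IH1 T"},
    "refuse": {"noun": "R EH1 F Y UW0 S", "verb": "R IH0 F Y UW1 Z"},
}

# Hypothetical cue words that usually come right before a verb.
# A real system would use a parser; this one-word lookbehind is a toy.
VERB_CUES = {"to", "will", "would", "can", "could", "should", "must",
             "i", "we", "they", "you"}


def guess_pos(prev_word):
    """Guess 'verb' or 'noun' from the preceding word only (default: noun)."""
    return "verb" if prev_word in VERB_CUES else "noun"


def pronounce_heteronyms(sentence):
    """Return (word, phonemes) for every known heteronym in the sentence."""
    words = [w.strip(".,;:!?\"'") for w in sentence.lower().split()]
    result = []
    for i, word in enumerate(words):
        if word in HETERONYMS:
            prev = words[i - 1] if i > 0 else None
            result.append((word, HETERONYMS[word][guess_pos(prev)]))
    return result


print(pronounce_heteronyms("They refuse to collect the refuse."))
# [('refuse', 'R IH0 F Y UW1 Z'), ('refuse', 'R EH1 F Y UW0 S')]
```

The lookbehind fails as soon as an adverb or another word sits between the cue and the heteronym, which is exactly why a real system needs proper parsing.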

As to phoneme sequences, it's much easier to just make a big
dictionary than to try to abstract phonological transformations
for the English language. You might be able to do it adequately
if you restrict the transformations to latinate words and use a
dictionary for the rest. But even the transformations for
strictly latinate words are still pretty complicated (e.g. the "g"
is silent in sign but pronounced in signature). You definitely need
a good syllabification rule to do any rule-based phonology.
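One common syllabification rule is the maximal onset principle: consonants between two vowels go to the following syllable as long as they form a legal word-initial cluster, and the rest stay behind as the previous syllable's coda. The sketch below assumes ARPAbet input with stress digits on vowels and a small, hypothetical list of legal onsets; it splits signature as sig-na-ture, leaving the "g" in a coda where it is pronounced.

```python
# ARPAbet vowel symbols; in the input, vowels carry a stress digit (0/1/2).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Illustrative, incomplete set of legal multi-consonant English onsets.
# Any single consonant (or no consonant at all) is always a legal onset.
LEGAL_CLUSTERS = {
    ("P", "R"), ("P", "L"), ("B", "R"), ("B", "L"), ("T", "R"), ("D", "R"),
    ("K", "R"), ("K", "L"), ("G", "R"), ("G", "L"), ("F", "R"), ("F", "L"),
    ("TH", "R"), ("S", "P"), ("S", "T"), ("S", "K"), ("S", "M"), ("S", "N"),
    ("S", "L"), ("S", "W"), ("T", "W"), ("K", "W"), ("D", "W"),
    ("S", "P", "R"), ("S", "T", "R"), ("S", "K", "R"), ("S", "P", "L"),
}


def is_vowel(phone):
    return phone.rstrip("012") in VOWELS


def legal_onset(consonants):
    return len(consonants) <= 1 or tuple(consonants) in LEGAL_CLUSTERS


def syllabify(phones):
    """Split a phoneme list into syllables by the maximal onset principle."""
    nuclei = [i for i, p in enumerate(phones) if is_vowel(p)]
    if not nuclei:
        return [list(phones)]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        between = phones[left + 1:right]
        # Smallest k with between[k:] legal gives the longest legal onset.
        k = next(k for k in range(len(between) + 1) if legal_onset(between[k:]))
        cuts.append(left + 1 + k)
    syllables, start = [], 0
    for cut in cuts + [len(phones)]:
        syllables.append(phones[start:cut])
        start = cut
    return syllables


print(syllabify("S IH1 G N AH0 CH ER0".split()))
# [['S', 'IH1', 'G'], ['N', 'AH0'], ['CH', 'ER0']]
```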

--
Jefferson Ogata     University of Maryland      Computer Science Department
"Animals without backbones hid from each other or fell down. Clamasaurs and
 oysterettes appeared as appetizers. Then came the sponges, which sucked up
                     about ten percent of all life."

ttl@aura.cs.wisc.edu (Tony Laundrie) (06/01/91)

(This isn't the appropriate newsgroup, but...)

There are several public domain programs for the IBM-PC available via
anonymous ftp from simtel at 26.2.0.74 in msdos/voice/ 
or wuarchive.wustl.edu at 128.252.135.4 in mirrors/msdos/voice.  Here's the
index:
 Filename   Type Length   Date   Description
 ==============================================
 AUTOTALK.ARC  B   23618  881216  Digitized speech for the PC
 CVOICE.ARC    B   21335  891113  Tells time via voice response on PC speaker
 DIGISTUF.ARC  B   23662  890409  HP schematics for DIGITIZE.ARC & a bug fix
 DIGITIZE.ARC  B   65482  890403  Record/play human voice on PC
 HEARTYPE.ARC  B   10112  880422  Hear what you are typing, crude voice synth.
 HELPME2.ARC   B    8031  871130  Voice cries out 'Help Me!' from PC speaker
 MAXHEAD2.ARC  B   29184  870425  MAX HEADROOM for CGA. He talks, stutters, etc.
 REPLA101.ZIP  B  137100  900311  Plays 8bit digital sampled audio files on spkr
 SAY.ARC       B   20224  860330  Computer Speech - using phonemes
 TALK.ARC      B    8576  861109  BASIC program to demo talking on a PC speaker
 TRAN.ARC      B   39766  890715  Repeats typed text in digital voice
 VGREET.ARC    B   45281  900117  Voice says good morning/afternoon/evening

I have tried talk.arc and it works nicely.  Just type in English text and it
gets most things right.  Follow up articles to comp.sys.ibm.pc.

mo@messy.bellcore.com (Michael O'Dell) (06/01/91)

Yes, there have been a number of systems built.  The first I know of was
"speak", done by Doug McIlroy at Bell Labs and distributed with early
Unix systems.  It was used by a blind friend of mine to build the
world's first talking terminal.  The next system I know about
is usually referred to as "The NRL Rules", a system developed at
the US Naval Research Laboratory. These two systems have a lot in common:
both were created to drive a Votrax phoneme-based synthesizer
(manufactured by Federal Screw Works, I kid you not), both are
rather like production systems with exception handling, and the
underlying machinery isn't hard to implement. Both do a reasonably
utilitarian, if sometimes quite humorous, job most of the time, but by no
stretch do they generate natural-sounding extended speech.
The NRL stuff has been given away and several
other talking terminals and such have extended the NRL ruleset in conjunction
with the SC01 and SC02 chip versions of the Votrax.
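The production-system-with-exceptions idea can be sketched as a toy letter-to-sound engine: an exception lexicon is checked first, then ordered rules rewrite a letter string into phonemes when its left and right contexts match, with specific rules placed ahead of general ones so they take precedence. This is not the NRL ruleset; the rules, phoneme symbols and names are hypothetical, and far too few to be useful.

```python
# Hypothetical exception lexicon, consulted before any rule fires.
EXCEPTIONS = {"one": "W AH N", "two": "T UW"}

# Rules are (left context, letters, right context, phonemes), tried in order.
# A space in a context means a word boundary. Specific rules come first so
# they shadow the general one-letter rules. This is a toy subset.
RULES = [
    (" ", "kn", "", "N"),   # word-initial "kn": silent k
    ("", "ph", "", "F"),
    ("", "sh", "", "SH"),
    ("", "ch", "", "CH"),
    ("", "th", "", "TH"),
    ("", "ee", "", "IY"),
    ("", "oo", "", "UW"),
    ("", "e", " ", ""),     # word-final "e" is silent
] + [("", letter, "", ph) for letter, ph in {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}.items()]


def to_phonemes(word):
    """Translate one word to a phoneme string: exceptions first, then rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    padded = " " + word + " "
    phonemes = []
    i = 1
    while i < len(padded) - 1:
        for left, letters, right, ph in RULES:
            if (padded.startswith(letters, i)
                    and padded[:i].endswith(left)
                    and padded.startswith(right, i + len(letters))):
                if ph:  # silent-letter rules emit nothing
                    phonemes.append(ph)
                i += len(letters)
                break
        else:
            i += 1  # no rule for this character (digit, apostrophe...): skip it
    return " ".join(phonemes)


print(to_phonemes("knee"))  # N IY
```

With these rules to_phonemes("make") comes out as M AE K, because nothing handles a vowel before a silent final "e"; adding context rules like that, one exception at a time, is where real rulesets grow large.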

The next class of system to come along is the DECtalk, which was an amazing
improvement in speech intelligibility for untrained listeners.  I believe
it is an allophone synthesizer down deep, but it does a LOT of analysis
trying to do something with intonation and stress. It ain't perfect, but
it is MUCH better than NRL or SPEAK for untrained listeners and
random text. Of course, this is a hardware box
(has at least one Moto 68K inside,  I think).
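An allophone is a context-dependent variant of a phoneme. As a rough illustration of the kind of choice an allophone synthesizer has to make (not DECtalk's actual rules, which the post doesn't describe), here is a simplified treatment of American English /t/: unaspirated after /s/, aspirated before a stressed vowel, flapped between vowels when the next one is unstressed. The "T_h" label is made up for this sketch; "DX" is the ARPAbet flap symbol.

```python
def is_vowel(phone):
    # Assumes CMU-style ARPAbet input, where every vowel carries a stress digit.
    return phone[-1] in "012"


def stressed(phone):
    return phone[-1] in "12"


def t_allophones(phones):
    """Replace each /t/ (ARPAbet "T") with a context-dependent allophone."""
    out = []
    for i, phone in enumerate(phones):
        if phone != "T":
            out.append(phone)
            continue
        prev = phones[i - 1] if i > 0 else None
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if prev == "S":
            out.append("T")    # unaspirated after /s/, as in "stop"
        elif nxt and is_vowel(nxt) and stressed(nxt):
            out.append("T_h")  # aspirated before a stressed vowel, as in "attack"
        elif prev and nxt and is_vowel(prev) and is_vowel(nxt):
            out.append("DX")   # flap between vowels, next one unstressed: "butter"
        else:
            out.append("T")    # plain /t/ elsewhere, e.g. word-final in "cat"
    return out


print(t_allophones("B AH1 T ER0".split()))  # ['B', 'AH1', 'DX', 'ER0']
```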

The next generation is a system called ORATOR which was done here at Bellcore.
It was designed to be very, very good at pronouncing names of people
and places and such (from text), in addition to good general text conversion.
The underlying synthesizer is a demi-syllable synthesizer.
A demi-syllable is a half syllable, and the analysis software
generates a stream of demi-syllables from the text.  The demi-syllable
stream goes to a waveform synthesizer which uses some quite
wizardly LPC techniques to actually generate the output digital audio.
It sounds amazingly good, at least the few times I've heard it.
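For readers unfamiliar with LPC synthesis, the textbook version drives an all-pole filter with an excitation signal: an impulse train at the pitch period for voiced sounds, noise for unvoiced ones. This is a generic sketch, not ORATOR's method; the single hand-built formant resonator below stands in for coefficients a real system would estimate from recorded speech, and all function names are hypothetical.

```python
import math
import random


def excitation(n_samples, pitch_period=None, seed=0):
    """Impulse train for voiced sounds, white noise for unvoiced ones."""
    if pitch_period:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Predictor coefficients for one two-pole resonance (a single formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius < 1: stable
    theta = 2.0 * math.pi * freq_hz / sample_rate         # pole angle
    return [2.0 * r * math.cos(theta), -r * r]


def lpc_synthesize(coeffs, gain, exc):
    """All-pole synthesis: s[n] = gain*e[n] + sum_k coeffs[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(exc):
        s = gain * e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


# 100 ms of a 120 Hz buzz through a single 500 Hz formant at 8 kHz.
fs = 8000
buzz = lpc_synthesize(resonator_coeffs(500, 100, fs), 0.1,
                      excitation(fs // 10, pitch_period=fs // 120))
```

Real LPC systems update the coefficients every frame (typically 10 to 20 ms), commonly estimating them from recorded speech with the autocorrelation method and the Levinson-Durbin recursion. Sign conventions for the predictor vary between texts, so check before reusing coefficients from elsewhere.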

Of all these things, I believe the NRL stuff is still freely available.
And I know Bellcore is actively interested in licensing the ORATOR
technology.  (hey, I do work here!)

        -Mike O'Dell

Bellcore?? Bellcore isn't allowed to have opinions, so these MUST be mine!

ken@opusc.csd.scarolina.edu (Ken Sallenger) (06/07/91)

In <1991May31.192614.4890@walter.bellcore.com>
     mo@messy.bellcore.com (Michael O'Dell) writes:

>The next class of system to come along is the DECtalk, which was an amazing
>improvement in speech intelligibility for untrained listeners.

For a nifty application of DECtalk boxes, see [hear] the special
issue of Computing Systems (V3, #2, 1990).  Several of the cuts
on the CD feature "Eddie and Eedie" talking and singing.

As of last fall, it was available from UC Press (415) 642-4191.
-- 
     Ken Sallenger / ken@bigbird.csd.scarolina.edu / +1 803 777-6551
     Computer Services Division / 1244 Blossom ST / Columbia, SC 29208