@seq1.keele.ac.uk (Rob Barth) (05/31/91)
Has anyone ever written a program that will convert text into speech ?
I am thinking of writing something that will play digitized parts of
speech (i.e. the ~40 phonemes used in English), and follow the rules of
English so that the words sound genuine.
I have the program that will use the digitized data -
I am really looking for someone who has figured out all the rules, with
uncommon/ambiguous/exceptional words etc.
Thanks in advance
-- Rob --
/-----------------------------------\
| Rob Barth - csw22@uk.ac.keele |
\-----------------------------------/
ogata@leviathan.cs.umd.edu (Jefferson Ogata) (05/31/91)
In article <21576.9105302008@uk.ac.keele.seq1> seq1!@seq1.keele.ac.uk (Rob Barth) writes:
|> Has anyone ever written a program that will convert text into speech ?
|>
|> I am thinking of writing something that will play digitized parts of
|> speech (i.e. the ~40 phonemes used in English), and follow the rules of
                                                                  ^^^^^^^^
|> English so that the words sound genuine.
   ^^^^^^^
What rules??

|> I have the program that will use the digitized data -
|> I am really looking for someone who has figured out all the rules, with
|> uncommon/ambiguous/exceptional words etc.

I've seen commercial software products for this...there was one for the Mac a while back. It wasn't always right. In particular, vocal pitch inflection is a difficult thing to model.

There is an old chipset for converting text to speech; Radio (S)Hack used to sell it. There were a couple of commercial black boxes with these chips inside.

This is not a trivial problem. To get many words right, you need to do some parsing (e.g. permit, refuse). To get vocal inflection remotely right, you need to do a lot of parsing.

As to phoneme sequences, it's much easier to just make a big dictionary than to try to abstract phonological transformations for the English language. You might be able to do it adequately if you restrict the transformations to latinate words and use a dictionary for the rest. But even the transformations for strictly latinate words are still pretty complicated (e.g. sign, signature). You definitely need a good syllabification rule to do any rule-based phonology.
--
Jefferson Ogata
University of Maryland Computer Science Department

"Animals without backbones hid from each other or fell down. Clamasaurs and
oysterettes appeared as appetizers. Then came the sponges, which sucked up
about ten percent of all life."
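Ogata's suggestion (a big dictionary, with rules only as a fallback) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any product's method: the phoneme labels are ARPAbet-style symbols chosen for readability, and the names `EXCEPTIONS`, `DIGRAPHS`, `LETTERS`, `letter_to_sound` and `pronounce` are hypothetical. The fallback rules are deliberately crude (one phoneme per letter or digraph) and ignore context, stress and syllabification, which the post says real rule-based phonology would need.

```python
# Hypothetical exception dictionary (ARPAbet-style labels). Entries are
# words that simple letter rules get wrong, e.g. "sign" vs "signature".
EXCEPTIONS = {
    "sign": ["S", "AY", "N"],
    "signature": ["S", "IH", "G", "N", "AH", "CH", "ER"],
}

# Crude fallback rules (assumed, illustrative): digraphs are tried first,
# then single letters. No context, stress or syllabification is modelled.
DIGRAPHS = {"sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW"}
LETTERS = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def letter_to_sound(word):
    """Map a lowercase word to phonemes using the context-free rules above."""
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters with no rule (digits, punctuation) map to nothing.
            phonemes.extend(LETTERS.get(word[i], "").split())
            i += 1
    return phonemes


def pronounce(word):
    """Dictionary first; fall back to letter rules only for unlisted words."""
    w = word.lower()
    if w in EXCEPTIONS:
        return list(EXCEPTIONS[w])
    return letter_to_sound(w)
```

For example, `pronounce("Sign")` comes from the dictionary, while an unlisted word such as "sheep" falls through to the rules. Homographs like "permit" would still come out one way only, since a plain lookup has none of the parsing information the post says is needed.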
ttl@aura.cs.wisc.edu (Tony Laundrie) (06/01/91)
(This isn't the appropriate newsgroup, but...)

There are several public domain programs for the IBM-PC available via anonymous ftp from simtel at 26.2.0.74 in msdos/voice/ or wuarchive.wustl.edu at 128.252.135.4 in mirrors/msdos/voice. Here's the index:

Filename      Type  Length  Date    Description
===============================================
AUTOTALK.ARC  B      23618  881216  Digitized speech for the PC
CVOICE.ARC    B      21335  891113  Tells time via voice response on PC speaker
DIGISTUF.ARC  B      23662  890409  HP schematics for DIGITIZE.ARC & a bug fix
DIGITIZE.ARC  B      65482  890403  Record/play human voice on PC
HEARTYPE.ARC  B      10112  880422  Hear what you are typing, crude voice synth.
HELPME2.ARC   B       8031  871130  Voice cries out 'Help Me!' from PC speaker
MAXHEAD2.ARC  B      29184  870425  MAX HEADROOM for CGA. He talks, stutters, etc.
REPLA101.ZIP  B     137100  900311  Plays 8bit digital sampled audio files on spkr
SAY.ARC       B      20224  860330  Computer Speech - using phonemes
TALK.ARC      B       8576  861109  BASIC program to demo talking on a PC speaker
TRAN.ARC      B      39766  890715  Repeats typed text in digital voice
VGREET.ARC    B      45281  900117  Voice says good morning/afternoon/evening

I have tried talk.arc and it works nice. Just type in English text and it gets most things right.

Follow up articles to comp.sys.ibm.pc.
mo@messy.bellcore.com (Michael O'Dell) (06/01/91)
Yes, there have been a number of systems built. The first I know of was "speak", done by Doug McIlroy at Bell Labs and distributed with early Unix systems. It was used by a blind friend of mine to build the world's first talking terminal.

The next system I know about is usually referred to as "The NRL Rules", a system developed at the US Naval Research Labs. These two systems are similar in several ways: they were both created to drive a Votrax phoneme-based synthesizer (manufactured by Federal Screw Works, I kid you not); they are both rather like production systems with exception handling; the underlying machinery isn't hard to implement; and both do a reasonably utilitarian, if sometimes quite humorous, job most of the time, but by no stretch generate natural-sounding extended speech. The NRL stuff has been given away, and several other talking terminals and such have extended the NRL ruleset in conjunction with the SC01 and SC02 chip versions of the Votrax.

The next class of system to come along is the DECtalk, which was an amazing improvement in speech intelligibility for untrained listeners. I believe it is an allophone synthesizer down deep, but it does a LOT of analysis trying to do something with intonation and stress. It ain't perfect, but it is MUCH better than NRL or SPEAK for untrained listeners and random text. Of course, this is a hardware box (has at least one Moto 68K inside, I think).

The next generation is a system called ORATOR which was done here at Bellcore. It was designed to be very, very good at pronouncing names of people and places and such (from text), in addition to good general text conversion. The underlying synthesizer is a demi-syllable synthesizer. A demi-syllable is a half syllable, and the analysis software generates a stream of demi-syllables from the text. The demi-syllable stream goes to a waveform synthesizer which uses some quite wizardly LPC techniques to actually generate the output digital audio.
It sounds amazingly good, at least the few times I've heard it.

Of all these things, I believe the NRL stuff is still freely available. And I know Bellcore is actively interested in licensing the ORATOR technology. (hey, I do work here!)

-Mike O'Dell

Bellcore?? Bellcore isn't allowed to have opinions, so these MUST be mine!
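The "production system with exception handling" style O'Dell describes can be illustrated with context-sensitive rewrite rules of the form left[match]right = phonemes, tried in order so that specific rules (the exceptions) shadow the general default for each letter. The sketch below is a simplified assumption of that style, not the actual NRL rule set: contexts here are literal strings (real rule sets also use symbols standing for classes such as vowels or consonants), a space marks a word boundary, and the rules, phoneme labels and names (`Rule`, `RULES`, `apply_rules`) are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rule(NamedTuple):
    left: str                # literal text required just before the match
    match: str               # letters consumed when the rule fires
    right: str               # literal text required just after the match
    output: Tuple[str, ...]  # phonemes emitted (empty means a silent letter)


# Hypothetical, illustrative rules indexed by the first letter of `match`.
# Order matters: the first rule whose contexts match wins, so specific
# rules are listed before the general default for each letter.
RULES = {
    "a": [Rule("", "a", "", ("AE",))],
    "c": [Rule("", "ch", "", ("CH",)),
          Rule("", "c", "e", ("S",)),    # soft c before e
          Rule("", "c", "i", ("S",)),    # soft c before i
          Rule("", "c", "", ("K",))],
    "e": [Rule("", "e", " ", ()),        # silent word-final e
          Rule("", "e", "", ("EH",))],
    "i": [Rule("", "i", "", ("IH",))],
    "n": [Rule("", "n", "", ("N",))],
    "s": [Rule("", "s", "", ("S",))],
    "t": [Rule("", "th", "", ("TH",)),
          Rule("", "t", "", ("T",))],
}


def apply_rules(word):
    """Convert one word to phonemes by scanning it left to right."""
    text = " " + word.lower() + " "   # spaces mark the word boundaries
    out = []
    i = 1
    while i < len(text) - 1:
        for rule in RULES.get(text[i], []):
            end = i + len(rule.match)
            if (text.startswith(rule.match, i)
                    and text[:i].endswith(rule.left)
                    and text.startswith(rule.right, end)):
                out.extend(rule.output)
                i = end
                break
        else:
            i += 1  # no rule applies to this letter: skip it
    return out
```

Adding an exception means inserting a more specific rule ahead of the general one (as the "ch" rule sits ahead of plain "c"); the interpreter itself does not change, which fits the remark that the underlying machinery isn't hard to implement. Note that "cane" comes out as K AE N, with the wrong vowel: literal contexts cannot express "a followed by one consonant and a final e", which is why real rule sets need context classes.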
ken@opusc.csd.scarolina.edu (Ken Sallenger) (06/07/91)
In <1991May31.192614.4890@walter.bellcore.com> mo@messy.bellcore.com (Michael O'Dell) writes:
>The next class of system to come along is the DECtalk, which was an amazing
>improvement in speech intelligibility for untrained listeners.

For a nifty application of DECtalk boxes, see [hear] the special issue of Computing Systems (V3, #2, 1990). Several of the cuts on the CD feature "Eddie and Eedie" talking and singing. As of last fall, it was available from UC Press (415) 642-4191.
--
Ken Sallenger / ken@bigbird.csd.scarolina.edu / +1 803 777-6551
Computer Services Division / 1244 Blossom ST / Columbia, SC 29208