info-mac@uw-beaver (12/07/84)
From: Mike Schuster <MIKES@CIT-20.ARPA>

Smoothtalker Demo Disk

I talked with the people at First Byte last week about their product, Smoothtalker. They mentioned that they had a software developer's package for about $500, a retail package for $150, and a promotional demo disk, which they offered to send me free of charge. I received the disk two days later, along with a few pages of notes. The disk contains the Smoothtalker demo application and its companion picture resource file. No licensing agreement was included in the package, and the disk is stamped "Demo - Ok to copy". I concluded that the software is in the public domain and decided to provide some documentation.

The demo application (which I will call ST from now on) is a rolling demonstration consisting of a sequence of cartoons and spoken text. Using Apple's RMOVER, I found that the text fragments are contained in ST as 'STR ' resources with ID numbers 256 and up. Each of these resources contains a length byte followed by a sequence of ASCII phoneme codes. Here is 'STR ' resource 256, which says "The freedom of speech":

    V6S25DHAXf9rIYS9d5AHS6m3AHv S79spy0IYCH

After studying all of these examples, I compiled the following dictionary of phoneme codes:

    AE pAt/EY Ate/b BiB/CH CHew/d DeeD/EH pEt/IY bE/AX thE
    f Fit/g GaG/h Hat/IH hIt/AY pIE/j Just/k KiCK/l seLf
    m MuM/n No/NG siNG/AA pOt/OW gO/AO fOr/OY bOY/AW OUt
    UH tOOk/UW cOO/p PoP/r Run/s SauCe/SH SHy/t To/TH THin
    DH THe/AH cUt/ER vERb/v oF/w Wag/y Yes/z siZe/ZH viSion
    V# volume #/S# speed #/# pitch #/B bass/T treble/] +stress/[ -stress

I have listed each phoneme code along with a short word containing that phoneme, with the corresponding letters capitalized. Notice that the two-letter phoneme codes are in upper case, whereas the single-letter codes are in lower case. The volume, speed, and pitch values are the ASCII digits 0, 1, ..., 9. These values apply to all subsequent phonemes until changed.
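The way these settings carry forward can be sketched in C. This is only an illustration: struct voice_state and apply_controls are names invented here, not part of Smoothtalker, and the rule used to tell S7 (a speed code) from SH (a phoneme) is an assumption drawn from the examples in this note. The sketch does not check phonemes for validity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Running voice settings; each field holds a value 0..9. */
struct voice_state {
    int volume;
    int speed;
    int pitch;
};

/* Walk a NUL-terminated phoneme string and apply every V#, S# and
   bare-digit (pitch) code to *st.  Returns the number of control codes
   applied.
   Assumption: an upper-case letter that is not part of V# or S# either
   starts a two-letter phoneme code or is a one-letter B/T voice code,
   so "SH" is a phoneme while "S7" sets the speed. */
int apply_controls(const char *s, struct voice_state *st)
{
    int applied = 0;
    size_t i = 0;

    assert(s != NULL && st != NULL);
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        unsigned char next = (unsigned char)s[i + 1];

        if (c == 'V' && isdigit(next)) {          /* volume */
            st->volume = next - '0';
            applied++;
            i += 2;
        } else if (c == 'S' && isdigit(next)) {   /* speed */
            st->speed = next - '0';
            applied++;
            i += 2;
        } else if (isdigit(c)) {                  /* pitch */
            st->pitch = c - '0';
            applied++;
            i += 1;
        } else if (c == 'B' || (c == 'T' && next != 'H')) {
            i += 1;                               /* voice select */
        } else if (isupper(c) && isupper(next)) {
            i += 2;                               /* two-letter phoneme */
        } else {
            i += 1;   /* one-letter phoneme, stress mark, or space */
        }
    }
    return applied;
}
```

Run over resource 256, this sketch applies eleven control codes and finishes with volume 6, speed 7, and pitch 0.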
The ] and [ codes seem to add and subtract stress from the following phoneme, respectively. The B and T codes select a low bass and a high treble voice, respectively.

Here is a rough translation of 256:

    V6S25DHAXf9rIYS9d5AHS6m3AHv S79spy0IYCH

    V6S25DHAX     --- "the" spoken with volume 6, speed 2, and pitch 5
    f9rIYS9d5AHm  --- "freedom" spoken with two changes in pitch and
                      one change in speed
    S79spy0IYCH   --- a pause, speed 7, pitch 9, followed by "speech"
                      with a pitch change to 0 just before the "eech"

Here is another useful example; it's the "ABC Song":

    4S2EY bIY 6sIY dIY 7IY EHf 6jIY
    S55EYCH AY 4jEY kEY 3S7EHl EHm EHn OWh 1pIY
    S46kyUW AHr 5EHs tIY yUW 4vIY
    6dAHblyUW 5EHkz 6wAY 1zIY
    S45nAWAY6nOWm7AYS3EYbIY6sIYz
    5tEHlmIY4wAHtyUW3THIHnkAHv2mIY

In addition to the 'STR ' resources, ST contains four 'CODE' resources. Using Apple's "Examine File" and the jump table contained in the 'CODE' 0 resource, I found that the phoneme-to-speech translator is contained in the 'CODE' 2 resource. Its entry point offset is 4 (that is, its first instruction is just after the standard 4-byte segment loader header). By replacing the first instruction with a trap to MacsBug, I found that the translator has the following argument interface:

    short speak(text, volume, speed, pitch)
    char *text;
    short volume;
    short speed;
    short pitch;

The arguments are pushed on the stack using the standard stack-based toolbox conventions. The text argument points to the phoneme string (in Pascal format, with a leading length byte). The volume, speed, and pitch arguments apparently specify the initial values for these parameters. After some experimentation, I found that the result returned is zero unless the text contains something that isn't a valid phoneme or control code. In that case, nothing is spoken and the result is the offset into the text of the illegal item.

Armed with this information, you can use the phoneme-to-speech translator in your own applications.
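To make the interface concrete, here is a hedged C sketch of preparing the text argument and reading the result. The names speak_fn, to_pstring, fake_speak, and say are invented for illustration. fake_speak is only a stand-in so the sketch can run off the Mac, and its habit of counting the offset from the length byte is an assumption, since the origin of the real offset is not established above. Calling the real entry point on the Mac would also need glue for the stack-based convention, which is not shown.

```c
#include <assert.h>
#include <string.h>

/* Argument list as reported above; the typedef name is invented here. */
typedef short (*speak_fn)(char *text, short volume, short speed, short pitch);

/* Copy a C string into Pascal format (length byte, then the characters).
   Returns 0 on success, -1 if the text does not fit in 255 bytes. */
int to_pstring(const char *src, unsigned char dst[256])
{
    size_t n;

    assert(src != NULL && dst != NULL);
    n = strlen(src);
    if (n > 255)
        return -1;
    dst[0] = (unsigned char)n;
    memcpy(dst + 1, src, n);
    return 0;
}

/* Hypothetical stand-in for the real translator: it speaks nothing and
   reports the position of the first '?' as the "illegal item".
   Assumption: offsets are counted from the length byte. */
short fake_speak(char *text, short volume, short speed, short pitch)
{
    unsigned char len = (unsigned char)text[0];
    int i;

    (void)volume;
    (void)speed;
    (void)pitch;
    for (i = 1; i <= len; i++)
        if (text[i] == '?')
            return (short)i;
    return 0;
}

/* Convert the phoneme text and hand it to the translator.
   Returns 0 if accepted, the translator's nonzero offset if it rejected
   an item, or -1 if the text was too long for a Pascal string. */
short say(speak_fn speak, const char *phonemes,
          short volume, short speed, short pitch)
{
    unsigned char buf[256];

    if (to_pstring(phonemes, buf) != 0)
        return -1;
    return speak((char *)buf, volume, speed, pitch);
}
```

With the stand-in, say(fake_speak, "spIYCH", 6, 2, 5) returns 0, while a string with a '?' in third position returns 3. A caller can use a nonzero result to point at the offending item in the phoneme text.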
For example, I wrote one application, which I call "ABSpeak", that presents a dialog box containing two editText boxes. The phonetic strings you type in each box can be spoken alternately for an A versus B comparison. I also wrote a desk accessory that speaks the time of day. (This one is not too useful on a thin Mac, since the phoneme-to-speech resource is 24K bytes.) I will distribute "ABSpeak" to anyone who wishes to experiment with Smoothtalker.

    S8mAYkAOl S56SHUW5stER

Michael Schuster
@cit-20

-------