eklhad@ihuxv.UUCP (02/11/87)
<>

A Talking Console Device Driver For The AT&T PC6300

ABSTRACT

A new console device driver gives the AT&T PC6300 the power of speech, allowing blind workers to use the micro computer effectively. This device driver takes all standard output generated by the PC6300, redirects it into an internal buffer, and enables the blind worker to read the text in an efficient and productive manner. Arbitrary line oriented applications can be run without modifications. Using traditional software generation tools, the blind worker can develop personalized software, or modify the talking device driver itself. The software expects a Votrax Type N Talk speech unit attached to the serial port, but a modular design allows other synthesizers to be substituted without much reprogramming. The source is in the public domain, and is available on floppy disks or via electronic mail.

1. INTRODUCTION

Since computers have become an indispensable tool in many professions, the blind worker's productivity often depends critically on an efficient, user friendly human-machine interface. This becomes even more important as CD-ROM peripherals enable the micro computer to act as a reference library. Already, a PC6300 can be equipped with an electronic version of the encyclopedia for only $1,000. For the first time, the blind researcher may have access to inexpensive electronic information.

While a talking terminal provides an interface to any computer, the combination is unnecessarily complex, and many cannot afford to buy or rent both the talking terminal and the target computer. A general purpose talking micro computer would allow blind workers to exploit the capabilities latent in every micro computer, including terminal emulation if desired.

A new talking device driver for the PC6300 brings this goal within easy reach. The software has the following features: 1. A screen independent buffer to capture standard output. 2. Normal display for sighted co-workers. 3.
Audio feedback accompanying displayed text or error conditions. 4. Reading the text at interrupt level while application programs monopolize the CPU. 5. User defined pronunciations for words or symbols. 6. User defined key/command correspondence maps.

Subsequent sections describe these features in detail.

2. ARCHITECTURE

2.1 Method And Machine

Most talking programs, by virtue of being programs, perform one function, be it terminal emulation, word processing, or file management. A talking "computer" must be much more flexible. This requirement implies changes to the resident operating system. On the PC6300, this amounts to replacing the keyboard/screen device driver, a relatively simple task. This, along with low cost, is a strong argument for the PC6300. Smaller machines (e.g. Apple2E) possess inflexible ROM resident operating systems, making major modifications difficult. Memory constraints are also a factor here. More powerful micros (e.g. Unix based) make device driver modifications difficult due to the complexity of the operating system.

Other important features of the PC6300 include a speaker for audio feedback, a keyboard with full ASCII and function keys, and an easily accessible interrupt system. Reading text at interrupt level enhances productivity considerably since the user can review output during program execution.

2.2 Speech Synthesizer

Internal speech peripherals that consume memory and CPU time are not appropriate, since, by definition, other application programs must run on the PC6300 along with the synthesizing software. Instead, the speech unit should be a low cost, off-the-shelf device that is easily attached to the micro computer via the serial or parallel port. The unit must convert an ASCII stream into analogue speech, and provide appropriate control and feedback. The speech unit should possess the following features:

@ Costs less than $500.
@ Wide range of speaking rates.
@ Appropriate x-on x-off or RS232 flow control.
@ A flush (shut up) command to terminate speech and clear internal buffers.
@ A small phase delay between the incoming ASCII stream and the actual speech signal.

Many underestimate the importance of the last criterion. Extensive buffering (as in the Echo speech peripheral) is unacceptable. The micro computer's internal "cursor" should track the actual speech as accurately as possible. Using the Votrax causes the cursor to lead the speech by about fifteen words, but this will have to do. Surprisingly, speech quality is relatively unimportant; the user quickly adapts to the specific speech synthesizer. Cutting costs is usually more important. Although the Votrax Type N Talk incorporates relatively old technology, it is still the best synthesizer for this application.

2.3 Screen Oriented World

The number of screen oriented application programs is monotonically increasing. This device driver makes no concession to these programs; in fact, it actively opposes them. Function and control keys have been redefined, in order to simplify reading operations by providing single key-stroke commands. The visible cursor and the internal cursor (where the text is read) are completely independent. Any cursor control escape sequences produced by application software will be read literally, distracting the user.

These line oriented constraints were not introduced casually, but after much thought and study. Supplementing a screen oriented program with speech is irreparably inefficient. One can never reproduce the benefits of a two dimensional visual search and scan. Instead, one is left with only the inconveniences. I have implemented talking screen oriented editors in the past, and the efficiency doesn't begin to compare with line oriented editors. The cost of hearing each letter or word that the cursor passes over is just too high. To illustrate, cut a small hole in a large sheet of paper, and hold it in front of your terminal.
The hole should be the size of a 5 letter word on the screen. Now try running a visual editor, tracking the cursor with the hole in your sheet of paper. Even simple edits become slow and error prone.

Therefore, all application programs are assumed to be line oriented. For this reason, this system may not be optimal for partially sighted workers, who may prefer a magnified screen supplemented with speech. Never try to be all things to all people.

2.4 Audio Feedback

Audio feedback is an important feature that differentiates this system from commercial talking terminals. When the PC6300 displays any characters, the system simulates the sounds of a 1500 baud printer. As on most printers, whitespace is silent and a carriage return generates a unique sound. With this feedback, a programmer always knows when the computer is producing output.

Typically, commercial speech terminals run in one of two modes. In the polling mode, no audible indication accompanies generated text. This forces the user to constantly ask the terminal, via keyboard driven speech directives, whether additional data has been displayed. This is inefficient and frustrating. It is easy to miss unexpected messages, such as "system going down in 3 minutes, save all files now!!"

In the second mode, all text is read automatically. Here too, the correlation in time is lost. A computer can generate screen after screen of text while the speech synthesizer translates the first line. In addition, users rarely want to read everything, word for word, and the blind user should not be forced to wade through a deluge of data to determine whether any unexpected messages were generated.

Since speech, because of its speed, cannot provide timely feedback, simulating the sounds of a printer via an audio device is essential. These sounds provide information as well. Since characters, spaces, and carriage returns generate different sounds, the user often receives considerable information about the text being displayed.
Solid lines, English text, assembly language, high level language, tables, and blank lines all sound different. Some common messages can be inferred from the sound patterns produced, eliminating the need to read the text, and improving productivity.

Along with printer simulation, the PC6300 produces several other sounds, usually associated with error conditions. These sounds include:

1. A long tone indicating an active or enabled mode.
2. A short beep indicating a command error. This sound also accompanies control-G (industry standard).
3. A low buzz indicating a faulty or missing RS232 connection, or an inactive speech unit.
4. A fast sequence of high notes indicating a boundary condition, such as reading beyond the internal buffer.

3. EQUIPMENT AND SETUP

Along with the AT&T PC6300, the system requires a Votrax Type N Talk unit, an RS232 serial cable, and a speaker with mini phone jack. The cable accompanying the Votrax unit cannot be used as is, since the PC6300 has a non-standard RS232 pinout. Other IBM compatibles tolerate standard RS232 cables. Unfortunately, PC6300 users must construct a new cable. The following connections are required:

    Votrax         PC6300
    male           female
      1     <->      1
      7     <->      7
      2     <->      3
      3     <->      2
      4     <->      4
      5     <->      5
     20     <->      6
      8     <->     23

To configure the system, connect the PC6300 to the Votrax unit via the constructed RS232 cable, and set the Votrax baud rate to 9600 baud. This is done via dip switches on the back of the unit. Only the switch nearest the speaker jack should be down. Finally, connect the speaker and power supply to the Type N Talk unit.

To use the system, simply insert a disc containing the speech software, then turn on the Votrax and the PC6300; MS-DOS will automatically incorporate the talking device driver. The entry in config.sys specifies the size of the virtual screen. The line "DEVICE=talkcon.dev 7" causes the device driver to allocate 7K for its internal buffer. This is consistent with most ramdisc device drivers.

4. DEVICE DRIVER COMMANDS

The user reads the accumulated text by entering various control characters and function keys. Associating appropriate commands to these keys is a significant human factors problem. Some key assignments are user friendly; others are disastrous. By default, function keys control reading, allowing a user to read text or programs with single key strokes. Home key control characters examine individual letters and words. The user can directly verify text as it is being entered without abandoning the home keys. Inconvenient control characters activate features that are rarely used. The system doesn't interpret any <alt> keys, since they are difficult to access quickly.

The key assignments are table driven, and easily modified. The file "talkcon.sys", described in a later section, contains the key/command map.

The effect of each control character and function key is explained below. In this section, the symbol '^' indicates a control character, while F1 through F10 represent the ten function keys and #0 through #9 represent the keys comprising the numeric keypad. The term "cursor" always refers to the internal cursor, where text is read.

F1: Positions the cursor at the start of the internal buffer. This buffer is circular, and it "scrolls", like a large character oriented screen.

F2: Moves up to the previous line and starts reading.

F3: Positions the cursor at the end of the internal buffer.

F4: Moves the cursor to the beginning of the current line and starts reading.

F5: Reads the last complete line in the buffer. This allows the user to skip blank lines and the prompt (if any), and read the output from the previous command directly.

F6: Advances the cursor to the next line and starts reading.

F7: Clears the internal buffer.

F8: Moves the cursor down two lines and starts reading.

F9: Toggles the control character buffering mode. When enabled, control characters in standard out are placed in the internal buffer along with the text.
By default, control characters fall into the bit bucket. Newline and bell are always placed in the buffer regardless of this parameter.

F10: Toggles the 1-line reading mode. When the mode is enabled, the system stops reading after each line; otherwise it reads to the end of the buffer. The user can always interrupt reading by entering any command in this list.

^P: Announces the function of the next key entered. This allows a new user to review the command keys in this list.

^S: Moves the cursor back one space and speaks the current character.

^D: Speaks the character that the cursor is currently on.

^F: Moves the cursor forward one space and speaks the current character.

^E: Moves the cursor up one row and speaks the current character.

^C: Moves the cursor down one row and speaks the current character.

^R: Indicates the case of the letter pointed to by the cursor, sounding the "enabled" tone if it is upper case.

^T: Speaks the word associated with the current character. This prevents phonetic ambiguity, enabling the user to differentiate letters easily. The NATO standard phonetic alphabet is used (see table I).

^J: Moves the cursor back one token and speaks the current token. A token is a sequence of letters or digits, or a punctuation mark.

^K: Speaks the token that the cursor is currently on.

^L: Moves the cursor forward one token and speaks the current token.

^W: Gives the cursor's location by column number. When entering text, the sequence ^V ^W announces the current column (useful for Fortran programming).

^Q: Takes the next character entered and passes it to MS-DOS directly. This feature is used to send control characters (e.g. ^S, ^Q, ^C) to the operating system.

^V: Same as F3.

^O: Same as F4.

^N: Same as F8.

^B: Same as F5.

#0: Toggles the transparent mode. When enabled, the talking device driver is transparent, passing control and function keys to MS-DOS directly. This allows a sighted co-worker to run visual editors (whatever) without rebooting.
The sundry sounds that usually accompany output are suppressed. In short, the new talking device driver emulates the original MS-DOS console device driver.

#1: Same as ^S.

#2: Same as ^D.

#3: Same as ^F.

#4: Same as ^E.

#6: Same as ^C.

#7: Same as ^J.

#8: Same as ^K.

#9: Same as ^L.

TABLE I
Phonetic Alphabet

    alpha      hotel      oscar      uniform
    bravo      india      papa       victor
    charlie    juliet     quebec     whiskey
    delta      kilo       romeo      x-ray
    echo       lima       sierra     yankee
    foxtrot    mike       tango      zulu
    golf       november

5. PRONUNCIATION TABLE

The user can direct the PC6300 to deliberately misspell words, so they will be pronounced correctly. The table containing these substitutions is kept in memory. This table also contains user defined pronunciations for each punctuation mark. The program "tcset" reads an ASCII file containing word and punctuation pronunciations, and constructs these tables for the talking device driver. The autoexec.bat script should execute this program to initialize the device driver's tables. Of course, the program can be run again at any time.

The tcset program reads the ASCII file "talkcon.sys" to obtain the user defined pronunciations. This text file is line oriented, and can be modified using your favorite editor. The syntax of each entry is: "old word", whitespace, "substituted text". The substituted text consists of letters, numbers, or spaces. If a line in the table contains "read reed", the PC6300 replaces the word "READ" with the word "REED" in the speech stream. The line "% percent" determines the word used for the symbol '%'. Lines beginning with whitespace hold comments. The substitution table in the device driver is limited to 2K, so don't expect to correct every mispronunciation under the sun.

The software understands a few simple suffixes, such as regular plurals. If talkcon.sys contains "read reed", the words "reads" and "reading" will be modified accordingly. The software recognizes "s", "es", "ies" (plurals), "d", "ed", "ied" (past tense), and "ing" (participle).
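The table lookup and suffix handling described above can be sketched roughly as follows. This is an illustration in Python, not the code in tcset.c or the driver; the names SUFFIXES, parse_table, and respell are hypothetical. It assumes a suffix is simply stripped from the word and re-attached to the substituted text, and that "key = command" lines are handled elsewhere. How the real driver copes with stem spelling changes (such as the "y" behind "ies") is not documented here, so the sketch ignores them.

```python
# Suffixes the article says the driver recognizes, longest first.
SUFFIXES = ("ies", "ied", "ing", "es", "ed", "s", "d")


def parse_table(text):
    """Build a {word: substituted text} dict from talkcon.sys-style text."""
    table = {}
    for line in text.splitlines():
        # Lines beginning with whitespace hold comments; skip blank lines too.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split(None, 1)
        if len(fields) < 2:
            continue
        word, respelling = fields
        # Assumption: "key = command-number" lines are processed elsewhere.
        if respelling.startswith("="):
            continue
        table[word.lower()] = respelling.strip()
    return table


def respell(word, table):
    """Return the substituted text for a word, or the word unchanged."""
    lower = word.lower()
    if lower in table:
        return table[lower]
    for suffix in SUFFIXES:
        if lower.endswith(suffix):
            stem = lower[: -len(suffix)]
            # Assumption: the suffix is re-attached to the substitution as is.
            if stem in table:
                return table[stem] + suffix
    return word


if __name__ == "__main__":
    sample = "read reed\n% percent\n  this line is a comment\nF3 = 4\n"
    table = parse_table(sample)
    for w in ("read", "reads", "Reading", "%", "bread"):
        print(w, "->", respell(w, table))
```

With the sample table, "reads" and "Reading" come out as "reeds" and "reeding", while "bread" passes through untouched because "brea" is not in the table.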

The file talkcon.sys may also contain key/command assignments to map particular functions to different keys. Again, entries are line oriented, and they are of the form "key = command-number". Keys are specified using the notation in the previous section (e.g. ^V, F3, #4). Available commands and their corresponding numbers are documented in the example talkcon.sys file provided with this software package.

6. ACRONYMS

While reading, the device driver expands (apparently) unpronounceable words into their constituent letters. Thus, many acronyms and obscure variable names will be spelled out. The pronounceability test is quite simplistic, examining only the first four letters of each word. If these letters are all vowels, or all consonants, the word is spelled. If two or three vowels are present, the word is pronounced. When exactly one vowel is present, the word is spelled, unless the consonant cluster matches a predefined English cluster (table lookup). This simple algorithm usually works well. As always, the user can place specific variables or acronyms in the replacement table.

7. TERMINAL EMULATION

In theory, any terminal emulator can be run unmodified, transforming the talking PC6300 into a talking terminal. Unfortunately, most terminal programs monopolize the function keys. Furthermore, they often provide cursor control, paging, and many other unwanted visual features. A simple, no frills terminal emulator that avoids function and control keys would improve productivity considerably. Such a program has been written, and is included in this software package. When running, it simply shuffles characters from stdin to the serial port, and from the serial port to stdout. Since interrupt routines control serial I/O, characters are not lost while the device driver reads the accumulated text. X-on / X-off flow control is implemented in both directions. Alt keys activate a few simple features.

alt-X: Exit the terminal program and return to MS-DOS.
Data terminal ready is disabled, equivalent to hanging up.

alt-L: Leave the terminal emulator temporarily, and return to MS-DOS. Data terminal ready remains active; the user can return to the terminal session at any time.

alt-B: Send a break.

alt-S: Display modem status. The characters A, C, and S represent active (data set ready), carrier detect, and clear to send respectively.

alt-R: Toggle the baud rate. The serial I/O data rate toggles between 1200 baud and 300 baud. Since the talking console device driver often runs with interrupts disabled for several milliseconds, higher baud rates are not supported. Except for file transfers, a higher baud rate would not improve the productivity of the blind worker.

alt-D: Download a file. Characters from the serial port are redirected into the named file. The path from stdin to the serial port is unaffected. When the emulator receives ^Z (MS-DOS EOF), it closes the file and sends subsequent characters to stdout as before. The following sequence can be used to download a text file from a Unix machine:

1. Enter "stty -echo tab0".
2. Hit alt-D, followed by the file name.
3. Enter "cat file ; echo '\032'".
4. Watch the progress display (one '.' per kilobyte transferred), and wait for the Unix prompt.
5. Reset the stty parameters.

alt-U: Upload a file. Alt-U followed by a file name sends characters from the named file to the serial port. The path from the serial port to stdout is unaffected. Characters entered at the keyboard are discarded, although the alt keys in this list are still interpreted. If a disaster occurs, the user can always exit the terminal program using alt-X. As before, echoing and tab expansion should be disabled. Use the Unix command "cat >file" to capture the uploaded text. When the file is transferred (indicated by a carriage return), enter ^D (Unix EOF) to close the Unix file.

Industry standard file transfer mechanisms (e.g. ctrm) might be preferable.
They are less flexible (not every host machine is so equipped), but they detect and correct errors, and are more robust.

8. SOFTWARE

The software is written in Microsoft assembly, version 3.0 or above. To build the driver, assemble the four source files, link the resulting object files, and run exe2bin to produce the device driver. Talkcon.obj must be loaded first. No external library or startup routines are required. The programs tcset.c and savebuf.c are written in Microsoft C, version 4.0 or above. The software package consists of the following source files:

MKTALK.BAT: Batch script to build the device driver.

TALKCON.ASM: Device driver interface functions for MS-DOS.

EVENTS.ASM: Routines that process speech commands at keyboard interrupt level.

READING.ASM: Routines that control continuous reading at real time interrupt level.

SYNTH.ASM: Interface functions that control the specific speech synthesizer.

PARMS.H: Header file containing parameters for the talking device driver.

TCSET.C: Program that reads an ASCII file of pronunciation corrections, and constructs the corresponding device driver tables.

TALKCON.SYS: ASCII file containing the pronunciation corrections and the key/command map.

SAVEBUF.C: Program that takes the accumulated output in the device driver's buffer, and stores it in a text file.

TERMINAL.ASM: Simple terminal emulator.

9. CAVEATS

@ Some application programs bypass the device driver, displaying output via the BIOS routines, or writing directly into screen memory. There is no way to read output produced in this manner.

@ The "transparent mode" command cannot be reassigned to another key. However, another function, including the nul function, can be assigned to #0, eliminating the "transparent" option.

@ For some reason, #5 is not easily accessible. Therefore, commands cannot be assigned to #5.

@ When the internal cursor is positioned at the top of the circular buffer, the scrolling buffer drags the cursor along as the PC6300 generates additional text. Thus, the cursor remains at the top of the buffer. Since scrolling rates exceed human speech, reading text at the top of the buffer while the buffer scrolls can be quite an interesting experience.

@ A few important real time functions are controlled by CPU loops rather than timer interrupts. If this device driver is ported to another IBM PC compatible with a different clock rate, the "CLKRATE" macro in parms.h must be redefined before building the software.

@ It is remotely possible to encounter a dangerous race condition if text is being read while the "tcset" program modifies the pronunciation tables.

@ Since the MS-DOS keyboard buffer is frustratingly small (15 characters), this device driver contains its own interrupt level type ahead buffer. The KBSIZ parameter in parms.h determines the size of this buffer (currently 120 characters). This allows function keys to remain operational when the type ahead buffer is full.

@ The beeps, clicks, and constant chatter may drive your friends crazy.

--
You know ... if it ain't patina, it's verdigris.
Karl Dahlke    ihnp4!ihnet!eklhad