[net.sources] Talking PC6300 For The Blind, part 1 of 11

eklhad@ihuxv.UUCP (02/11/87)
         A Talking Console Device Driver
         For The AT&T PC6300

                             ABSTRACT

A new console device driver gives the  AT&T  PC6300  the  power  of
speech,   allowing   blind   workers  to  use  the  micro  computer
effectively.   This  device  driver  takes  all   standard   output
generated by the PC6300, redirects it  into an internal buffer, and
enables the blind worker to read  the  text  in  an  efficient  and
productive manner.  Arbitrary line oriented applications can be run
without  modifications.   Using  traditional  software   generation
tools,  the  blind  worker  can  develop  personalized software, or
modify the talking device driver itself.  The  software  expects  a
Votrax  Type  N Talk speech unit attached to the serial port, but a
modular design allows other synthesizers to be substituted  without
much  reprogramming.   The  source  is in the public domain, and is
available on floppy disks or via electronic mail.


1.  INTRODUCTION

Since  computers  have  become  an  indispensable  tool   in   many
professions,   the   blind   worker's  productivity  often  depends
critically on an efficient, user friendly human-machine  interface.
This  becomes  even more important as CD-ROM peripherals enable the
micro computer to act as a reference library.   Already,  a  PC6300
can  be equipped with an electronic version of the encyclopedia for
only $1,000.  For the first time, the  blind  researcher  may  have
access to inexpensive electronic information.

While a talking terminal provides an interface to any computer, the
combination is unnecessarily complex, and many cannot afford to buy
or rent both the talking  terminal  and  the  target  computer.   A
general purpose talking micro computer would allow blind workers to
exploit the capabilities latent in every micro computer,  including
terminal emulation if desired.  A new talking device driver for the
PC6300 brings this goal within easy reach.

The software has the following features:

  1.  A screen independent buffer to capture standard output.

  2.  Normal display for sighted co-workers.

  3.  Audio  feedback  accompanying   displayed   text   or   error
      conditions.

  4.  Reading  the  text  at  interrupt  level  while   application
      programs monopolize the CPU.

  5.  User defined pronunciations for words or symbols.

  6.  User defined key/command correspondence maps.

Subsequent sections describe these features in detail.


2.  ARCHITECTURE

2.1  Method And Machine

Most talking programs, by virtue of  being  programs,  perform  one
function,  be  it  terminal  emulation,  word  processing,  or file
management.  A talking "computer" must be much more flexible.  This
requirement  implies  changes to the resident operating system.  On
the PC6300, this amounts to replacing  the  keyboard/screen  device
driver,  a relatively simple task.  This, along with low cost, is a
strong argument for the PC6300.  Smaller machines (e.g. the Apple IIe)
possess  inflexible  ROM  resident  operating systems, making major
modifications difficult.  Memory  constraints  are  also  a  factor
here.   More  powerful  micros (e.g. Unix based) make device driver
modifications difficult due to  the  complexity  of  the  operating
system.

Other important features of the PC6300 include a speaker for  audio
feedback,  a  keyboard  with  full  ASCII and function keys, and an
easily accessible interrupt  system.   Reading  text  at  interrupt
level  enhances productivity considerably since the user can review
output during program execution.

2.2  Speech Synthesizer

Internal speech peripherals that consume memory and  CPU  time  are
not  appropriate,  since, by definition, other application programs
must run on  the  PC6300  along  with  the  synthesizing  software.
Instead,  the  speech  unit  should  be  a  low cost, off-the-shelf
device, that is easily attached  to  the  micro  computer  via  the
serial  or  parallel  port.   The unit must convert an ASCII stream
into analogue speech, and provide appropriate control and feedback.
The speech unit should possess the following features:

   @ Costs less than $500.

   @ Wide range of speaking rates.

   @ Appropriate x-on x-off or RS232 flow control.

   @ A flush (shut  up)  command  to  terminate  speech  and  clear
     internal buffers.

   @ A short delay between the incoming ASCII stream and the actual
     speech signal.

Many underestimate the importance of the last criterion.  Extensive
buffering  (as in the Echo speech peripheral) is unacceptable.  The
micro computer's internal "cursor" should track the  actual  speech
as  accurately  as possible.  Using the Votrax causes the cursor to
lead the speech by about fifteen words, but this will have to do.

Surprisingly, speech quality is relatively  unimportant;  the  user
quickly  adapts  to the specific speech synthesizer.  Cutting costs
is usually  more  important.   Although  the  Votrax  Type  N  Talk
incorporates  relatively  old  technology,  it  is  still  the best
synthesizer for this application.

2.3  Screen Oriented World

The number of screen oriented application programs is monotonically
increasing.   This  device  driver  makes  no  concession  to these
programs; in fact, it actively opposes them.  Function and  control
keys  have  been redefined, in order to simplify reading operations
by providing single key-stroke commands.  The  visible  cursor  and
the  internal  cursor  (where  the  text  is  read)  are completely
independent.  Any  cursor  control  escape  sequences  produced  by
 application software will be read literally, distracting the user.

 These line oriented constraints were not introduced  casually,  but
 after  much  thought  and  study.   Supplementing a screen oriented
 program with speech is  irreparably  inefficient.   One  can  never
 reproduce the benefits of a two dimensional visual search and scan.
 Instead,  one  is  left  with  only  the  inconveniences.   I  have
 implemented  talking  screen  oriented editors in the past, and the
 efficiency doesn't begin to compare  with  line  oriented  editors.
 The cost of hearing each letter or word that the cursor passes over
 is just too high.  To illustrate, cut a small hole in a large sheet
 of paper, and hold it in front of your terminal.  The hole should be
 the size of a 5 letter word on the screen.  Now try running a visual
 editor, tracking the cursor with the hole in your sheet of paper.
 Even simple edits become slow and error prone.
 Therefore,   all  application  programs  are  assumed  to  be  line
 oriented.  For this reason, this system  may  not  be  optimal  for
 partially  sighted  workers,  who  may  prefer  a  magnified screen
 supplemented with speech.  Never  try  to  be  all  things  to  all
 people.

 2.4  Audio Feedback

 Audio feedback is an important  feature  that  differentiates  this
 system from commercial talking terminals.  When the PC6300 displays
 any characters, the system simulates the  sounds  of  a  1500  baud
 printer.  As on most printers, whitespace is silent and carriage
 return generates a unique sound.  With this feedback, a  programmer
 always knows when the computer is producing output.
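
The printer simulation comes down to a simple rule for each output
character.  The following C sketch states that rule; it is not the
driver's code, which is written in assembly and drives the speaker
directly.  The names "printer_sound", "sound_pattern" and the SOUND_
constants are hypothetical.  The sketch assumes MS-DOS CR LF line
endings, so only the carriage return gets the return sound and the
line feed stays silent.  The separate control-G beep is left out.

```c
/* Hypothetical sound classes for the simulated printer. */
enum sound { SOUND_SILENT, SOUND_CLICK, SOUND_RETURN };

/* Choose the sound for one character of standard output. */
static enum sound printer_sound(unsigned char c)
{
    if (c == '\r')
        return SOUND_RETURN;              /* carriage return: its own sound */
    if (c == ' ' || c < 0x20 || c == 0x7f)
        return SOUND_SILENT;              /* blanks, tabs, other controls */
    return SOUND_CLICK;                   /* each visible character clicks */
}

/* Render text as a sound pattern: 'x' click, '.' silence, '|' return.
   out must have room for strlen(s) + 1 bytes. */
static void sound_pattern(const char *s, char *out)
{
    static const char sym[] = { '.', 'x', '|' };  /* indexed by enum sound */
    while (*s)
        *out++ = sym[printer_sound((unsigned char)*s++)];
    *out = '\0';
}
```

For example, "ab c\r\n" renders as "xx.x|.", while a solid line of
dashes is an unbroken run of clicks.  This is why different kinds of
text sound different.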

 Typically, commercial speech terminals run in one of two modes.  In
 the polling mode, no audible indication accompanies generated text.
 This forces the user to constantly ask the terminal,  via  keyboard
 driven   speech   directives,  whether  additional  data  has  been
 displayed.  This is inefficient and frustrating.   It  is  easy  to
 miss  unexpected messages, such as "system going down in 3 minutes,
 save all files  now!!"  In  the  second  mode,  all  text  is  read
 automatically.   Here  too,  the  correlation  in  time is lost.  A
 computer can generate screen after screen of text while the  speech
 synthesizer  translates  the first line.  In addition, users rarely
 want to read everything, word for word, and the blind  user  should
 not be forced to wade through a deluge of data to determine whether
 any unexpected messages were generated.  Since speech is too slow
 to provide timely feedback, simulating the sounds of a printer via
 an audio device is essential.

 These  sounds  provide  information  as  well.   Since  characters,
 spaces,  and  carriage  returns generate different sounds, the user
 often  receives  considerable  information  about  the  text  being
 displayed.   Solid  lines,  English  text,  assembly language, high
 level language, tables, and blank lines all sound different.   Some
 common  messages  can be inferred from the sound patterns produced,
 eliminating the need to read the text, and improving productivity.

 Along with printer simulation, the PC6300 produces several other
 sounds,  usually  associated  with  error conditions.  These sounds
 include:

   1.  A long tone indicating an active or enabled mode.

   2.  A short beep indicating a command  error.   This  sound  also
       accompanies control-G (industry standard).

   3.  A low buzz indicating a faulty or missing  RS232  connection,
       or an inactive speech unit.

   4.  A  fast  sequence  of  high  notes  indicating   a   boundary
       condition, such as reading beyond the internal buffer.


 3.  EQUIPMENT AND SETUP

 Along with the AT&T PC6300, the system requires  a  Votrax  Type  N
 Talk unit, an RS232 serial cable, and a speaker with a mini phone
 jack.

 The cable accompanying the Votrax unit cannot be used as is,  since
 the  PC6300 has a non-standard RS232 pinout.  Other IBM compatibles
 tolerate standard RS232 cables.  Unfortunately, PC6300  users  must
 construct a new cable.  The following connections are required:

 Votrax          PC6300
 male            female
 1       <->     1
 7       <->     7
 2       <->     3
 3       <->     2
 4       <->     4
 5       <->     5
 20      <->     6
 8       <->     23

 To configure the system, connect the PC6300 to the Votrax unit  via
 the  constructed  RS232 cable, and set the Votrax baud rate to 9600
 baud.  This is done via dip switches on the back of the unit.  Only
 the  switch  nearest  the  speaker  jack  should be down.  Finally,
 connect the speaker and power supply to the Type N Talk unit.  To
 use the system, simply insert a disk containing the speech software
 and turn on the Votrax and the PC6300; MS-DOS will automatically
 load the talking device driver.  The entry in config.sys specifies
 the size of the virtual screen: the line "DEVICE=talkcon.dev 7"
 causes the device driver to allocate 7K for its internal buffer.
 This convention is consistent with most ramdisk device drivers.
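
The size argument and the circular buffer it controls can be sketched
in C.  This is a minimal sketch that rests on three assumptions:

  1.  1K means 1024 bytes.
  2.  The driver name is followed by whitespace and a decimal number.
  3.  malloc stands in for the memory a real MS-DOS driver reserves
      during initialization.

The structure and function names are hypothetical.  Once the buffer
is full, each new character scrolls the oldest one off the top.  A
cursor parked on that oldest character is dragged along, as the
caveats describe.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical circular text buffer; field names are illustrative. */
struct textbuf {
    char *data;
    size_t size;    /* capacity in bytes */
    size_t start;   /* index of the oldest character */
    size_t len;     /* characters currently stored */
    size_t cursor;  /* reading cursor, as an offset from the oldest char */
};

/* Read the number after the driver name, e.g. "talkcon.dev 7" -> 7. */
static unsigned parse_size_k(const char *args)
{
    unsigned k = 0;
    while (*args && !isspace((unsigned char)*args))
        args++;                                 /* skip the driver name */
    while (isspace((unsigned char)*args))
        args++;
    while (isdigit((unsigned char)*args))
        k = k * 10 + (unsigned)(*args++ - '0');
    return k;
}

/* Returns 0 on success, -1 if no size was given or memory ran out. */
static int textbuf_init(struct textbuf *b, const char *args)
{
    unsigned k = parse_size_k(args);
    b->size = (size_t)k * 1024;                 /* assumes 1K = 1024 bytes */
    b->start = b->len = b->cursor = 0;
    b->data = k ? malloc(b->size) : NULL;
    return b->data ? 0 : -1;
}

/* Append one output character; once full, the oldest one scrolls off. */
static void textbuf_put(struct textbuf *b, char c)
{
    if (b->len < b->size) {
        b->data[(b->start + b->len) % b->size] = c;
        b->len++;
        return;
    }
    b->data[b->start] = c;                      /* reuse the oldest slot */
    b->start = (b->start + 1) % b->size;
    if (b->cursor > 0)
        b->cursor--;   /* cursor stays on the same character; at 0 it is
                          dragged along with the top of the buffer */
}

/* Character at offset i from the oldest stored character. */
static char textbuf_at(const struct textbuf *b, size_t i)
{
    return b->data[(b->start + i) % b->size];
}
```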


 4.  DEVICE DRIVER COMMANDS

 The user reads the accumulated text  by  entering  various  control
 characters  and function keys.  Associating appropriate commands to
 these keys is  a  significant  human  factors  problem.   Some  key
 assignments are user friendly; others are disastrous.

 By default, function keys control reading, allowing a user to  read
 text  or  programs  with  single  key  strokes.   Home  key control
 characters examine individual letters  and  words.   The  user  can
 directly  verify text as it is being entered without abandoning the
 home keys.  Inconvenient control characters activate features  that
 are  rarely  used.   The  system  doesn't interpret any <alt> keys,
 since they are difficult to access quickly.   The  key  assignments
 are  table  driven,  and  easily modified.  The file "talkcon.sys",
 described in a later section, contains the key/command map.

 The effect of each control character and function key is  explained
 below.   In  this  section,  the  symbol  '^'  indicates  a control
 character, while F1 through F10 represent the ten function keys and
 #0  through  #9  represent  the keys comprising the numeric keypad.
 The term "cursor" always refers to the internal cursor, where  text
 is read.

 F1:  Positions the cursor at the  start  of  the  internal  buffer.
      This  buffer  is  circular,  and  it  "scrolls",  like a large
      character oriented screen.

 F2:  Moves up to the previous line and starts reading.

 F3:  Positions the cursor at the end of the internal buffer.

 F4:  Moves the cursor to the beginning  of  the  current  line  and
      starts reading.

 F5:  Reads the last complete line in the buffer.  This  allows  the
      user to skip blank lines and the prompt (if any), and read the
      output from the previous command directly.

 F6:  Advances the cursor to the next line and starts reading.

 F7:  Clears the internal buffer.

 F8:  Moves the cursor down two lines and starts reading.

 F9:  Toggles the control character buffering mode.   When  enabled,
      control  characters in standard out are placed in the internal
      buffer along with the text.  By  default,  control  characters
      fall  into the bit bucket.  Newline and bell are always placed
      in the buffer regardless of this parameter.

 F10: Toggles the 1-line reading mode.  When the  mode  is  enabled,
      the  system  stops reading after each line; otherwise it reads
      to the end of the  buffer.   The  user  can  always  interrupt
      reading by entering any command in this list.

 ^P:  Announces the function of the next key entered.  This allows a
      new user to review the command keys in this list.

 ^S:  Moves the  cursor  back  one  space  and  speaks  the  current
      character.

 ^D:  Speaks the character that the cursor is currently on.

 ^F:  Moves the cursor forward one  space  and  speaks  the  current
      character.

 ^E:  Moves the cursor up one row and speaks the current character.

 ^C:  Moves  the  cursor  down  one  row  and  speaks  the   current
      character.

 ^R:  Indicates the case of the letter pointed  to  by  the  cursor,
      sounding the "enabled" tone if it is upper case.

 ^T:  Speaks the word associated with the current  character.   This
      prevents    phonetic   ambiguity,   enabling   the   user   to
      differentiate letters  easily.   The  NATO  standard  phonetic
      alphabet is used (see table I).

 ^J:  Moves the cursor back one token and speaks the current  token.
      A  token  is a sequence of letters or digits, or a punctuation
      mark.

 ^K:  Speaks the token that the cursor is currently on.

 ^L:  Moves the cursor forward one  token  and  speaks  the  current
      token.

 ^W:  Gives the cursor's location by column number.   When  entering
      text, the sequence ^V ^W  announces the current column (useful
      for Fortran programming).

 ^Q:  Takes the next character entered and passes it to MS-DOS
      directly.   This  feature  is  used to send control characters
      (e.g. ^S, ^Q, ^C) to the operating system.

 ^V:  Same as F3.

 ^O:  Same as F4.

 ^N:  Same as F8.

 ^B:  Same as F5.

 #0:  Toggles the  transparent  mode.   When  enabled,  the  talking
      device  driver  is  transparent,  passing control and function
      keys to MS-DOS directly.  This allows a sighted  co-worker  to
      run visual editors (or whatever) without rebooting.  The sundry
      sounds that  usually  accompany  output  are  suppressed.   In
      short,  the  new  talking  device driver emulates the original
      MS-DOS console device driver.

 #1:  Same as ^S.

 #2:  Same as ^D.

 #3:  Same as ^F.

 #4:  Same as ^E.

 #6:  Same as ^C.

 #7:  Same as ^J.

 #8:  Same as ^K.

 #9:  Same as ^L.
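
The token rule behind ^J, ^K, and ^L can be sketched as index
arithmetic over a line of text.  This hypothetical C version follows
the definition given for ^J:

  1.  A run of letters or digits is one token.
  2.  Each punctuation mark is a token by itself.
  3.  Whitespace only separates tokens.

The third rule is an assumption; the text does not say how blanks are
treated.

```c
#include <ctype.h>
#include <stddef.h>

/* 0 = blank or end of text, 1 = letter or digit, 2 = punctuation */
static int char_class(unsigned char c)
{
    if (isalnum(c))
        return 1;
    if (c == '\0' || isspace(c))
        return 0;
    return 2;
}

/* Start of the token containing pos (pos itself if on a blank). */
static size_t token_start(const char *t, size_t pos)
{
    if (char_class((unsigned char)t[pos]) != 1)
        return pos;                    /* punctuation is a one-character token */
    while (pos > 0 && char_class((unsigned char)t[pos - 1]) == 1)
        pos--;
    return pos;
}

/* One past the end of the token containing pos. */
static size_t token_end(const char *t, size_t pos)
{
    int k = char_class((unsigned char)t[pos]);
    if (k == 0)
        return pos;
    if (k == 2)
        return pos + 1;
    while (char_class((unsigned char)t[pos]) == 1)
        pos++;
    return pos;
}

/* ^L: start of the next token, or len if there is none. */
static size_t next_token(const char *t, size_t len, size_t pos)
{
    size_t p = token_end(t, pos);
    while (p < len && char_class((unsigned char)t[p]) == 0)
        p++;                           /* skip separating blanks */
    return p;
}

/* ^J: start of the previous token, or the current one if none precedes it. */
static size_t prev_token(const char *t, size_t pos)
{
    size_t p = token_start(t, pos);
    while (p > 0 && char_class((unsigned char)t[p - 1]) == 0)
        p--;
    return p > 0 ? token_start(t, p - 1) : token_start(t, pos);
}
```

On the line "x = foo(bar1);" repeated ^L stops on "x", "=", "foo",
"(", "bar1", ")" and ";".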

                               TABLE I
                          Phonetic Alphabet

 alpha     hotel     oscar     uniform
 bravo     india     papa      victor
 charlie   juliet    quebec    whiskey
 delta     kilo      romeo     x-ray
 echo      lima      sierra    yankee
 foxtrot   mike      tango     zulu
 golf      november
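
The ^T command needs nothing more than Table I indexed by letter.
The C sketch below uses the standard spelling "whiskey".  It assumes
ASCII letters and returns NULL for anything else, since the text does
not say what ^T speaks for digits or punctuation.  The function name
is hypothetical.

```c
#include <stddef.h>

/* Table I, indexed by letter (ASCII assumed). */
static const char *const phonetic[26] = {
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "x-ray", "yankee", "zulu"
};

/* Word spoken for c by ^T, or NULL if c is not a letter. */
static const char *phonetic_word(int c)
{
    if (c >= 'A' && c <= 'Z')
        c += 'a' - 'A';                 /* upper and lower case share a word */
    if (c < 'a' || c > 'z')
        return NULL;
    return phonetic[c - 'a'];
}
```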


 5.  PRONUNCIATION TABLE

 The user can direct the PC6300 to deliberately misspell  words,  so
 they  will  be  pronounced  correctly.   The table containing these
 substitutions is kept in memory.  This  table  also  contains  user
 defined  pronunciations  for  each  punctuation  mark.  The program
 "tcset"  reads  an  ASCII  file  containing  word  and  punctuation
 pronunciations,  and constructs these tables for the talking device
 driver.  The autoexec.bat script should  execute  this  program  to
 initialize  the device driver's tables.  Of course, the program can
 be run again at any time.

 The tcset program reads the ASCII file "talkcon.sys" to obtain  the
 user  defined pronunciations.  This text file is line oriented, and
 can be modified using your favorite editor.  The syntax of each
 entry   is:    "old word",   whitespace,  "substituted text".   The
 substituted text consists of letters, numbers,  or  spaces.   If  a
 line  in  the  table  contains "read reed", the PC6300 replaces the
 word "READ" with the word "REED" in the speech  stream.   The  line
 "% percent"  determines  the  word  used for the symbol '%'.  Lines
 beginning with whitespace hold comments.

 The substitution table in the device driver is limited to 2K, so
 don't  expect to correct every mispronunciation under the sun.  The
 software  understands  a  few  simple  suffixes,  such  as  regular
 plurals.   If  talkcon.sys  contains "read reed", the words "reads"
 and  "reading"  will  be  modified   accordingly.    The   software
 recognizes  "s",  "es",  "ies"  (plurals),  "d",  "ed", "ied" (past
 tense), and "ing" (participle).
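
The line format and the suffix rule can be sketched together in C.
This is not the tcset source; the structure, field sizes, and
function names are hypothetical.  The sketch makes two guesses the
text does not confirm.  First, it lowercases the old word, on the
assumption that matching ignores case.  Second, it applies "ies" and
"ied" to roots ending in y (copy, copies, copied).  Key/command lines
are not handled here.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One pronunciation entry; the field sizes are arbitrary choices. */
struct entry {
    char word[32];    /* word or punctuation mark to replace */
    char subst[64];   /* replacement text: letters, digits, spaces */
};

/* Parse one talkcon.sys line of the form "old-word whitespace text".
   Returns 1 for an entry, 0 for a comment or empty line, -1 on error. */
static int parse_line(const char *line, struct entry *e)
{
    size_t n = 0;
    if (*line == '\0' || isspace((unsigned char)*line))
        return 0;                             /* leading whitespace: comment */
    while (*line && !isspace((unsigned char)*line)) {
        if (n + 1 >= sizeof e->word)
            return -1;
        e->word[n++] = (char)tolower((unsigned char)*line++);
    }
    e->word[n] = '\0';
    while (*line == ' ' || *line == '\t')
        line++;                               /* separating whitespace */
    n = 0;
    while (*line && *line != '\n' && *line != '\r') {
        unsigned char c = (unsigned char)*line++;
        if (!isalnum(c) && c != ' ')
            return -1;                        /* only letters, digits, spaces */
        if (n + 1 >= sizeof e->subst)
            return -1;
        e->subst[n++] = (char)c;
    }
    while (n > 0 && e->subst[n - 1] == ' ')
        n--;                                  /* trim trailing blanks */
    e->subst[n] = '\0';
    return n > 0 ? 1 : -1;
}

/* The suffixes listed in the text. */
static const char *const suffixes[] = { "s", "es", "ies", "d", "ed", "ied", "ing" };

/* If word is root plus a listed suffix, return the suffix ("" for an
   exact match); otherwise NULL.  Both strings are assumed lower case. */
static const char *match_suffix(const char *word, const char *root)
{
    size_t rl = strlen(root), i;
    if (strncmp(word, root, rl) == 0) {
        const char *rest = word + rl;
        if (*rest == '\0')
            return "";
        for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++)
            if (strcmp(rest, suffixes[i]) == 0)
                return suffixes[i];
    }
    /* Assumed y rule: "copy" also matches "copies" and "copied". */
    if (rl > 1 && root[rl - 1] == 'y' && strncmp(word, root, rl - 1) == 0) {
        const char *rest = word + rl - 1;
        if (strcmp(rest, "ies") == 0 || strcmp(rest, "ied") == 0)
            return rest;
    }
    return NULL;
}
```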

 The file talkcon.sys may also contain  key/command  assignments  to
 map  particular  functions  to  different keys.  Again, entries are
 line oriented, and they are  of  the  form  "key = command-number."
 Keys are specified using the notation in the previous section (e.g.
 ^V, F3, #4).  Available commands and  their  corresponding  numbers
 are  documented  in the example talkcon.sys file provided with this
 software package.
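
A key/command line could be parsed along these lines.  The numeric
key codes are invented for this sketch, because the driver's internal
codes are not described here:

  1.  Control characters keep their ASCII values.
  2.  F1 through F10 become 0x101 through 0x10A.
  3.  #0 through #9 become 0x200 through 0x209.

Following the caveat about the numeric keypad, the sketch rejects #5.
The function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical key codes: ^@..^_ keep their ASCII values,
   F1..F10 map to 0x101..0x10A, #0..#9 map to 0x200..0x209. */
static int parse_key(const char *s)
{
    char c;
    if (s[0] == '\0')
        return -1;
    c = (char)toupper((unsigned char)s[1]);
    if (s[0] == '^' && c >= '@' && c <= '_' && s[2] == '\0')
        return c & 0x1f;                       /* control character */
    if ((s[0] == 'F' || s[0] == 'f') && isdigit((unsigned char)s[1])) {
        if (s[1] == '1' && s[2] == '0' && s[3] == '\0')
            return 0x10A;                      /* F10 */
        if (s[1] != '0' && s[2] == '\0')
            return 0x100 + (s[1] - '0');       /* F1..F9 */
        return -1;
    }
    if (s[0] == '#' && isdigit((unsigned char)s[1]) && s[2] == '\0')
        return s[1] == '5' ? -1 : 0x200 + (s[1] - '0');  /* #5 is unusable */
    return -1;
}

/* Parse "key = command-number".  Returns 1 on success, 0 otherwise;
   a line starting with whitespace (a comment) fails the first match. */
static int parse_map_line(const char *line, int *key, int *cmd)
{
    char name[16];
    if (sscanf(line, "%15[^ \t=] = %d", name, cmd) != 2)
        return 0;
    *key = parse_key(name);
    return *key >= 0;
}
```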


 6.  ACRONYMS

 While   reading,   the   device   driver    expands    (apparently)
 unpronounceable  words  into their constituent letters.  Thus, many
 acronyms and obscure variable  names  will  be  spelled  out.   The
 pronounceability test is quite simplistic, examining only the first
 four letters of each word.  If these letters are all vowels, or all
 consonants,  the  word  is  spelled.   If  two  or three vowels are
 present, the  word  is  pronounced.   When  exactly  one  vowel  is
 present,  the word is spelled, unless the consonant cluster matches
 a predefined English cluster (table lookup).  This simple algorithm
 usually  works  well.   As  always,  the  user  can  place specific
 variables or acronyms in the replacement table.
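
The pronounceability test can be sketched in C.  The cluster tables
are illustrative stand-ins, since the driver's own table is not
reproduced here.  Two details are assumptions of this sketch and not
facts about the driver.  First, with exactly one vowel, it checks the
consonants both before and after that vowel.  Second, y counts as a
vowel except as the first letter.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical cluster tables; the driver's real table is not listed. */
static const char *const onsets[] = {
    "bl", "br", "ch", "cl", "cr", "dr", "fl", "fr", "gl", "gr", "ph", "pl",
    "pr", "sc", "sh", "sk", "sl", "sm", "sn", "sp", "st", "sw", "th", "tr",
    "wh", "scr", "shr", "spl", "spr", "str", "thr"
};
static const char *const codas[] = {
    "ck", "ct", "ft", "ld", "lk", "lp", "lt", "mp", "nd", "ng", "nk", "nt",
    "pt", "rd", "rk", "rm", "rn", "rt", "sh", "sk", "sp", "st", "th"
};

/* Vowels are a, e, i, o, u, plus y when it is not the first letter. */
static int is_vowel(const char *w, size_t i)
{
    return (w[i] != '\0' && strchr("aeiou", w[i]) != NULL) ||
           (w[i] == 'y' && i > 0);
}

/* A consonant run of 0 or 1 letters always passes; longer runs must
   appear in the table. */
static int cluster_ok(const char *run, size_t len,
                      const char *const *tab, size_t n)
{
    size_t i;
    if (len <= 1)
        return 1;
    for (i = 0; i < n; i++)
        if (strlen(tab[i]) == len && strncmp(tab[i], run, len) == 0)
            return 1;
    return 0;
}

/* Returns 1 to pronounce the word, 0 to spell it letter by letter. */
static int pronounceable(const char *word)
{
    char w[5];
    size_t n = 0, i, vowels = 0, v = 0;

    while (n < 4 && isalpha((unsigned char)word[n])) {   /* first 4 letters */
        w[n] = (char)tolower((unsigned char)word[n]);
        n++;
    }
    w[n] = '\0';
    for (i = 0; i < n; i++)
        if (is_vowel(w, i)) {
            vowels++;
            v = i;
        }
    if (vowels == 0 || vowels == n)
        return 0;                    /* all consonants or all vowels */
    if (vowels >= 2)
        return 1;                    /* two or three vowels */
    /* Exactly one vowel, at index v: check the consonants on each side. */
    return cluster_ok(w, v, onsets, sizeof onsets / sizeof onsets[0]) &&
           cluster_ok(w + v + 1, n - v - 1, codas,
                      sizeof codas / sizeof codas[0]);
}
```

With these tables, "string" and "test" are pronounced.  "ibm",
"msdos", "tty", and "ctrl" are spelled out.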


 7.  TERMINAL EMULATION

 In  theory,  any  terminal  emulator   can   be   run   unmodified,
 transforming   the   talking   PC6300   into  a  talking  terminal.
 Unfortunately, most terminal programs monopolize the function keys.
 Furthermore,  they  often  provide cursor control, paging, and many
 other unwanted visual  features.   A  simple,  no  frills  terminal
 emulator  that  avoids  function  and  control  keys  would improve
 productivity considerably.  Such a program has been written, and is
 included  in  this  software  package.   When  running,  it  simply
 shuffles characters from stdin to the serial  port,  and  from  the
 serial  port  to  stdout.   Since interrupt routines control serial
 I/O, characters are not lost while  the  device  driver  reads  the
 accumulated text.  X-on / X-off flow control is implemented in both
 directions.  Alt keys activate a few simple features.

 alt-X: Exit the  terminal  program  and  return  to  MS-DOS.   Data
        terminal ready is disabled, equivalent to hanging up.

 alt-L: Leave the terminal emulator temporarily, and return to
        MS-DOS.  Data terminal ready remains active; the user can
        return to the terminal session at any time.

 alt-B: Send a break.

 alt-S: Display modem status.  The characters A, C, and S  represent
        active  (data  set ready), carrier detect, and clear to send
        respectively.

 alt-R: Toggle the baud rate.  The  serial  I/O  data  rate  toggles
        between  1200  baud and 300 baud.  Since the talking console
        device  driver  often  runs  with  interrupts  disabled  for
        several  milliseconds,  higher baud rates are not supported.
        Except for file transfers, a  higher  baud  rate  would  not
        improve the productivity of the blind worker.

 alt-D: Download a  file.   Characters  from  the  serial  port  are
        redirected  into the named file.  The path from stdin to the
        serial port is unaffected.  When the  emulator  receives  ^Z
        (MS-DOS  EOF),  it  closes  the  file  and  sends subsequent
        characters to stdout as before.  The following sequence  can
        be used to download a text file from a Unix machine:

          1.  Enter "stty -echo tab0".

          2.  Hit alt-D, followed by the file name.

          3.  Enter "cat file ; echo '\032'".

          4.  Watch the  progress  display  (one  '.'  per  kilobyte
              transferred), and wait for the Unix prompt.

          5.  Reset the stty parameters.

 alt-U: Upload  a  file.   Alt-U  followed  by  a  file  name  sends
        characters from the named file to the serial port.  The path
        from the serial port to stdout  is  unaffected.   Characters
        entered at the keyboard are discarded, although the alt keys
        in this list are still interpreted.  If a  disaster  occurs,
        the  user  can always exit the terminal program using alt-X.
        As before, echoing and tab  expansion  should  be  disabled.
        Use  the  Unix  command  "cat >file" to capture the uploaded
        text.  When the file is transferred (indicated by a carriage
        return), enter ^D (Unix EOF) to close the Unix file.
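
The alt-D routing rule fits in a few lines of C.  In this
hypothetical sketch, which is not the code in TERMINAL.ASM, each
character arriving from the serial port is offered to
"download_char".  While a download is open:

  1.  The character goes to the file.
  2.  A '.' is written for every 1024 bytes.
  3.  ^Z (0x1A) closes the file.

Discarding the ^Z itself, rather than writing it to the file or to
stdout, is an assumption.

```c
#include <stdio.h>

/* Hypothetical state for an alt-D download in progress. */
struct download {
    FILE *file;             /* open target file, or NULL when idle */
    unsigned long count;    /* bytes written so far */
};

/* Route one character received from the serial port.
   Returns 1 if the download consumed it, 0 if it belongs on stdout. */
static int download_char(struct download *d, int c, FILE *progress)
{
    if (d->file == NULL)
        return 0;
    if (c == 0x1a) {                     /* ^Z, the MS-DOS end of file */
        fclose(d->file);
        d->file = NULL;
        return 1;
    }
    putc(c, d->file);
    if (++d->count % 1024 == 0)          /* one '.' per kilobyte */
        putc('.', progress);
    return 1;
}
```

The caller writes the character to stdout whenever "download_char"
returns 0.  This covers the newline that "echo" sends after the ^Z.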

 Industry standard file transfer mechanisms  (e.g.  ctrm)  might  be
 preferable.   They  are less flexible (not every host machine is so
 equipped), but they detect and correct errors, and are more robust.
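
The 1200 baud ceiling mentioned under alt-R follows from simple
arithmetic.  This rests on two assumptions:

  1.  Each character takes 10 bits (one start bit, eight data bits,
      one stop bit).
  2.  The serial chip holds only one received character.

Under those assumptions, the port must be serviced within roughly one
character time or data is lost.  A small C helper makes the numbers
explicit; the names are hypothetical.

```c
/* Microseconds between characters on an asynchronous serial line,
   assuming 10 bits per character (start + 8 data + stop). */
static unsigned long char_time_us(unsigned long baud)
{
    return 10UL * 1000000UL / baud;
}

/* Nonzero if a stall of stall_us microseconds with interrupts disabled
   fits within one character time at the given rate. */
static int stall_is_safe(unsigned long baud, unsigned long stall_us)
{
    return stall_us < char_time_us(baud);
}
```

At 1200 baud a character arrives about every 8.3 ms, so a stall of a
few milliseconds is tolerable.  At 9600 baud the window shrinks to
about 1 ms.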


 8.  SOFTWARE

 The software is written  in  Microsoft  assembly,  version  3.0  or
 above.   To  build the driver, assemble the four source files, link
 the resulting object files, and run exe2bin to produce  the  device
 driver.   Talkcon.obj must be loaded first.  No external library or
 startup routines are required.  The programs tcset.c and  savebuf.c
 are written in Microsoft C, version 4.0 or above.

 The software package consists of the following source files:

 MKTALK.BAT:  Batch script to build the device driver.

 TALKCON.ASM: Device driver interface functions for MS-DOS.

 EVENTS.ASM:  Routines that  process  speech  commands  at  keyboard
              interrupt level.

 READING.ASM: Routines that control continuous reading at real  time
              interrupt level.

 SYNTH.ASM:   Interface functions that control the  specific  speech
              synthesizer.

 PARMS.H:     Header file  containing  parameters  for  the  talking
              device driver.

 TCSET.C:     Program that reads  an  ASCII  file  of  pronunciation
              corrections,  and  constructs the corresponding device
              driver tables.

 TALKCON.SYS: ASCII file containing the pronunciation corrections
              and the key/command map.

 SAVEBUF.C:   Program that  takes  the  accumulated  output  in  the
              device driver's buffer, and stores it in a text file.

 TERMINAL.ASM: Simple terminal emulator.


 9.  CAVEATS

    @ Some application programs bypass the device driver, displaying
      output  via the BIOS routines, or writing directly into screen
      memory.  There is no way  to  read  output  produced  in  this
      manner.

    @ The "transparent mode" command cannot be reassigned to another
      key.   However,  another function, including the nul function,
      can be assigned to #0, eliminating the "transparent" option.

    @ For some reason, #5  is  not  easily  accessible.   Therefore,
      commands cannot be assigned to #5.

    @ When the internal cursor is  positioned  at  the  top  of  the
      circular  buffer,  the scrolling buffer drags the cursor along
      as the PC6300 generates additional  text.   Thus,  the  cursor
      remains  at  the  top  of  the  buffer.  Since scrolling rates
      exceed human speech, reading text at the  top  of  the  buffer
      while   the   buffer  scrolls  can  be  quite  an  interesting
      experience.

    @ A few important real time  functions  are  controlled  by  CPU
      loops  rather than timer interrupts.  If this device driver is
      ported to another IBM PC compatible  with  a  different  clock
      rate,  the "CLKRATE" macro in parms.h must be redefined before
      building the software.

    @ It  is  remotely  possible  to  encounter  a  dangerous   race
      condition  if  text  is  being  read while the "tcset" program
      modifies the pronunciation tables.

    @ Since the MS-DOS keyboard buffer is  frustratingly  small  (15
      characters),  this  device  driver  contains its own interrupt
      level type ahead  buffer.   The  KBSIZ  parameter  in  parms.h
      determines the size of this buffer (currently 120 characters).
      This allows function keys to remain operational when the  type
      ahead buffer is full.

    @ The beeps, clicks, and constant chatter may drive your friends
      crazy.
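
The interrupt-level type-ahead buffer described above is a ring
buffer.  One reading of the remark about function keys is this: the
keyboard interrupt handler acts on speech commands immediately and
queues only the remaining keys, so commands still work when the queue
is full.  The C sketch below follows that reading.  KBSIZ comes from
the text; the structure, the callback, and the function names are
hypothetical.  A real driver would also disable interrupts while
"kb_read" updates the shared counters.

```c
#define KBSIZ 120   /* size given in the text for parms.h */

/* Hypothetical interrupt-level type-ahead queue. */
struct typeahead {
    unsigned short keys[KBSIZ];
    unsigned head;    /* next key to hand to MS-DOS */
    unsigned tail;    /* next free slot */
    unsigned count;   /* keys waiting */
};

/* Hypothetical hook: performs the speech command bound to key and
   returns nonzero, or returns 0 if key is ordinary input. */
typedef int (*command_fn)(unsigned short key);

/* Keyboard interrupt: commands run at once, so they keep working even
   when the queue is full.  Returns 0 if an ordinary key was dropped. */
static int kb_interrupt(struct typeahead *q, unsigned short key,
                        command_fn run_command)
{
    if (run_command(key))
        return 1;
    if (q->count == KBSIZ)
        return 0;                     /* queue full: key is lost */
    q->keys[q->tail] = key;
    q->tail = (q->tail + 1) % KBSIZ;
    q->count++;
    return 1;
}

/* Console read request from MS-DOS: next queued key, or -1 if empty. */
static int kb_read(struct typeahead *q)
{
    int key;
    if (q->count == 0)
        return -1;
    key = q->keys[q->head];
    q->head = (q->head + 1) % KBSIZ;
    q->count--;
    return key;
}
```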

-- 
	You know  ...  if it ain't patina, it's verdigris.
	Karl Dahlke   ihnp4!ihnet!eklhad