[fa.editor-p] Voice driven editing

C70:editor-people (05/27/82)

From C70:daemon  Thu May 27 02:21:35 1982
We are about to get a voice recognition system operational here
at Wharton, and one of the things we are planning to put up is a
voice-driven editor.
Has anyone done anything in this field?  

Thanks in advance,

Henry Dreifus

C70:editor-people (06/04/82)

From gaines@RAND-UNIX Fri Jun  4 00:30:54 1982
Henry,
  I've been waiting to see if you would get any response to your request
for information about voice-driven editing.  So far, I've seen no replies,
but if you have any that weren't circulated to the list, please forward
them to me.  I have been interested in the subject for some time, but have
done nothing other than think some about the problems.  I have not heard of
much that is happening, either.  My impression is that the speech
recognition people have not yet realized that this is a prime application
area for them, and are not sensitive to its advantages.

There are important elements present in voice-driven editing that are not
present in other speech recognition situations.  The user has a second
input device (the keyboard) available, and it is a feedback situation.  The
user can correct the errors of the speech recognizer.  There are two
important consequences which make this a good task for the study of speech
recognition.  One is that a continuous learning approach can be taken to
speech recognition, since feedback on errors will always be available.
Most speech recognition situations provide only an initial learning period.
The second is that the task can be divided between the keyboard and the
speech recognizer, so that speech input need not be used for everything
until it has advanced far enough.  We might, for example, use voice to
control the cursor and for commands, while continuing to type most words,
at least until word recognition gets much better than it is now.
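
A rough sketch of that division, in present-day Python, may make it
concrete.  Everything in it (the Recognizer and Editor classes, the event
stream, the confidence threshold) is invented for illustration; it only
shows the shape of the loop: spoken words that look like commands drive the
editor, ordinary typing goes to the buffer, and any correction the user
types is handed back to the recognizer as new training data.

    class Recognizer:
        def __init__(self):
            self.templates = {}                  # word -> stored audio sample

        def recognize(self, audio):
            # Real recognition would compare the audio against the stored
            # templates; here we just return a fixed guess and a confidence.
            return "up", 0.4

        def adapt(self, audio, intended_word):
            # Continuous learning: file the sample under the word the user
            # says it should have been.
            self.templates[intended_word] = audio

    class Editor:
        def __init__(self):
            self.buffer, self.row = [], 0

        def do(self, command):
            if command == "cursor-up":
                self.row = max(0, self.row - 1)
            elif command == "cursor-down":
                self.row += 1

        def insert(self, chars):
            self.buffer.append(chars)

    COMMANDS = {"up": "cursor-up", "down": "cursor-down"}

    def edit_loop(events, recognizer, editor):
        last_audio = None
        for kind, data in events:                # merged keyboard + microphone stream
            if kind == "voice":
                word, confidence = recognizer.recognize(data)
                last_audio = data
                if word in COMMANDS and confidence > 0.3:
                    editor.do(COMMANDS[word])
            elif kind == "key":
                editor.insert(data)              # typing goes straight to the text
            elif kind == "fix" and last_audio is not None:
                recognizer.adapt(last_audio, data)   # user corrects the last guess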

Another avenue to be explored is stylized speech.  The hardest problem area
in speech recognition, as I understand it, is to recognize continuous
speech.  If there is even the slightest pause between words, recognition is
much easier.  While in many applications a restriction on the speaker would
be unacceptable, many users might accept one when entering text, since
there would still be a substantial efficiency gain over other forms of text
entry.  Also, we could devise sounds quite different from English for
commands (a la Victor Borge!).  The speaker can become trained, as well as
the speech recognizer.
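
To show why even a slight pause helps, here is a toy Python sketch, with
invented names and thresholds: if every word is bracketed by silence, the
signal can be cut into isolated segments by energy alone, and each segment
matched against a small vocabulary that includes a few deliberately
un-English command sounds.

    SILENCE = 0.05                 # frame energy below this counts as a pause

    def split_on_pauses(frame_energies):
        """Cut a stream of per-frame energies into isolated-word segments."""
        segments, current = [], []
        for e in frame_energies:
            if e > SILENCE:
                current.append(e)
            elif current:
                segments.append(current)
                current = []
        if current:
            segments.append(current)
        return segments

    def classify(segment, templates):
        # Crude nearest-template match on average energy; a real recognizer
        # would compare whole acoustic patterns.
        avg = sum(segment) / len(segment)
        return min(templates, key=lambda word: abs(templates[word] - avg))

    # Distinct command sounds get templates of their own, well separated
    # from ordinary dictated words.
    templates = {"*click*": 0.9, "*pop*": 0.6, "period": 0.3}
    for seg in split_on_pauses([0.9, 0.92, 0.0, 0.0, 0.31, 0.3, 0.0]):
        print(classify(seg, templates))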

I discussed this recently with Bea Oshika at SDC, who has been active in
speech recognition for many years.  She pointed out some cognitive problems
with mixed mode input.  People, she claims, don't do well at talking and
carrying out manual tasks at the same time.  But I suspect that training in
voice + keyboard input to an editor could produce a more efficient result
for most people.  At least it is an interesting question to investigate.

Stock Gaines

C70:editor-people (06/05/82)

From cbosgd!mark Sat Jun  5 09:12:57 1982
It seems to me that, if the technology can support it, voice-driven editing
would be at its best in input mode.  People can talk much faster and with
less training (of the people) than they can type.  On the other hand, for
editing commands, I would think that voice would be a bottleneck, especially
for positioning the cursor.  You can't point with your voice.  Can you
imagine trying to position the cursor orally: "up up up left a word no,
make that right a word, right right ok".  Try moving your lips as fast as
you can type arrow keys.  There would be no comparison with a mouse.
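
As a toy illustration of how wordy that dialog gets, here is a small Python
interpreter for a made-up spoken-movement grammar (the tokens and the
five-column "word" width are arbitrary):

    def apply_spoken_moves(tokens, row=0, col=0):
        """Interpret spoken tokens such as 'up', 'down', 'left', 'right',
        optionally followed by 'word', as cursor motion."""
        i = 0
        while i < len(tokens):
            t = tokens[i]
            # "left word" / "right word": jump by a word, here taken as
            # five columns for the sake of the example
            by_word = i + 1 < len(tokens) and tokens[i + 1] == "word"
            step = 5 if by_word else 1
            if t == "up":
                row -= 1
            elif t == "down":
                row += 1
            elif t == "left":
                col -= step
            elif t == "right":
                col += step
            i += 2 if by_word else 1
        return row, col

    # "up up up left a word, no, make that right a word, right right"
    print(apply_spoken_moves("up up up left word right word right right".split()))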

On the other hand, you can point with your eyes really well.  If you had a
gizmo that could figure out where you are looking, you'd really have
something.  These gizmos do exist, but I understand you have to wear a
helmet with lots of stuff on it.

	Mark Horton