[mod.telecom] Voicemail info follow-up

minow@REX.DEC (Martin Minow, DECtalk Engineering ML3-1/U47 223-9922) (10/21/85)
<<moderator: please don't reformat the following>>

Notes on early voice processing systems.


                               Disclaimer

        Patent 4,371,752, filed Nov.  26, 1979, issued Feb.   1,
        1983,  (the  VMX  patent)  claims  to  cover  voice-mail
        systems.  The reader should not assume that  information
        in this note disputes those claims.

There are two main early research efforts in the voice-processing field:
the ARPA real-time voice project and the IBM Voice Filing System.  There
are also a number of smaller efforts.



1  THE ARPA REAL-TIME VOICE PROJECT

The ARPAnet is a digital packet-switched network that connects a number
of computers doing government (Defense Department) sponsored work.

In a report "Evolution of the ARPAnet", published in 1981 by E.  J.
Feinler of SRI, the Network Voice Protocol is described as follows:

    "The Network Voice Protocol (NVP) was implemented in 1973 and has
    been in use since then for realtime voice communication over the
    ARPANET [Cohen, D.  Specifications for the Network Voice Protocol
    (NVP), RFC 741, NIC 42444, Nov.  22, 1977, pp 43-88 IN:  ARPANET
    Protocol Handbook, NIC 7104, Network Information Center, SRI
    International, Menlo Park CA, rev.  Jan 1978.].  The protocol was
    developed by a group headed by the University of Southern
    California, Information Sciences Institute (ISI), as part of ARPA's
    Network Secure Communications (NSC) project.  The goal of this
    project was to demonstrate a digital, high-quality, low-bandwidth,
    secure voice handling capability across the ARPANET.  The protocol
    has been used successfully for experiments between ISI, BBN, SRI,
    MIT's Lincoln Laboratory (MIT-LL), Culler-Harrison, Inc., and the
    Speech Communications Research Lab, Inc."

Packetized voice was first transmitted in 1974 with point-to-point
connections, and in 1975 with conference connections.  A prototype voice
message system was implemented at ISI in 1978.  It was integrated into
the user's work environment, rather than being "just" a computer-based
answering machine.  I do not know whether the ISI voice message system
was integrated into the public telephone network.

The ARPA voice project is discussed in two papers:

    Cohen, D., "A voice message system," in R.  P.  Uhlig (ed.),
    Computer Message Systems, pp.  17-27, North-Holland, 1981.

    Gold, Bernard (invited paper), "Digital Speech Networks", Proc.
    IEEE Vol.  65, No.  12, Dec.  1977.




2  THE IBM VOICE FILING SYSTEM

(These notes are from a colleague's trip-report, dated Sep.  12, 1978).

At COMPCON 78 (September, 1978), Steve Boies, Manager of the Voice
Filing System project at IBM, Yorktown Heights, gave a presentation.
There are six people on the project:  three psychologists and three
computer types.  It was started five years ago (i.e., in 1973).
They considered this the first step toward an integrated office
information system.  The project is aimed toward providing direct
support to office principals (i.e., not secretaries or other support
people).  (Note:  the COMPCON proceedings do not appear to have an
abstract or paper on the IBM system.)

Boies's project is an audio correspondence system.  "Correspondence"
refers to non-interactive communications, those not requiring people to
get together at the same time.

IBM has had a system in use, at an experimental level, for 2 1/2 years
(i.e., since 1976).  It uses a System/7 for real-time control, and a
370/168 as a time-shared host.  The main purpose of the 168 is for mass
storage.  They use 2 hours of CPU time per month.  There is 1 Mbyte of
"on line" storage, and 800 Mbytes in "MSS" (archival storage?).  Users
access the system by dialing in from any touch-tone phone.

Boies gave a demo of the actual system.  All control for the system is
by touch-tone.  Audio input is used only for message content.  The user
can originate messages, transmit them (using touch-tone keys to specify
addresses), listen to his own mail, and several other functions.

The system automatically eliminates any long pauses from messages.  This
has had the unanticipated benefit of practically eliminating "mike
fright".  Users don't have to worry about pausing when deciding what to
say.  The system also uses some other tricks to speed up playback
without altering voice quality.  Typically, 50 wpm recording becomes 150
wpm on playback.  Another unintended result is that recordings sound
much more as if the person knows what he is talking about.
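The pause removal described above can be sketched roughly as follows.
This is only an illustration, not IBM's method:  the function name, the
per-frame energy representation, the silence threshold, and the choice
to shorten long pauses to a fixed cap (rather than remove them entirely)
are all assumptions of mine.

```python
def trim_long_pauses(frames, silence_threshold=0.02, max_pause_frames=25):
    """Return the indices of frames kept after shortening long pauses.

    frames: per-frame energy values (floats), one per short audio frame.
    A frame counts as silent when its energy is below silence_threshold
    (a hypothetical value).  Within any run of silent frames, only the
    first max_pause_frames are kept; the rest of the run is dropped.
    """
    kept = []
    silent_run = 0
    for index, energy in enumerate(frames):
        if energy < silence_threshold:
            silent_run += 1
            if silent_run > max_pause_frames:
                continue  # excess silence: drop this frame
        else:
            silent_run = 0  # speech resets the pause counter
        kept.append(index)
    return kept
```

With a cap of 4 frames, a 10-frame pause between two bursts of speech
is cut to 4 frames while the speech frames are left untouched.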

You can record a message, and specify it to be delivered at some future
time.  The computer will call up the addressee and tell him about the
message.  It can try several different numbers, and will call back later
if there is no answer.  If you go away, you can leave a forwarding number.

Users can file mail if they desire.  Retrieval can be by originator,
dates, and classification -- all under touch-tone control.  Messages are
automatically erased from the mailbox after two weeks, if they have been
read at least once.  Users like this feature as it frees them from
having to worry about disposing of old mail.
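The automatic-erase rule can be written down as a small predicate.  This
is a sketch under stated assumptions:  the names are hypothetical, and I
assume the two weeks are counted from when the message arrived (the trip
report does not say whether the clock starts at arrival or at first
reading).

```python
from datetime import datetime, timedelta

# Retention period from the trip report: two weeks.
RETENTION = timedelta(weeks=2)


def should_erase(received_at, times_read, now):
    """True if a mailbox message qualifies for automatic erasure.

    Assumption: age is measured from arrival.  A message that has
    never been read is kept no matter how old it is.
    """
    return times_read >= 1 and (now - received_at) >= RETENTION
```

For example, a message received on 1 Sep 1978 and read once would be
erased on or after 15 Sep 1978; an unread one would stay.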

File protection concepts are built in.  Every message has an owner.
Several levels of access are possible:  read-only; read and forward; and
read, append, and forward.

There are also several "classifications":  unclassified, personal, and
confidential.

You can check if someone has read the mail you sent him.  Other status
information is also available, such as whether he has logged in today,
etc.  You can also record a message to be read to anyone who asks about
you.  So, for example, if you are out of town for a week, you can leave
a message saying so.

The system provides extensive editing facilities which are mostly unused
as the users think they are too complex.

The system is heavily instrumented.  The implementors know which
features are used, and how much.  They know every command that has been
given on the system (but not message content).

The real issue is building a good "principal interface".  You must make
the entry cost to the principal very low.  The system uses lots of
(audio) prompting and multiple-choice responses.

To start using the system, there are only seven touch-tone commands to
learn.  Commands use the touch-tone letters as mnemonics, e.g., *R means
"record".  There is a "help" facility.  The " " key, followed by any
other key tells what that key will do.
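The letter mnemonics amount to a lookup from letters to keypad digits.
The sketch below assumes the usual North American keypad lettering of
the period (no Q or Z); only "*R" comes from the trip report, and the
function name is hypothetical.

```python
# Keypad lettering of the period: Q and Z do not appear on any key.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PRS", "8": "TUV", "9": "WXY",
}

# Invert the table: each letter maps to the digit key that carries it.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def keys_for_command(command):
    """Translate a mnemonic such as "*R" into the keys actually pressed.

    Characters with no keypad letter (such as "*", "#", or digits)
    are passed through unchanged.
    """
    return "".join(LETTER_TO_DIGIT.get(ch.upper(), ch) for ch in command)
```

Under this lettering, keys_for_command("*R") gives "*7", so "record"
is the star key followed by 7.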

References for the IBM system include the following:

    Gould, J.  D., and Boies, S.  J.  "Speech filing -- an office system
    for Principals." IBM Systems Journal, Vol 23, No.  1, 1984.  pp.
    65-81.  (Also IBM Res.  Rep.  RC-9769, Dec.  1982).

    Gould, J.  D., and Boies, S.  J.  "Human factors challenges in
    creating a principal support office system -- The Speech Filing
    System Approach." ACM Trans.  on Office Info.  Systems, Vol.  1, No.
    4, October 1983, pp.  273-298.

The following were referenced by the above papers.  I haven't seen them
at this time.

    Boies, S.  J.  "A computer based audio communication system," AIIE
    Conference on Automating Business Communications, (January 23-25,
    1978), pp.  369-372.  (Paper can be obtained from Management
    Education Corporation (MEC), Box 3727, Santa Monica, CA 90403.)

    Zeheb, D.  and Boies, S.  J.  "Speech filing migration system," in
    H.  Inose (Editor), Proceedings of the International Conference of
    Computer Communication (September 1978), pp.  571-574.

    IBM Audio Distribution System Subscriber's Guide, SC34-0400-1, IBM
    Corporation, 4111 Northside Parkway N.W., Box 2150, Atlanta, GA
    30056; also available from IBM branch offices.




3  OTHER WORK (NOT NECESSARILY VOICE-MAIL)

A number of companies produced systems for audio-response applications
where a customer could retrieve information stored on a computer by
using a Touch-tone (tm) telephone.  Survey articles were published in
Datamation (1969) and by Datapro (September 1976).  These systems used
prerecorded human speech to produce messages with limited content.  The
misdial message "the number you have dialed, 555-1212, is not in
service..." is produced by a similar system.

Delphi Communications (part of Exxon Information Systems) was founded to
do voice messaging.

Computalker Consultants (Santa Monica, CA) developed hardware for speech
synthesis (connected to microcomputers using the S100 bus architecture).
The Computalker (CT1) could not be directly connected to the public
telephone network.

    Rice, D.  L.  "Friends, humans, and countryrobots:  lend me your
    ears", Byte, Number 12, August 1976.

    Rice, D.  L.  "Speech Synthesis by a set of rules (or can a set of
    rules speak English?)", Proceedings of the First West Coast Computer
    Faire, San Francisco, 1977.

    Rice, D.  L.  "Hardware and software for speech synthesis", Dr.
    Dobb's Journal, April 1976.

Votrax (Troy, Michigan) developed hardware for phonemic synthesis that
could be connected to any computer that supported ASCII text (RS232
asynchronous port) and could connect to a Bell 407 -- and hence to the
public telephone system.

Systems using the Votrax and Bell 407 were developed at Bell Labs by M.
D.  McIlroy to do unrestricted text-to-speech conversion.  This allowed
directory-assistance applications to be implemented on a Unix (version
6) system.  The software was available under license from Bell
Laboratories in 1978 (or earlier).  By connecting the text-to-speech
software to standard Unix utilities using the "pipe" mechanism, voice
mail and computer-generated broadcast messages ("Time for lunch!") could
be easily implemented.
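The pipe arrangement can be illustrated with a few lines that feed text
to another program's standard input.  This is a modern sketch, not the
Bell Labs software:  the function name is hypothetical, and because I
cannot assume a text-to-speech command is installed, a small stand-in
that merely upper-cases its input takes the synthesizer's place.

```python
import subprocess
import sys

# Stand-in for a text-to-speech command (an assumption for illustration):
# it reads text on standard input and writes it back in upper case.
STAND_IN_SPEAKER = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def pipe_text(text, command):
    """Send text to command's standard input and return its output.

    This mirrors a shell pipeline of the form: producer | command.
    """
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    print(pipe_text("Time for lunch!", STAND_IN_SPEAKER))
```

Replacing the stand-in with a real speech command would turn any
program that prints text into one that speaks it, which is the point of
the pipe approach.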

Using the same hardware, Lauren Weinstein implemented a "Touch-tone
Unix" interface at UCLA.

Using this hardware, and suggestions from Lauren Weinstein, I
implemented a Touch-tone RSTS/E system at the DEC Research and
Development group.  It was shown publicly at Canada DECUS, February
26-29, 1980.


Posted:	Mon 21-Oct-1985 16:53 Maynard Time.  Martin Minow MLO3-3/U8, DTN 223-9922
To:	RHEA::DECWRL::"human-nets@rutgers.arpa",RHEA::DECWRL::"telecom@mit-xx.arpa"