minow@REX.DEC (Martin Minow, DECtalk Engineering ML3-1/U47 223-9922) (10/21/85)
<<moderator: please don't reformat the following>>

Notes on early voice processing systems.

Disclaimer

Patent 4,371,752, filed Nov. 26, 1979, issued Feb. 1, 1983 (the VMX patent), claims to cover voice-mail systems. The reader should not assume that information in this note disputes those claims.

There are two main early research efforts in the voice-processing field: the ARPA real-time voice project and the IBM Voice Filing System. There are also a number of smaller efforts.

1 THE ARPA REAL-TIME VOICE PROJECT

The ARPAnet is a digital packet-switched network that connects a number of computers doing government (Defense Department) sponsored work. In a report, "Evolution of the ARPAnet", published in 1981 by E. J. Feinler of SRI, the Network Voice Protocol is described as follows:

"The Network Voice Protocol (NVP) was implemented in 1973 and has been in use since then for realtime voice communication over the ARPANET [Cohen, D. Specifications for the Network Voice Protocol (NVP), RFC 741, NIC 42444, Nov. 22, 1977, pp 43-88 IN: ARPANET Protocol Handbook, NIC 7104, Network Information Center, SRI International, Menlo Park CA, rev. Jan 1978.]. The protocol was developed by a group headed by the University of Southern California, Information Sciences Institute (ISI), as part of ARPA's Network Secure Communications (NSC) project. The goal of this project was to demonstrate a digital, high-quality, low-bandwidth, secure voice handling capability across the ARPANET. The protocol has been used successfully for experiments between ISI, BBN, SRI, MIT's Lincoln Laboratory (MIT-LL), Culler-Harrison, Inc., and the Speech Communications Research Lab, Inc."

Packetized voice was first transmitted in 1974 with point-to-point connections, and in 1975 with conference connections.

A prototype voice message system was implemented at ISI in 1978. This was integrated into the user's work environment, rather than being "just" a computer-based answering machine.
I do not know whether the ISI voice message system was integrated into the public telephone network.

The ARPA voice project is discussed in two papers:

Cohen, D., "A voice message system," in R. P. Uhlig (ed.), Computer Message Systems, pp. 17-27, North-Holland, 1981.

Gold, Bernard (invited paper), "Digital Speech Networks", Proc. IEEE Vol. 65, No. 12, Dec. 1977.

2 THE IBM VOICE FILING SYSTEM

(These notes are from a colleague's trip report, dated Sep. 12, 1978.)

At COMPCON 78 (September, 1978), Steve Boies, Manager of the Voice Filing System project at IBM, Yorktown Heights, gave a presentation. There are six people on the project, which was started five years ago (i.e., in 1973). Three of them are psychologists, three computer types. They considered this the first step toward an integrated office information system. The project is aimed toward providing direct support to office principals (i.e., not secretaries or other support people). (Note: the COMPCON proceedings do not appear to have an abstract or paper on the IBM system.)

Boies's project is an audio correspondence system. "Correspondence" refers to non-interactive communications, those not requiring people to get together at the same time.

IBM has had a system in use, at an experimental level, for 2 1/2 years (i.e., since 1976). It uses a System 7 for real-time control, and a 370/168 as a time-shared host. The main purpose of the 168 is mass storage. They use 2 hours of CPU time per month. There is 1 Mbyte of "on line" storage, and 800 Mbytes in "MSS" (archival storage?).

Users access the system by dialing in from any touch-tone phone. Boies gave a demo of the actual system. All control for the system is by touch-tone. Audio input is used only for message content. The user can originate messages, transmit them (using touch-tone keys to specify addresses), listen to his own mail, and perform several other functions.

The system automatically eliminates any long pauses from messages.
This has had the unanticipated benefit of practically eliminating "mike fright". Users don't have to worry about pausing when deciding what to say. The system also uses some other tricks to speed up playback without altering voice quality. Typically, a 50 wpm recording becomes 150 wpm on playback. Another unintended result is that recordings sound much more as if the person knows what he is talking about.

You can record a message, and specify it to be delivered at some future time. The computer will call up the addressee and tell him about the message. It can try several different numbers, and will call back later if there is no answer. If you go away, you can leave a forwarding number.

Users can file mail if they desire. Retrieval can be by originator, dates, and classification -- all under touch-tone control. Messages are automatically erased from the mailbox after two weeks, if they have been read at least once. Users like this feature as it frees them from having to worry about disposing of old mail.

File protection concepts are built in. Every message has an owner. Several levels of access are possible: read-only; read and forward; and read, append, and forward. There are also several "classifications": unclassified, personal, and confidential.

You can check if someone has read the mail you sent him. Other status information is also available, such as whether he has logged in today, etc. You can also record a message to be read to anyone who asks about you. So, for example, if you are out of town for a week, you can leave a message saying so.

The system provides extensive editing facilities, which are mostly unused as the users think they are too complex.

The system is heavily instrumented. The implementors know which features are used, and how much. They know every command that has been given on the system (but not message content).

The real issue is building a good "principal interface". You must make the entry cost to the principal very low.
The system uses lots of (audio) prompting and multiple-choice responses. To start using the system, there are only seven touch-tone commands to learn. Commands use the touch-tone letters as mnemonics, e.g., *R means "record". There is a "help" facility. The " " key, followed by any other key, tells what that key will do.

References for the IBM system include the following:

Gould, J. D., and Boies, S. J. "Speech filing -- an office system for Principals." IBM Systems Journal, Vol. 23, No. 1, 1984, pp. 65-81. (Also IBM Res. Rep. RC-9769, Dec. 1982.)

Gould, J. D., and Boies, S. J. "Human factors challenges in creating a principal support office system -- The Speech Filing System Approach." ACM Trans. on Office Info. Systems, Vol. 1, No. 4, October 1983, pp. 273-298.

The following were referenced by the above papers. I haven't seen them at this time.

Boies, S. J. "A computer based audio communication system," AIIIE Conference on Automating Business Communications (January 23-25, 1978), pp. 369-372. (Paper can be obtained from Management Education Corporation (MEC), Box 3727, Santa Monica, CA 90403.)

Zeheb, D. and Boies, S. J. "Speech filing migration system," in H. Inose (Editor), Proceedings of the International Conference of Computer Communication (September 1978), pp. 571-574.

IBM Audio Distribution System Subscriber's Guide, SC34-0400-1, IBM Corporation, 4111 Northside Parkway N.W., Box 2150, Atlanta, GA 30056; also available from IBM branch offices.

3 OTHER WORK (NOT NECESSARILY VOICE-MAIL)

A number of companies produced systems for audio-response applications where a customer could retrieve information stored on a computer by using a Touch-tone (tm) telephone. Survey articles were published in Datamation (1969) and by Datapro (September 1976). These systems used prerecorded human speech to produce messages with limited content. The misdial message "the number you have dialed, 555-1212, is not in service..." is produced by a similar system.
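As a rough illustration of how a limited-vocabulary audio-response message can be assembled from prerecorded speech, the following Python sketch builds the ordered list of recordings for the misdial message quoted above. The clip names and the function name are hypothetical, chosen only for this example; nothing here describes how any of the commercial systems was actually implemented.

```python
# Hypothetical clip names: one prerecorded utterance per digit.
DIGIT_CLIPS = {str(d): "digit_%d" % d for d in range(10)}


def intercept_playlist(dialed):
    """Return the ordered clip names for a 'not in service' announcement.

    Characters other than the digits 0-9 (such as the hyphen in
    "555-1212") are skipped.
    """
    digits = [c for c in dialed if c in DIGIT_CLIPS]
    if not digits:
        raise ValueError("no digits in dialed number")
    return (
        ["the_number_you_have_dialed"]        # fixed opening phrase
        + [DIGIT_CLIPS[d] for d in digits]    # one clip per dialed digit
        + ["is_not_in_service"]               # fixed closing phrase
    )


if __name__ == "__main__":
    for clip in intercept_playlist("555-1212"):
        print(clip)
```

Only the fixed phrases and the ten digits need to be recorded; every message the system can speak is a sequence of those recordings, which is why the content of such messages is limited.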
Delphi Communications (part of Exxon Information Systems) was founded to do voice messaging.

Computalker Consultants (Santa Monica, CA) developed hardware for speech synthesis (connected to microcomputers using the S100 bus architecture). The Computalker CT1 could not be directly connected to the public telephone network.

Rice, D. L. "Friends, humans, and countryrobots: lend me your ears", Byte, Number 12, August 1976.

Rice, D. L. "Speech Synthesis by a set of rules (or can a set of rules speak English?)", Proceedings of the First West Coast Computer Faire, San Francisco, 1977.

Rice, D. L. "Hardware and software for speech synthesis", Dr. Dobb's Journal, April 1976.

Votrax (Troy, Michigan) developed hardware for phonemic synthesis that could be connected to any computer that supported ASCII text (RS232 asynchronous port) and could connect to a Bell 407 -- and hence to the public telephone system.

Systems using the Votrax and Bell 407 were developed at Bell Labs by M. D. McIlroy to do unrestricted text-to-speech conversion. This allowed directory-assistance applications to be implemented on a Unix (version 6) system. The software was available under license from Bell Laboratories in 1978 (or earlier). By connecting the text-to-speech software to standard Unix utilities using the "pipe" mechanism, voice mail and computer-generated broadcast messages ("Time for lunch!") could be easily implemented.

Using the same hardware, Lauren Weinstein implemented a "Touch-tone Unix" interface at UCLA.

Using this hardware, and suggestions from Lauren Weinstein, I implemented a Touch-tone RSTS/E system at the DEC Research and Development group. It was shown publicly at Canada DECUS, February 26-29, 1980.

Posted: Mon 21-Oct-1985 16:53 Maynard Time.
Martin Minow MLO3-3/U8, DTN 223-9922
To: RHEA::DECWRL::"human-nets@rutgers.arpa",RHEA::DECWRL::"telecom@mit-xx.arpa"