[comp.ai.digest] Speech Databases

nowlan@ai.toronto.EDU ("Steven J. Nowlan") (10/08/87)

A while back I posted a request for information on publicly available
speech databases. A number of people sent me requests for this information,
so I am posting a summary.

The best source of these databases in North America is the National
Bureau of Standards (NBS). The person to speak to is David Pallett
whose phone number is (301) 975-2935. However, he is extremely busy.

The NBS maintains copies of several speech databases for isolated word
or connected digit recognition. They make copies of these databases
available on various media so that interested parties can copy the
database for their own uses. However, the waiting lists for most of
these databases are very long!

Here is a brief summary of what is available:

1. TI Isolated Word Database - digits, 10 control words, alpha-set
	Multi-speaker, isolated word

2. VERBEX Database - eleven digits, multi-speaker, isolated word

3. FAA Database - 68 word vocab., multi-speaker, isolated word, phone lines

4. TI Connected Digits Database - variable length digit strings, multi-speaker,
	multi-dialect

Access to the above is obtained by contacting David Pallett.

5. AFTI/F-16 - 70 word vocab., multi-speaker, high-noise
	Contact: Dr. Thomas J. Moore, Biological Acoustics Branch
	Air Force Aerospace Medical Research Lab, Wright-Patterson
	AFB, OH 45433.

6. MIT ICE CREAM database - connected sentences (1000 different sentences),
	multi-speaker.  Contact: David Pallett (Available end 87)

7. DARPA "Spelling Bee" Database - sentences of form "word spelling", 600
	word vocab., multi-speaker.  Contact: David Pallett (Avail 87?)

There are also a couple of DARPA databases available to the DARPA contractor
community. These are not yet publicly accessible, but may be in the near future.

Thanks to everyone who provided me with information, and I hope others
may find the above information useful.

				Steve Nowlan
			Arpanet: nowlan%ai.toronto.edu@relay.cs.net
			CSNet,Bitnet: nowlan@ai.toronto.edu
			EAN,X.400: nowlan@ai.toronto.cdn
			UUCP: {uunet,watmath}!ai.toronto.edu!nowlan