[mod.ai] Toshiba Voice Recognition Chip

Alex.Waibel@CAD.CS.CMU.EDU (03/18/87)

With respect to the inquiry about the Toshiba Voice Recognition Chip,
here are two words of caution:

First off, recognition performance claims in percent are nice to know,
but in general should be taken with a grain of salt.  These
numbers are HEAVILY dependent on whether speech was recorded in a quiet or
noisy environment, whether the speaker is cooperative or not, whether the
test was done speaker-dependently or independently, and whether the vocabulary
in question is ambiguous (BOOK, COOK, TOOK) or not (BOOK, UNIVERSITY).
Most of the current systems are also isolated-word systems, i.e., one must
pause between words.  Whether such a system will work therefore
really depends on your particular recognition task and environment.
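
To make the vocabulary point concrete, here is a rough sketch that scores how
confusable a word list is.  It uses letter-level edit distance as a crude
stand-in for acoustic similarity; that is an assumption made purely for
illustration (a real recognizer compares acoustic features, not spellings),
and the function names are invented for this example.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_pair_distance(vocabulary):
    """Smallest distance between any two words; small means confusable."""
    return min(edit_distance(a, b) for a, b in combinations(vocabulary, 2))


# Spelling is only a proxy for sound here (an assumption of this sketch).
print(closest_pair_distance(["BOOK", "COOK", "TOOK"]))  # 1
print(closest_pair_distance(["BOOK", "UNIVERSITY"]))    # 10
```

A vocabulary whose closest pair differs by a single unit leaves the recognizer
very little margin, so a quoted accuracy figure means little without knowing
the word list it was measured on.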

Japanese also has two convenient properties:
words are mostly consonant-vowel sequences, and the Japanese writing
system (kana) consists essentially of sequences of syllable symbols.  Toshiba
and other Japanese manufacturers therefore have systems that allow the speaker
to speak one kana at a time (there are on the order of 100 or so, including
some alternates) and have the word processor then convert a sequence of kana
into kanji (the Chinese word symbols).  Now, unfortunately, this doesn't
carry over easily into English.  Since English syllables employ complex
consonant clusters, there are on the order of 20,000 English syllables
(with 100,000 possible), which makes for a substantially harder recognition
task.  Also, speaking these syllables in isolation is a lot less natural than
in Japanese, since our writing system isn't syllable-based.  The corresponding
recognition of phonemes instead of syllables in English is a VERY hard
problem, and good recognition accuracy is hard to come by.
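
As a back-of-the-envelope illustration of the difference in scale, the sketch
below compares the unit inventories quoted above.  It assumes every unit is
equally likely, which is a simplification made only for this example, and the
helper names are invented here.  Under that assumption, picking one of 100
kana takes about 6.6 bits, while picking one of 20,000 syllables takes about
14.3 bits.

```python
import math

# Inventory sizes as quoted in the text (rough orders of magnitude).
inventories = {
    "Japanese kana": 100,
    "English syllables (in use)": 20_000,
    "English syllables (possible)": 100_000,
}


def bits_per_unit(n):
    """Bits needed to pick one of n units, assuming all are equally likely."""
    return math.log2(n)


def chance_accuracy(n):
    """Accuracy of a recognizer that guesses uniformly at random."""
    return 1.0 / n


for name, n in inventories.items():
    print(f"{name}: {n} units, "
          f"{bits_per_unit(n):.1f} bits per unit, "
          f"chance accuracy {chance_accuracy(n):.4%}")
```

This says nothing about how acoustically similar the units are, which matters
at least as much as how many there are.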
Toshiba and other manufacturers (in Japan and the USA) also have
whole-word-based systems, but most of them require training of the system, i.e.,
all words in the vocabulary must be read in at least once by the user.
I've seen the systems at Toshiba and they do indeed do impressive work,
but as far as hooking it up to your home computer and talking away in
English, I'm afraid the story is still a little more complicated than that.
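
To show why a whole-word system needs every vocabulary word read in first,
here is a minimal sketch of a template-matching recognizer.  It is not a
description of Toshiba's method, which is not given here; it uses dynamic time
warping, a commonly used technique for isolated-word recognition.  The class
name, the one-number-per-frame "features", and the sample values are all made
up for illustration (real systems use a vector of acoustic measurements per
frame).

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent, isolated-word recognizer."""

    def __init__(self):
        self.templates = {}  # word -> list of stored utterances

    def train(self, word, features):
        """Store one utterance of `word` spoken by the user."""
        self.templates.setdefault(word, []).append(list(features))

    def recognize(self, features):
        """Return the trained word whose nearest template is closest."""
        if not self.templates:
            raise ValueError("no words have been trained yet")
        return min(
            self.templates,
            key=lambda w: min(dtw_distance(features, t)
                              for t in self.templates[w]),
        )


rec = TemplateRecognizer()
rec.train("yes", [1, 3, 4, 3, 1])   # made-up feature values
rec.train("no", [5, 5, 2, 1, 1])
# A slower rendition of "yes": same shape, stretched in time.
print(rec.recognize([1, 1, 3, 4, 4, 3, 1]))  # yes
```

The recognizer can only ever answer with a word it has a stored template for,
and the templates come from one speaker, which is why such systems are
speaker-dependent until retrained.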

Alex Waibel, CMU