[comp.ai.neural-nets] Backpropagation applications

tgd@orstcs.CS.ORST.EDU (Tom Dietterich) (11/09/89)

Your accuracy claims for NETtalk are greatly exaggerated.  I have
replicated the NETtalk study using the same training data, in this
case training on 1000 words chosen at random from the 20,000-word
dictionary provided by Sejnowski.

After running back propagation for 30 epochs using the parameters
given in Sejnowski and Rosenberg (1986), I obtain the following
results.  Testing is performed on a randomly chosen test set of 1000
words.


                              WORDS  LETTERS (PHON/STRESS)  BITS 
------------------------------------------------------------------
BP                    TRAIN:  65.3    94.0     97.0  96.4    99.5
                      TEST :  14.9    71.6     81.8  81.4    96.7

Numbers give the percentage correct:

  TRAIN: performance on the training set
  TEST: performance on the test set
  WORDS: performance on whole words (i.e., every letter must be correct)
  LETTERS: performance on whole letters (all 26 bits correct for the letter)
  PHONEME: performance on the 21 phoneme bits
  STRESS: performance on the 5 stress bits
  BITS: average performance on the 26 output bits of the network
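To make the relationship between the BITS, LETTERS, and WORDS columns
concrete, here is a minimal sketch of how such percentages could be
computed.  The function name and data layout are mine, not from the
study; LETTERS here counts a letter as correct only when all 26 of its
output bits are correct, and WORDS counts a word only when every letter
in it is correct.

```python
# Hypothetical scoring sketch -- names and data layout are assumptions,
# not taken from Dietterich's actual evaluation code.

def score(pred_bits, true_bits, letters_per_word):
    """pred_bits/true_bits: one 26-bit vector per letter.
    letters_per_word: word lengths, summing to len(pred_bits)."""
    n_letters = len(true_bits)
    # BITS: fraction of individual output bits that match.
    bit_hits = sum(p == t
                   for pb, tb in zip(pred_bits, true_bits)
                   for p, t in zip(pb, tb))
    # LETTERS: a letter is correct only if all 26 bits match.
    letter_hits = [pb == tb for pb, tb in zip(pred_bits, true_bits)]
    # WORDS: a word is correct only if every one of its letters is.
    word_hits, i = 0, 0
    for n in letters_per_word:
        word_hits += all(letter_hits[i:i + n])
        i += n
    return {
        "BITS": 100.0 * bit_hits / (26 * n_letters),
        "LETTERS": 100.0 * sum(letter_hits) / n_letters,
        "WORDS": 100.0 * word_hits / len(letters_per_word),
    }
```

This also shows why the WORDS column is so much lower than BITS: one
wrong bit in one letter sinks the whole word.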

The NETtalk network has 120 hidden units, 203 input units (which code,
very sparsely, a 7-letter window), and 26 output units (which code, in
a distributed fashion, the 54 phonemes and 6 stresses).  The 26 output
bits are mapped to the nearest phoneme/stress combination observed in
the training data.  (That is, a pass was made over the training data to
find all phoneme/stress pairs appearing in it.  Decoding considers only
those pairs, and ties are broken in favor of the phoneme/stress pair
that appeared more frequently.)  This decoding scheme is superior to
decoding to the nearest syntactically legal phoneme/stress pair.
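The decoding step described above can be sketched as follows.  This is
a hedged illustration, not Dietterich's code: the codebook of
phoneme/stress pairs and their bit codes is hypothetical (the real
codes come from Sejnowski and Rosenberg's feature encoding), and the
bit vectors are shortened for readability.

```python
# Sketch: decode a raw network output to the nearest phoneme/stress
# pair observed in the training data, breaking distance ties in favor
# of the more frequent pair.  All names here are illustrative.
from collections import Counter

def build_codebook(training_targets):
    """training_targets: (pair_name, bit-tuple) for each training letter."""
    freq = Counter(name for name, _ in training_targets)
    codes = dict(training_targets)  # one code per observed pair
    return codes, freq

def decode(output_bits, codes, freq):
    def dist(code):
        # Squared Euclidean distance from the raw output to a code.
        return sum((o - c) ** 2 for o, c in zip(output_bits, code))
    # Sort key (distance, -frequency): ties go to the commoner pair.
    return min(codes, key=lambda name: (dist(codes[name]), -freq[name]))
```

Restricting the search to pairs actually seen in training is what
distinguishes this from decoding to the nearest syntactically legal
pair.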


--Tom Dietterich

heck@Sunburn.Stanford.EDU (Stefan P. Heck) (11/10/89)

According to Rumelhart in his ANN/PDP class here, NETtalk was trained on a
set of the 1000 most common words rather than a random set.  That run took
overnight to learn.  They later did a second test using 10,000 words.
I don't know which run the accuracy figures are for, but supposedly it got
87% right except on irregular words.  The best competitor at the
time was about 89% accurate.  Human capability was estimated at 96%.

Stefan
CSD

hougen@umn-cs.CS.UMN.EDU (Dean Hougen) (11/10/89)

In article <13659@orstcs.CS.ORST.EDU> tgd@orstcs.CS.ORST.EDU (Tom Dietterich) writes:
>Your accuracy claims for NETtalk are greatly exaggerated.  I have
>replicated the NETtalk study using the same training data.  In this
>case, training on 1000 words chosen at random from the 20000-word
>dictionary provided by Sejnowski.
                                        ^^^^^^
>Testing is performed on a randomly chosen test set of 1000 words.
                           ^^^^^^^^

I was under the impression that Sejnowski had NETtalk read real sentences in
real paragraphs, not randomly ordered words.  Right?

BTW, did you present the input as one long string of characters with the
words separated by a single space, or did you present the words one at a
time (i.e., as a long string of characters with the words separated by
three or more spaces), or did you do something else (what?)?

I'll leave you to determine what effect any of this could have on NETtalk's
performance.

Dean Hougen
--
"Stop making sense.  Stop making sense.  Stop making sense, making sense."
    - Talking Heads, "Stop Making Sense," _Stop Making Sense_  

tgd@aramis.rutgers.edu (Tom Dietterich) (11/13/89)

  heck@Sunburn.Stanford.EDU (Stefan P. Heck) writes:

  According to Rumelhart in his ANN/PDP class here, NETtalk was trained on a
  set of the 1000 most common words rather than a random set.  That run took
  overnight to learn.  They later did a second test using 10,000 words.
  I don't know which run the accuracy figures are for, but supposedly it got
  87% right except on irregular words.  The best competitor at the
  time was about 89% accurate.  Human capability was estimated at 96%.

I have also run the algorithm on the 1000 most common words.  The
results are quite similar to those I reported for 1000 randomly
selected words.  Testing is performed on the remaining 19000 words in
the dictionary.


                               WORDS  LETTERS (PHON/STRESS)  BITS  
-------------------------------------------------------------------
BP                     TRAIN:    76.6    94.8   97.1   97.3   99.6
120 hidden units       TEST :    13.4    68.1   78.7   80.0   96.0


Sejnowski and Rosenberg also trained and tested NETtalk on a corpus
of connected conversational speech.  I don't have access to that data,
so I haven't replicated that part of their study.

In my work (and in the S&R original), the 1000 most common words are
presented one-at-a-time surrounded by blanks.
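That one-word-at-a-time presentation can be sketched like this.  The
function name is mine; the 7-letter window centred on the letter being
pronounced, with blank padding around the word, follows the network
description in my earlier post.

```python
# Sketch (assumed details): pad a word with blanks and slide a 7-letter
# window across it, one window per letter to be pronounced.

def windows(word, size=7):
    half = size // 2
    padded = " " * half + word + " " * half
    return [padded[i:i + size] for i in range(len(word))]
```

For example, windows("cat") yields three 7-character windows, with
"c", "a", and "t" in turn at the centre position.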


Thomas G. Dietterich
Department of Computer Science
Computer Science Bldg, Room 100
Oregon State University
Corvallis, OR 97331-3902