[comp.archives] Homophone Dictionary Utility

Christopher Lane <lane@sumex-aim.stanford.edu> (09/12/90)
Archive-name: homophone/07-Sep-90
Original-posting-by: Christopher Lane <lane@sumex-aim.stanford.edu>
Original-subject: Homophone Dictionary Utility
Archive-site: cs.orst.edu [128.193.32.1]
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

I've ftp'd the file Homophones.tar.Z to the submissions directory of the
cs.orst.edu archive.  It contains a homophone dictionary utility that builds
and interrogates a database of homophonic word sets derived from the
Merriam-Webster database.

    ho-mo-phone \'haHm-e-,foEn, 'hoE-me-\ n [ISV] (1843)
    1: one of two or more words pronounced alike but different
        in meaning or derivation or spelling
    2: a character or group of characters pronounced the same
        as another character or group

The homophone dictionary is a side effect of our speech recognition work and
contains just over 1200 homophonic word sets, e.g.:

    ['aE-bel] abel able
    [ik-'sept] accept except
    [,ak-le-'maE-shen] acclamation acclimation
    ['oGl] all awl
    ['berth] berth birth
    ['boEl] bole boll bowl
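
As a rough illustration of how sets like these can be formed (this is not the
posted program, which works against the Webster database directly), the Python
sketch below groups words by their pronunciation string and keeps only the
groups with two or more spellings.  The function name `homophone_sets` and the
hand-typed `entries` list are assumptions made for the example; the
pronunciations are copied from the listing above, except for the made-up
non-homophone "cat".

```python
from collections import defaultdict


def homophone_sets(entries):
    """Group (word, pronunciation) pairs into sets sharing a pronunciation.

    Hypothetical helper for illustration; not part of the posted utility.
    """
    by_pron = defaultdict(set)
    for word, pron in entries:
        by_pron[pron].add(word.lower())
    # Keep only pronunciations shared by two or more distinct spellings.
    return {pron: sorted(words)
            for pron, words in by_pron.items()
            if len(words) > 1}


# Hand-typed sample input; a real run would walk the dictionary instead.
entries = [
    ("all", "'oGl"), ("awl", "'oGl"),
    ("berth", "'berth"), ("birth", "'berth"),
    ("bole", "'boEl"), ("boll", "'boEl"), ("bowl", "'boEl"),
    ("cat", "'kat"),  # invented entry with no homophone; it is dropped
]

for pron, words in sorted(homophone_sets(entries).items()):
    print(f"[{pron}] {' '.join(words)}")
```

Keying on the exact pronunciation string is the simplest possible choice; it
only finds words whose pronunciations are written identically.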

The program that extracts the dictionary from Webster is an example of using:

o Programmatic access to Webster -- this utility 'walks' the Webster database
word by word (getting around at least one bug).

o The 'Storage' object -- In all the example program sources we have online, I
found only one (Tools) that used the Storage object -- now I know why.

o The 'HashFile' object -- Yet another example of using my HashFile object
(posted to the archive earlier), an interface to the 'db' routines.  This
program keeps objects in a database file and loads them as needed.

Due to inconsistencies in the pronunciations in Webster, the homophonic
database does not contain all possible homophonic word sets.  More could be
generated by further manipulation/filtering of the pronunciation strings.  I'd
be interested in hearing from anyone who does so.
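
One possible form of such filtering, sketched in Python: normalize each
pronunciation string before grouping, so that entries differing only in
marking fall into the same set.  Which inconsistencies actually occur in
Webster is not stated here, so the rule below (dropping the stress marks `'`
and `,` and the syllable hyphens) is purely an assumption, and `normalize` is
a hypothetical name.

```python
import re


def normalize(pron):
    """Drop stress marks (' and ,) and syllable hyphens from a pronunciation.

    Assumed normalization rule for illustration only; the real
    inconsistencies in the dictionary may call for different filtering.
    """
    return re.sub(r"[',\-]", "", pron)


# Pronunciation strings copied from the sample listing in this post.
print(normalize("ik-'sept"))          # iksept
print(normalize(",ak-le-'maE-shen"))  # aklemaEshen
```

A rule this aggressive trades precision for coverage: words that differ only
in stress placement would be merged as well, so the output would need review.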

- Christopher
-------