Christopher Lane <lane@sumex-aim.stanford.edu> (09/12/90)
Archive-name: homophone/07-Sep-90
Original-posting-by: Christopher Lane <lane@sumex-aim.stanford.edu>
Original-subject: Homophone Dictionary Utility
Archive-site: cs.orst.edu [128.193.32.1]
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)
I've ftp'd the file Homophones.tar.Z to the submissions directory of the
cs.orst.edu archive. It contains a homophone dictionary utility that builds
and interrogates a database of homophonic word sets derived from the
Merriam-Webster database.
ho-mo-phone \'haHm-e-,foEn, 'hoE-me-\ n [ISV] (1843)
  1: one of two or more words pronounced alike but different
     in meaning or derivation or spelling
  2: a character or group of characters pronounced the same
     as another character or group
The homophone dictionary is a by-product of our speech recognition work and
contains just over 1200 homophonic word sets, for example:
['aE-bel] abel able
[ik-'sept] accept except
[,ak-le-'maE-shen] acclamation acclimation
['oGl] all awl
['berth] berth birth
['boEl] bole boll bowl
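Each set above is, in effect, a group of words that share one pronunciation
string. As a rough illustration (not the utility's actual code, which talks to
the Webster database directly), the grouping step can be sketched in Python,
assuming the entries have already been extracted as (word, pronunciation)
pairs. The function name homophone_sets and the input format are hypothetical.

```python
from collections import defaultdict

def homophone_sets(entries):
    """Group (word, pronunciation) pairs into homophone sets.

    Assumed input: an iterable of (word, pronunciation) tuples such as
    ("awl", "'oGl"). Only pronunciations shared by two or more distinct
    words are kept; each set is returned as a sorted list of words.
    """
    groups = defaultdict(set)
    for word, pron in entries:
        groups[pron].add(word)  # words with identical transcriptions share a key
    return {pron: sorted(words) for pron, words in groups.items() if len(words) > 1}

# Sample pairs taken from the examples and dictionary entry in this post.
sample = [
    ("abel", "'aE-bel"), ("able", "'aE-bel"),
    ("accept", "ik-'sept"), ("except", "ik-'sept"),
    ("all", "'oGl"), ("awl", "'oGl"),
    ("berth", "'berth"), ("birth", "'berth"),
    ("bole", "'boEl"), ("boll", "'boEl"), ("bowl", "'boEl"),
    ("homophone", "'haHm-e-,foEn"),  # no partner, so it is dropped
]

# Print in the same "[pronunciation] word word ..." layout used above.
for pron, words in sorted(homophone_sets(sample).items(), key=lambda kv: kv[1]):
    print(f"[{pron}] {' '.join(words)}")
```

Keying on the exact pronunciation string is the simplest possible choice; it
only finds words whose transcriptions match character for character.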
The program that extracts the dictionary from Webster is an example of using:
o Programmatic access to Webster -- this utility 'walks' the Webster database
  word by word (working around at least one bug in the process).
o The 'Storage' object -- Of all the example program sources we have online,
  I found only one (Tools) that used the Storage object; now I know why.
o The 'HashFile' object -- Yet another example of using my HashFile object
  (posted to the archive earlier), an interface to the 'db' routines. This
  program keeps its objects in a database file on disk and loads them only
  as needed.
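The HashFile bullet describes objects stored in a database file and loaded on
demand rather than held in memory. HashFile itself is an Objective-C class and
its interface is not shown here, so the following is only an analogous sketch
using Python's standard shelve module, which pickles values into a dbm-backed
file and unpickles each one when its key is read. The helper names build and
lookup, and the file name, are hypothetical.

```python
import os
import shelve
import tempfile

def build(path, sets):
    """Write each homophone set to a dbm-backed shelf keyed by pronunciation."""
    with shelve.open(path, flag="n") as db:  # "n": always start with an empty file
        for pron, words in sets.items():
            db[pron] = words  # value is pickled and written to disk

def lookup(path, pron):
    """Load a single homophone set from disk, or None if the key is absent."""
    with shelve.open(path, flag="r") as db:  # read-only access
        return db.get(pron)  # only this one value is unpickled

# Hypothetical usage: a throwaway file in a temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "homophones")
build(path, {"'oGl": ["all", "awl"], "'berth": ["berth", "birth"]})
print(lookup(path, "'oGl"))
```

Opening the shelf per lookup keeps the sketch short; a long-running program
would more likely keep one shelf open and query it repeatedly.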
Because Webster's pronunciations are not transcribed consistently, the
homophone database does not contain all possible homophonic word sets. More
could be found by further manipulating and filtering the pronunciation
strings. I'd be interested in hearing from anyone who does so.
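One possible kind of filtering, sketched under stated assumptions: strip the
stress marks (' and ,) and syllable hyphens, and expand parenthesized optional
sounds into both variants before grouping. The parentheses convention follows
Merriam-Webster's printed dictionaries; whether the Webster database uses it
in exactly this form is an assumption, and the sample transcriptions below are
illustrative rather than copied from Webster. All function names are
hypothetical. Note the trade-off: dropping stress marks merges insight and
incite, which differ in stress, so looser matching finds more sets but also
some false ones.

```python
import itertools
import re
from collections import defaultdict

# Assumed notation: "(x)" marks an optional sound that may be present or absent.
OPTIONAL = re.compile(r"\(([^()]*)\)")

def variants(pron):
    """Return every spelling of pron with each optional segment kept or dropped."""
    parts = OPTIONAL.split(pron)  # even indices: fixed text, odd: optional text
    choices = [(p,) if i % 2 == 0 else (p, "") for i, p in enumerate(parts)]
    return {"".join(combo) for combo in itertools.product(*choices)}

def normalize(pron):
    """Drop stress marks and syllable hyphens (may over-merge stress-only pairs)."""
    return re.sub(r"[',\-]", "", pron)

def loose_homophone_sets(entries):
    """Group words whose normalized pronunciation variants coincide."""
    groups = defaultdict(set)
    for word, pron in entries:
        for v in variants(pron):
            groups[normalize(v)].add(word)
    # The same word set can arise under several keys, so deduplicate it.
    unique = {tuple(sorted(ws)) for ws in groups.values() if len(ws) > 1}
    return sorted(unique)

# Illustrative transcriptions, not taken from the Webster database.
sample = [
    ("accept", "ik-'sept"), ("except", "ik-'sept"),
    ("dense", "'den(t)s"), ("dents", "'dents"),
    ("insight", "'in-,sit"), ("incite", "in-'sit"),
]

for words in loose_homophone_sets(sample):
    print(" ".join(words))
```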
- Christopher
-------