[comp.ai.digest] Machine-Readable Dictionaries

france@VTOPUS.CS.VT.EDU (Robert France) (01/16/88)

At Virginia Tech we have been working for a few years with dictionaries
available through the Oxford Text Archive.  The OTA is a depository
for machine-readable literary texts.  They have assembled by this point 
a considerable body of both primary and secondary (lexicographic)
material, all of which is available for research use only at a nominal
fee.  Restrictions and a list of their materials can be obtained from

    The Oxford Text Archive
    Oxford University Computing Service
    13 Banbury Rd.
    Oxford  GREAT BRITAIN  OX2 6NN

    Telephone: Oxford (0865) 56721
    They are on the net, but I'm afraid I've misplaced their e-mail address.

Most of OTA's material is available only in (some) typesetters' format, 
and often the formatting conventions are no longer available.  They are
also archiving re-formatted versions as they become available, though,
so in some cases the data is fairly directly useable.  A case in point
is the following:

One of our early efforts with machine-readable dictionaries involved
translating the Collins English Dictionary from typesetters' format into
a set of files of Prolog facts.  These facts include, for the c. 80,000
headwords in the CED:  syllabification, variant spellings, abbreviations,
irregular inflections and morphological variants;  parts of speech and
semantic register information; "also called", "related adjective", and
"compare" cross-references; and the texts of definitions, sample uses
and usage notes.  We ignored only etymology and pronunciation.  A
syntactically correct copy of these facts (i.e., a set of facts in
Edinburgh standard syntax that can be consulted without blowing up a
Prolog compiler) is now on deposit at the Archive and available under
the same terms as the raw data.  We are working on a semantically
correct version (i.e., one where the data in the facts is in all cases
the data that ought to be there), and will deposit that when we have
it complete.
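
To give a rough idea of what "syntactically correct" involves, here is a
minimal sketch in Python of turning one entry into Prolog facts.  The
predicate names (syllables, variant, pos, definition), the record layout
and the sample entry are all made up for illustration and are not the
ones used in the deposited files.  The only real point is the quoting:
any text that is not a plain lowercase atom is wrapped in single quotes,
with embedded single quotes doubled, so that the result can be consulted.

```python
import re

# Hypothetical entry record; the field names and the sample text are
# illustrative only and do not reflect the actual CED data or layout.
entry = {
    "headword": "colour",
    "syllables": ["col", "our"],
    "variants": ["color"],
    "pos": "noun",
    "definition": "an object's appearance as determined by reflected light",
}

def quote_atom(text):
    """Return text as a Prolog atom, quoting it unless it is a plain atom."""
    # A plain atom starts with a lowercase letter and contains only
    # letters, digits and underscores; it needs no quotes.
    if re.fullmatch(r"[a-z][A-Za-z0-9_]*", text):
        return text
    # Inside a quoted atom a single quote is written twice.  Backslash
    # handling differs between Prolog systems and is left alone here.
    return "'" + text.replace("'", "''") + "'"

def prolog_list(items):
    """Render a list of strings as a Prolog list of atoms."""
    return "[" + ", ".join(quote_atom(item) for item in items) + "]"

def fact(functor, *args):
    """Render one fact; args must already be valid Prolog terms."""
    return f"{functor}({', '.join(args)})."

def entry_to_facts(entry):
    """Turn one entry record into a list of fact strings."""
    hw = quote_atom(entry["headword"])
    facts = [fact("syllables", hw, prolog_list(entry["syllables"]))]
    for variant in entry["variants"]:
        facts.append(fact("variant", hw, quote_atom(variant)))
    facts.append(fact("pos", hw, quote_atom(entry["pos"])))
    facts.append(fact("definition", hw, quote_atom(entry["definition"])))
    return facts

if __name__ == "__main__":
    for line in entry_to_facts(entry):
        print(line)
```

Keeping each kind of information in its own predicate, keyed on the
headword, is just one possible design; it makes it easy to load only the
files a given experiment needs.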

Currently, our group here, headed by E.A. Fox and Terry Nutter,
is coordinating with a group at the Illinois Institute of Technology
headed by Martha Evens to analyse the definition texts of this
and other M-R dictionaries and to integrate our findings into a
*VERY* large semantic net.  This product will also be made available
to the community for research use only.  Anyone desiring further
information on this project is invited to contact any of the principals.
Believe me, we have some stories to tell.

        Good luck,

            Robert France

Department of Computer Science
Virginia Tech
Blacksburg, VA 24061

france@vtopus    fox@vtopus    nutter@vtopus    csevans%iitvax


    "Believing people is a very bad habit.  I stopped years ago."

                (Miss Marple)