france@VTOPUS.CS.VT.EDU (Robert France) (01/16/88)
At Virginia Tech we have been working for a few years with dictionaries available through the Oxford Text Archive. The OTA is a depository for machine-readable literary texts. By this point they have assembled a considerable body of both primary and secondary (lexicographic) material, all of which is available, for research use only, at a nominal fee. Restrictions and a list of their materials can be obtained from

The Oxford Text Archive
Oxford University Computing Service
13 Banbury Rd.
Oxford
GREAT BRITAIN OX2 6NN
Telephone: Oxford (0865) 56721

They are on the net, but I'm afraid I've misplaced their e-mail address.

Most of OTA's material is available only in (some) typesetters' format, and often the formatting conventions are no longer available. They are also archiving re-formatted versions as they become available, though, so in some cases the data is fairly directly usable. A case in point is the following:

One of our early efforts with machine-readable dictionaries involved translating the Collins English Dictionary from typesetters' format into a set of files of Prolog facts. These facts include, for the c. 80,000 headwords in the CED: syllabification, variant spellings, abbreviations, irregular inflections and morphological variants; parts of speech and semantic register information; "also called", "related adjective", and "compare" cross-references; and the texts of definitions, sample uses and usage notes. We ignored only etymology and pronunciation.

A syntactically correct copy of these facts (i.e., a set of facts in Edinburgh standard syntax that can be consulted without blowing up a Prolog compiler) is now on deposit at the Archive and available under the same terms as the raw data. We are working on a semantically correct version (i.e., one where the data in the facts is in all cases the data that ought to be there), and will deposit that when we have it complete.

Currently, our group here, headed by E.A. Fox and Terry Nutter, is coordinating with a group at the Illinois Institute of Technology headed by Martha Evens to analyse the definition texts of this and other M-R dictionaries and to integrate our findings into a *VERY* large semantic net. This product will also be made available to the community for research use only.

Anyone desiring further information on this project is invited to contact any of the principals. Believe me, we have some stories to tell.

Good luck,

Robert France
Department of Computer Science
Virginia Tech
Blacksburg, VA 24061

france@vtopus
fox@vtopus
nutter@vtopus
csevans%iitvax

"Believing people is a very bad habit. I stopped years ago." (Miss Marple)