[uw.general] `dearpat' - An on-line data service

lmjones@watsol.waterloo.edu (08/28/89)

                       ``dearpat'' - An on-line data service


           The  Centre for the New OED and Text Research has created an
           on-line  service  to  provide information from our database.
           Our data files include:

                The Oxford English Dictionary, 2nd edition

                The bibliography for the OED

                The  Bible  -  King James Version including Old and New
                Testaments and Apocrypha

                The Complete Sherlock Holmes by Sir Arthur Conan Doyle

                The Canada-U.S. Free Trade Agreement

                The Oxford Advanced Learner's Dictionary, 3rd edition

                William Shakespeare: The Complete Works (on order)

           Queries  can  be sent by electronic mail to dearpat@watmath.
           We  will  try to answer queries as quickly and accurately as
           possible  although  sometimes the amount of data or research
           may be prohibitive.

           Some sample queries and responses follow:

           dearpat
                  I  would  like  to  know  how  many  times  the words
                  ``hound'' and ``dog'' are used in the Sherlock Holmes
                  story ``The Hound of the Baskervilles''.

           response
                  The word `hound' appears in ``The Hound of the
                  Baskervilles'' 74 times while `dog' occurs only 31
                  times. A further interesting point is that, of the 31
                  occurrences of `dog', 6 are the verb `to dog', as in
                  ``he had dogged us so long'', 2 are in the compound
                  `dog-cart', and many others do not refer to the
                  `hound'.
                                           ***
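
A minimal sketch of how a count like the one above might be broken down by
word form, so that `dogged' and `dog-cart' can be told apart from `dog'. This
is not the software behind dearpat; the function name `count_word_forms` and
the sample sentence are assumptions made for illustration only.

```python
import re
from collections import Counter

def count_word_forms(text, stem):
    """Count tokens that begin with `stem`, grouped by exact form."""
    # A token is a run of letters, optionally joined by hyphens,
    # so "dog-cart" is kept as one token rather than split in two.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(t for t in tokens if t.startswith(stem))

# Invented sample text, not a quotation from the story.
sample = "The hound ran. He had dogged us so long; the dog-cart waited. A dog barked."
counts = count_word_forms(sample, "dog")
print(counts)
```

Prefix matching is crude (it would also catch unrelated words such as
`dogma'), so the grouped forms still need a human reading, as the response
above suggests.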

           dearpat
                  I  am  interested  in millennial vocabulary and would
                  like  to  know  of  the occurrences of such words as
                  ``apocalypse'',  ``armageddon''  and  ``doomsday'' in
                  the Bible.

           response
                  The ``millennial'' vocabulary, as you refer to it, is
                  not  very common in the Bible. The words `apocalypse'
                  and  `doomsday'  do  not  occur  in  the Bible at all
                  whereas `armageddon' is found only once, in Revelation
                  16:16.  In general, the hellfire and brimstone
                  vocabulary is very limited in the Bible: for example,
                  compare the following frequencies of patterns: heaven
                  853,  hell  64; saved 126, damn (including damned and
                  damnation)  16; good 1148, evil 748; lord 8794, devil
                  122.
                                           ***


           dearpat
                  Can  you  please  tell me all of the words in English
                  that  begin  with  the  sound  ``fa''  as  in ``fat''
                  whether they begin with a `f' or a `ph'?

           response
                  On   referring   to  the  Oxford  Advanced  Learner's
                  Dictionary, 3rd edition, a dictionary that includes
                  everyday  official  and  informal language as well as
                  literary  language  of the 19th and 20th centuries, I
                  find  that  there  are  73  words that begin with the
                  sound  ``fa''  as  in ``fat''. I will send you a com-
                  plete  list  of  these  words.  A partial one follows
                  here:  factual,  famine,  fancy, fascinate, phantasm,
                  Pharisee, pharynx.
                                           ***

           dearpat
                  I  would  like  a  list  of  all  the words that have
                  entered  the  English  language  from Malay.  Can you
                  compile  such a list from the Oxford English Diction-
                  ary?

           response
                  The  pattern  ``Malay...'' (i.e. any word that begins
                  with  the  sequence  Malay)  occurs  in the OED2 1660
                  times.  Of  these occurrences, 256 are in the etymol-
                  ogy:  the  section  of the dictionary that deals with
                  the  origin and history of a word. Even if a language
                  is mentioned in the etymology, it does not necessarily
                  mean that it is the language of origin. If, however,
                  the language is mentioned at the beginning of the
                  etymology, it is more likely to be so. In the case of
                  Malay, 235 of the 256 occurrences are in the first
                  line of the etymology. Thus, I am sending you two
                  lists: one with all the words
                  that  have  Malay  in  the etymology and one with the
                  words  that  do  not  have Malay in the first line (a
                  subset  of  the  first).  For  scholarly  purposes, I
                  recommend a thorough analysis of the words and etymo-
                  logies  in  order  to determine the actual origin.  A
                  partial list is included here: amok, bamboo, Cajan,
                  dugong, godown, junk, ketchup, lingo, mango,
                  orang-outang, palanquin, rattan, sago, teak, zamorin.
                                           ***
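
The two lists described in the response above could be produced along these
lines. This is only a sketch under stated assumptions: the entry layout (a
headword paired with a list of etymology lines), the function name
`split_by_etymology`, and the toy entries are all hypothetical and do not
reflect the actual OED2 data format.

```python
import re

# "Malay..." in the response means any word beginning with the sequence Malay.
MALAY = re.compile(r"\bMalay\w*")

def split_by_etymology(entries):
    """Return (all_mentions, not_in_first_line) lists of headwords."""
    all_mentions, not_first = [], []
    for headword, etymology_lines in entries:
        if any(MALAY.search(line) for line in etymology_lines):
            all_mentions.append(headword)
            # Second list: Malay is mentioned, but not on the first line.
            if not MALAY.search(etymology_lines[0]):
                not_first.append(headword)
    return all_mentions, not_first

# Invented placeholder entries, not real dictionary data.
entries = [
    ("word_a", ["ad. Malay xxx.", "Cf. another form."]),
    ("word_b", ["a. some other language,", "perh. ultimately Malayan."]),
    ("word_c", ["f. an unrelated source."]),
]
all_mentions, not_first = split_by_etymology(entries)
print(all_mentions, not_first)
```

Because the pattern is a prefix match, it also picks up words such as
`Malayan', which is one more reason the lists need the manual analysis the
response recommends.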

           dearpat
                  I  am  studying the works of William Makepeace Thack-
                  eray and have noticed that he seems to invent  words.
                  I am wondering if any of the words that appear in the
                  OED have been used only by him?

           response
                  I  have  checked  the  ``Hapax  Legomena'' file which
                  includes  all the words with only one citation in the
                  OED2  and find that there are 69 from Thackeray. I am
                  sending  you  a  complete list and the following is a
                  sample   only:   airwards,   birdikin,   crucificial,
                  lordkin,   portify,   unadroitly,  whipping-snapping.
                  Please  note,  although  there  is only one citation,
                  this  is  not  conclusive  proof  that the forms were
                  `invented'  by Thackeray. In fact, one might say that
                  Thackeray did not so much invent words as play with
                  linguistic  conventions  to  create  new forms of old
                  words.
                                           ***
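
As a rough illustration of the idea behind a ``Hapax Legomena'' file, the
sketch below picks out headwords that have exactly one citation and whose
single citation is from a given author. The (headword, author) pair layout,
the function name `hapax_by_author`, and the sample data are assumptions for
illustration, not the format of the real file.

```python
from collections import defaultdict

def hapax_by_author(citations, author):
    """Headwords with exactly one citation, where that citation is by `author`.

    `citations` is an iterable of (headword, cited_author) pairs.
    """
    authors_by_word = defaultdict(list)
    for headword, cited_author in citations:
        authors_by_word[headword].append(cited_author)
    # A one-element list equal to [author] means a single citation by that author.
    return sorted(w for w, a in authors_by_word.items() if a == [author])

# Invented placeholder data, not real OED citations.
citations = [
    ("word_a", "Thackeray"),
    ("word_b", "Thackeray"),
    ("word_b", "Dickens"),
    ("word_c", "Dickens"),
    ("word_d", "Thackeray"),
    ("word_d", "Thackeray"),
]
result = hapax_by_author(citations, "Thackeray")
print(result)
```

As the response notes, a single citation shows only that the dictionary
records one use, not that the author coined the word.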

           dearpat
                  I would like some information from the Canada-US Free
                  Trade  Agreement.   Could  you  please tell me if the
                  Canadian  General  Electric is mentioned in the docu-
                  ment.

           response
                  Although  many  companies are named in this document,
                  the Canadian General Electric (CGE, GE, General Elec-
                  tric)  is not referred to. This does not mean that it
                  is  not  included in more general terms but only that
                  it is not specifically named.

                                           ***

           dearpat
                  Can  you  please  make  a  list of all the words that
                  relate to clothing?

           response
                  This  query  is not as simple as it appears. Articles
                  of  clothing are not labelled as such in the diction-
                  ary  and might be defined without the use of the word
                  `clothing':     descriptions   such   as   `garment',
                  `apparel',  `headgear',  and `outerwear' may indicate
                  `clothing'   in  the  definition.  For  example,  the
                  Japanese word yukata is described as a `light kimono'
                  or  a  `housecoat'. Because of the time that would be
                  involved  in defining and carrying out such a search,
                  ``dearpat'' is unable to give you a response.

                                           ***

daford@watdragon.waterloo.edu (Daniel Ford) (08/29/89)

Can we use this service for solving cross-word puzzles?  
Such as: "What 10 letter words have a 'z' as the fourth letter and end with
'ious'?"

Dan

tbray@watsol.waterloo.edu (Tim Bray) (08/29/89)

Lindsay Patten writes:
>Is there any online access to the NOED for the general unix community?

In general, the OED database is available to UW personnel for purposes
of research and teaching.  Right now, you have to get an account on the
computer 'watsol'.  However, DCS is planning to extend that availability
via some big Sun meat-grinder they are buying.

>Even
>if it were something as simple as a mail server which answered queries like
>
>mail noed@sol
>entry pedantic
>.
>
>and sent back mail giving the dictionary entry for pedantic.  

Well, yes and no.  We were originally thinking about doing just that,
but if you just want to look up the entry, OED may not be the dictionary
for you.  Since it has *everything* about *everything* back to 1150 A.D.,
a simple request can have an unreasonably-huge answer.  Also, the entries
are densely populated with structural codes that are not immediately
self-evident in their import.  To illustrate my point without unduly
cluttering up the net, I will mail the entry for 'pedantic' to anybody who
requests it of me (tbray@watsol).

One of the reasons for the recently-announced 'dearpat' service is to
try to figure out what kinds of questions people would ask if they could ask.

>Of course,
>a fancy window based browser system would be even nicer.

Yes, in fact the only sane thing.  We have such an interface, and for those
with X screens, it will probably be available via the DCS service I
mentioned above.

Daniel Ford writes:
>Can we use this service for solving cross-word puzzles?  
>Such as: "What 10 letter words have a 'z' as the fourth letter and end with
>'ious'?"

Well, the existing query tools aren't optimized for this kind of question,
which means you'd have to write a lex program or something and wait a couple
hours for the answer.  Who knows, maybe there's some interesting research
in devising and implementing an efficient interface to support the needs
of crossword puzzlers?

Cheers, Tim Bray, New OED Project
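
For illustration, the crossword-style question raised in this thread ("10
letter words with a 'z' as the fourth letter, ending in 'ious'") can be
written as a regular expression. The sketch below uses Python rather than the
lex program mentioned above, and assumes a plain list of candidate words; the
helper names `crossword_regex` and `matches` and the made-up strings are
hypothetical. It is a linear scan, so it says nothing about how to do this
efficiently over the full dictionary.

```python
import re

def crossword_regex(length, fixed, suffix=""):
    """Build a regex for words of `length` letters.

    fixed:  {1-based position: required letter}
    suffix: required ending (assumed no longer than `length`)
    """
    cells = ["[a-z]"] * length            # one cell per letter, unknown by default
    for pos, letter in fixed.items():
        cells[pos - 1] = re.escape(letter)
    for i, letter in enumerate(suffix):
        cells[length - len(suffix) + i] = re.escape(letter)
    return re.compile("".join(cells))

def matches(words, pattern):
    """Return the words that match the whole pattern, ignoring case."""
    return [w for w in words if pattern.fullmatch(w.lower())]

pattern = crossword_regex(10, {4: "z"}, "ious")
# Made-up strings used only to exercise the pattern; not dictionary words.
candidates = ["abczdeious", "abcdefious", "xyzzyious", "pedantic"]
found = matches(candidates, pattern)
print(found)
```

The generated pattern here is equivalent to `[a-z]{3}z[a-z]{2}ious`, which
could be run against any word list one has permission to use.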