[comp.mail.multi-media] dictionary servers

jwz@lucid.com (Jamie Zawinski) (11/09/90)

I know of two formats in which Webster's dictionary can be found online.
I have a GNU Emacs package (by Jason Glasgow) for talking to one of them,
and a Unix program (by Ed James) for talking to the other.  

mintaka.lcs.mit.edu runs a server of the first kind, and pasteur.berkeley.edu
runs a server of the second kind; but pasteur won't talk to any machines not
at berkeley, so I can't use it any more.  This is unfortunate, because the
second format is a better one.  So my first question is, are there any
machines out there which run a server of the second sort which will talk to me?

My second question is, is either of these formats the same as the one which
the NeXT webster server uses?  If not, what is the format that the NeXT server
uses?  And are there any NeXTs out there which will answer webster connections
from arbitrary machines on inet?

Here is a brief description of the two formats I know of, so you will know
what I'm talking about:

The mintaka kind uses port 103; it is very simple, supporting single-word
commands of the form "DEFINE word"; it does spelling correction as well, when
you ask for the definition of a word that it doesn't know about, or when you
issue the command "SPELL word".  There is also a command for listing all
words beginning with a given prefix.  The definition which is sent back looks
like 

 phi.lis.tine \'fil-*-.ste-n; f*-'lis-t*n, -.te-n; 'fil-*-st*n\ \-.iz-*m\ n
   cap  1: a native or inhabitant of ancient Philistia  often cap  2a: a crass
   prosaic often priggish individual guided by material rather than
   intellectual or artistic values : BABBITT 2b: one uninformed in a special
   area of knowledge  - philistine aj

that is, the paragraphs come filled, and lines are pre-wrapped at 79 columns.
There is little hope for making this look any prettier, since it's been 
chewed on already.
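
A minimal sketch of a client for this kind of server, in Python.  Only the
port number and the "DEFINE word" command shape come from the description
above; the CRLF line terminator, the read-until-close framing, and the names
build_command, unwrap, and define are assumptions made for illustration.

```python
import socket

MINTAKA_PORT = 103  # port number given in the post


def build_command(verb, word):
    """Build a one-line command such as b'DEFINE philistine\\r\\n'.

    The CRLF terminator is an assumption; the post does not say how
    commands are terminated.
    """
    if not word or any(c.isspace() for c in word):
        raise ValueError("expected a single word")
    return (verb.upper() + " " + word + "\r\n").encode("ascii")


def unwrap(text):
    """Join pre-wrapped lines into one paragraph, collapsing whitespace."""
    return " ".join(text.split())


def define(host, word, timeout=10.0):
    """Send DEFINE and collect the reply.

    Reading until the server closes the connection is an assumption;
    a real server may mark the end of a definition some other way.
    """
    with socket.create_connection((host, MINTAKA_PORT), timeout=timeout) as s:
        s.sendall(build_command("DEFINE", word))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

unwrap() can undo the line breaks so the text can be re-wrapped to another
width, but it cannot bring back the font information.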

The other kind of server, of which pasteur.berkeley.edu is a variety, uses
port 1964, and has an interface very much like SMTP or NNTP - responses begin
with three digit numbers, 2-- means ok, 5-- means failure, etc.  The big win
of this server is that it preserves font-change and special-character
information.  The definition body that comes back is broken up into records.
There are two levels of encoding; at the first level structural elements of
the definition are sent one per line, in a form like

   <character> : <field-1> ; <field-2> ; <field-3> ...

where the character says what kind of record this is (definition, label, 
cross-reference, etc).  Each kind of record has a fixed number of fields in
it, separated by semicolons.  This means that if a word has several
definitions (as philistine does, above) then each definition will be in its
own record.
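
A rough sketch of how a client might read these replies, in Python.  It
assumes the spaces around ":" and ";" are not significant and that a
semicolon never appears inside a field; the post does not settle either
point.  The record letter "D" in the usage note and the names status_class
and parse_record are hypothetical.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    kind: str          # single character naming the record type
    fields: List[str]  # the semicolon-separated fields, whitespace-trimmed


def status_class(line):
    """Classify a reply line by the first digit of its three-digit code."""
    code = line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError("no three-digit status code: %r" % (line,))
    if code[0] == "2":
        return "ok"
    if code[0] == "5":
        return "failure"
    return "other"


def parse_record(line):
    """Split '<character> : <field-1> ; <field-2> ...' into a Record.

    Only the first colon separates the record type from the fields, so a
    colon inside the definition text is left alone.
    """
    kind, sep, rest = line.partition(":")
    kind = kind.strip()
    if sep != ":" or len(kind) != 1:
        raise ValueError("not a record line: %r" % (line,))
    return Record(kind, [field.strip() for field in rest.split(";")])
```

For example, parse_record("D : 2a ; a crass prosaic individual") gives a
Record of kind "D" with the two fields "2a" and "a crass prosaic individual".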

When the fields contain text, as definitions do, they contain typesetter
information.  Special characters and font-changes are encoded with
"overstruck" characters, that is, a sequence like <char-1> <backspace> <char-2>
will either change the font, or will map to one or more different characters.
No line-breaks are included, so a client gets to format and wrap the
definitions as it likes.
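
The overstrike decoding could be sketched like this, in Python.  The two
tables are made up, since the real character pairs are not listed here;
only the <char-1> <backspace> <char-2> shape is from the description above.
Keeping the second character when a pair is unknown is also an assumption.

```python
BS = "\b"

# Hypothetical tables: the real pairs are not given in the post.
SPECIAL = {("a", "'"): "\u00e1", ("e", "`"): "\u00e8"}
FONT = {("(", "I"): "italic", (")", "I"): "roman"}


def decode(text):
    """Decode overstruck sequences into a list of (font, string) runs."""
    runs = []
    font = "roman"
    buf = []
    i = 0
    while i < len(text):
        # An overstrike is three characters long: char-1, backspace, char-2.
        if i + 2 < len(text) and text[i + 1] == BS:
            pair = (text[i], text[i + 2])
            if pair in FONT:
                # Font change: close the current run and switch fonts.
                if buf:
                    runs.append((font, "".join(buf)))
                    buf = []
                font = FONT[pair]
            else:
                # Special character; unknown pairs keep the second character.
                buf.append(SPECIAL.get(pair, text[i + 2]))
            i += 3
        else:
            buf.append(text[i])
            i += 1
    if buf:
        runs.append((font, "".join(buf)))
    return runs
```

Since the result is a list of runs with no line breaks, the client is free
to wrap it to whatever width it wants and render each run in its own font.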

One interesting fact is that the mintaka database was apparently derived
from the pasteur database (or a common source), because I have come across
definitions in mintaka's dictionary which have had the font information
improperly stripped out!  Parts of the font-change codes were still visible in
a few cases.

So, any answers?
			-- Jamie

PS: if you have access to a server of the same genotype as pasteur, and you
have a TI Explorer Lisp Machine, you can use the code in
/usr/jwz/public/dictionary-client.lisp on spice.cs.cmu.edu to talk to it with
a hypertextized interface (clicking on words defines them, making it easy to
navigate around the dictionary).  GNU Emacs code for talking to the other kind
is available at your favorite emacs archive site.

royle@iuvax.cs.indiana.edu (Keenan Royle) (11/09/90)

iuvax.cs.indiana.edu is a webster server.

it is also the home of the software to use a NeXT as a webster
server for generic UNIX clients. (anon ftp)

-- 

Keenan Royle
royle@cs.indiana.edu                         	postmaster@cs.indiana.edu
royle@iubacs.bitnet

pcg@cs.aber.ac.uk (Piercarlo Grandi) (11/19/90)

On 9 Nov 90 04:58:02 GMT, royle@iuvax.cs.indiana.edu (Keenan Royle) said:

royle> iuvax.cs.indiana.edu is a webster server.

royle> it is also the home of the software to use a NeXT as a webster
royle> server for generic UNIX clients. (anon ftp)

This is the second message with details about locating a webster server.
I am not sure, and apologies in advance if I am wrong, but I seem to
remember that Webster is copyrighted material and one has to pay a
copyright fee for making copies of it, e.g. broadcasting or public
performance fees, or copying parts of it over the network.

Probably MIT, Berkeley and Indiana have paid the appropriate fees for
their sites, but access from other sites is a copyright violation, if
what I surmise above is true.

Not only that, but accessing somebody else's webster server without
prior permission is not good net etiquette anyhow, just like accessing
somebody else's NNTP server, unless it is explicitly made available
for network-wide access, as anonymous FTP servers are.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk