[comp.sys.next] Digital Webster under 2.x "Protected" ??!!

stickler@utrio.helsinki.fi (Patrick Stickler) (06/28/91)

Has Merriam-Webster 'regressed' in their open attitude towards people using
Digital Webster for computational linguistics work? The original dictionary
was refreshingly open and flexible, but now it appears to have been recast
in a more "protected", if not even encrypted format.

I had the opportunity to work with one of the original cubes about two years
ago, and then, the dictionary entries were in text format and there were even
C functions for extracting individual entries!  I also remember that it was
explicitly stated in the release notes (although I don't remember exactly
where) that the flexibility and openness of the system was intended to
provide tools for computational linguistic work. It was, in a very real
sense, a computational linguist's dream come true. Unfortunately, there was
no way I could afford a NeXT then, and the only one that I knew of in Finland
was (understandably) kept rather secretly guarded. 

Now that the prices have dropped (bravo NeXT!), I have been preparing to buy
one; but after just looking at a new NeXTstation (with the Extended OS, so
there was nothing 'missing'), I noticed that the dictionary is in some binary
format, and I was unable to find any mention of the C function library that
existed with OS version 1.0 (and I *searched*!). It *was* possible to cut
from Digital Webster and paste to a plain text file, but this puts it into
the realm of manual extraction which is a *very* lengthy and tedious process
(linguists have been complaining for years about having to do this - it's
a needless waste of resources).

What gives with Merriam-Webster? Where's the great "scholar's workstation"
for us folks who are not into physics, computer science, or Shakespeare?

(please redirect all flames to /dev/null)

I must admit that NeXTstep 2.1 is worlds better than 1.0 and 1.0 was
fantastic; however, on this point with Digital Webster, it is a *major*
step backwards.

I have been dreaming for over two years of the day when I can get a NeXT
machine *and* Digital Webster; but now, it seems that the dictionary will be
of little more use than the CD-ROM versions (which are practically worthless
for computational linguistics, unless you want to type in 140,000 entries
manually...) Of course, it should be easier to extract the needed information
from Digital Webster than from a CD-ROM, but I will be *very* disappointed if
Webster no longer has this unprecedented and outstanding support for
computational linguistics.
	
I am hoping that the functionality still exists, but that it has simply
become 'undocumented' so that it does not become widely abused.

I fully appreciate Merriam-Webster's (and any publisher's) concerns to
protect their *enormous* investment in their dictionary, but until resources
such as machine readable dictionaries are available to more than a *very*
small handful of researchers at places like IBM and Xerox PARC, the potential
for large, many-domained, and robust systems will remain severely limited.
There are many areas of research in computational linguistics which suffer
greatly from the lack of large amounts of the type of information contained
in dictionaries, and Digital Webster released with the original cube was the
first real step (in my opinion) to getting such resources to the masses.

I originally applauded both NeXT and Merriam-Webster for making such a
valuable resource available to the academic community, but now they get a
big booo. (NeXT and Webster, are you listening?)

The NeXT seemed to be a 'miraculous' tool for linguists, but the mirage
is beginning to disappear...


//////////////////////////////////////////////////////////////////////////
     Patrick Stickler       Research Unit for Computational Linguistics
  stickler@cc.helsinki.fi       Department of General Linguistics
     (+358 0) 1913513                 University of Helsinki
==========================================================================
        The comments contained herein cannot reflect the official
           views of my employer.  (proof left for the reader)
//////////////////////////////////////////////////////////////////////////