stickler@utrio.helsinki.fi (Patrick Stickler) (06/28/91)
Has Merriam-Webster 'regressed' in its open attitude towards people using Digital Webster for computational linguistics work? The original dictionary was refreshingly open and flexible, but now it appears to have been recast in a more "protected", if not actually encrypted, format.
I had the opportunity to work with one of the original cubes about two years ago, and back then the dictionary entries were in text format and there were even C functions for extracting individual entries! I also remember that it was explicitly stated in the release notes (although I don't remember exactly where) that the flexibility and openness of the system was intended to provide tools for computational linguistic work. It was, in a very real sense, a computational linguist's dream come true. Unfortunately, there was no way I could afford a NeXT then, and the only one that I knew of in Finland was (understandably) kept rather closely guarded.
Now that the prices have dropped (bravo NeXT!), I have been preparing to buy one; but after just looking at a new NeXTstation (with the Extended OS, so there was nothing 'missing'), I noticed that the dictionary is in some binary format, and I was unable to find any mention of the C function library that existed with OS version 1.0 (and I *searched*!). It *was* possible to cut from Digital Webster and paste into a plain text file, but this puts it into the realm of manual extraction, which is a *very* lengthy and tedious process (linguists have been complaining for years about having to do this - it's a needless waste of resources).
What gives with Merriam-Webster? Where's the great "scholar's workstation" for us folks who are not into physics, computer science, or Shakespeare? (please redirect all flames to /dev/null) I must admit that NeXTstep 2.1 is worlds better than 1.0, and 1.0 was fantastic; however, on this point with Digital Webster, it is a *major* step backwards.
I have been dreaming for over two years of the day when I can get a NeXT machine *and* Digital Webster; but now it seems that the dictionary will be of little more use than the CD-ROM versions (which are practically worthless for computational linguistics, unless you want to type in 140,000 entries manually...). Of course, it should be easier to extract the needed information from Digital Webster than from a CD-ROM, but I will be *very* disappointed if Webster no longer has this unprecedented and outstanding support for computational linguistics. I am hoping that the functionality still exists, but that it has simply become 'undocumented' so that it does not become widely abused. I fully appreciate Merriam-Webster's (and any publisher's) concern to protect their *enormous* investment in their dictionary, but until resources such as machine-readable dictionaries are available to more than a *very* small handful of researchers at places like IBM and Xerox PARC, the potential for large, many-domained, and robust systems will remain severely limited. There are many areas of research in computational linguistics which suffer greatly from the lack of large amounts of the type of information contained in dictionaries, and the Digital Webster released with the original cube was the first real step (in my opinion) towards getting such resources to the masses. I originally applauded both NeXT and Merriam-Webster for making such a valuable resource available to the academic community, but now they get a big booo. (NeXT and Webster, are you listening?) The NeXT seemed to be a 'miraculous' tool for linguists, but the mirage is beginning to disappear...
//////////////////////////////////////////////////////////////////////////
Patrick Stickler            Research Unit for Computational Linguistics
stickler@cc.helsinki.fi     Department of General Linguistics
(+358 0) 1913513            University of Helsinki
==========================================================================
The comments contained herein cannot reflect the official views of my
employer. (proof left for the reader)
//////////////////////////////////////////////////////////////////////////