[net.ai] claims about "solving NLP"

LRC.Slocum@UTEXAS-20.ARPA@sri-unix.UUCP (08/30/83)

From:  Jonathan Slocum <LRC.Slocum@UTEXAS-20.ARPA>

I have never been impressed with claims about "solving the Natural
Language Processing problem" based on `solutions' for 1-2 paragraphs
of [usu. carefully (re)written] text.  There are far too many scale-up
problems for such claims to be taken seriously.  How many NLP systems
are there that have been applied to even 10 pages of NATURAL text,
with the full intent of "understanding" (or at least "treating in the
identical fashion") ALL of it?  Very few.  Or 100 pages?  Practically
none.  Schank & Co.'s "AP wire reader," for example, was NOT intended
to "understand" all the text it saw [and it didn't!], but only to 
detect and summarize the very small proportion that fell within its
domain -- a MUCH easier task, esp. considering its minuscule domain
and microscopic dictionary.  Even then, its performance was -- at best
-- debatable.

And to anticipate questions about the texts our MT system has been
applied to:  about 1,000 pages to date -- NONE of which was ever
(re)written, or pre-edited, to affect our results.  Each experiment
alluded to in my previous msg about MT was composed of about 50 pages
of natural, pre-existing text [i.e., originally intended and written
for HUMAN consumption], none of which was ever seen by the project
linguists/programmers before the translation test was run.  (Our 
dictionaries, by the way, currently comprise about 10,000 German
words/phrases, and a similar number of English words/phrases.)

We, too, MIGHT be subject to further scale-up problems -- but we're a
damned sight farther down the road than just about any other NLP
project has been, and have good reason to believe that we've licked
all the scale-up problems we'll ever have to worry about.  Even so, we
would NEVER be so presumptuous as to claim to have "solved the NLP
problem," needing only a large collection of `linguistic rules' to
wrap things up!!!  We certainly have NOT done so.

REALLY, now...