LRC.Slocum@UTEXAS-20.ARPA@sri-unix.UUCP (08/30/83)
From: Jonathan Slocum <LRC.Slocum@UTEXAS-20.ARPA>

I have never been impressed with claims about "solving the Natural Language Processing problem" based on `solutions' for 1-2 paragraphs of [usu. carefully (re)written] text. There are far too many scale-up problems for such claims to be taken seriously. How many NLP systems have been applied to even 10 pages of NATURAL text, with the full intent of "understanding" (or at least "treating in identical fashion") ALL of it? Very few. Or 100 pages? Practically none. Schank & Co.'s "AP wire reader," for example, was NOT intended to "understand" all the text it saw [and it didn't!], but only to detect and summarize the very small proportion that fell within its domain -- a MUCH easier task, esp. considering its minuscule domain and microscopic dictionary. Even then, its performance was -- at best -- debatable.

And to anticipate questions about the texts our MT system has been applied to: about 1,000 pages to date -- NONE of which was ever (re)written, or pre-edited, to affect our results. Each experiment alluded to in my previous msg about MT comprised about 50 pages of natural, pre-existing text [i.e., originally intended and written for HUMAN consumption], none of which was ever seen by the project linguists/programmers before the translation test was run. (Our dictionaries, by the way, currently contain about 10,000 German words/phrases, and a similar number of English words/phrases.)

We, too, MIGHT be subject to further scale-up problems -- but we're a damned sight farther down the road than just about any other NLP project has been, and we have good reason to believe that we've licked all the scale-up problems we'll ever have to worry about. Even so, we would NEVER be so presumptuous as to claim to have "solved the NLP problem," needing only a large collection of `linguistic rules' to wrap things up!!! We certainly have NOT done so. REALLY, now...