LRC.Slocum@UTEXAS-20.ARPA@sri-unix.UUCP (08/30/83)
From: Jonathan Slocum <LRC.Slocum@UTEXAS-20.ARPA>

Before proclaiming the impossibility of automatic [i.e., computer] translation of human languages, it's perhaps instructive to know something about how human translation IS done -- and is not done -- at least in places where it's taken seriously. It is also useful, knowing this, to propose a few definitions of what may be counted as "translation" and -- more to the point -- "useful translation." Abbreviations: MT = Machine Translation; HT = Human Translation.

To start with, the claim that "a real translator reads and understands a text, and then generates [the text] in the [target] language" is empty.

First, NO ONE really has anything like a good idea of HOW humans translate, even though there are schools that "teach translation."

Second, all available evidence indicates that (point #1 notwithstanding), different humans do it differently.

Third, it can be shown (viz simultaneous interpreters) that nothing as complicated as "understanding" need take place in all situations.

Fourth, although the contention that "there generally aren't 1-1 correspondences between words, phrases..." sounds reasonable, it is in fact false an amazing proportion of the time, for languages with similar derivational histories (e.g., German & English, to say nothing of the Romance languages).

Fifth, it can be shown that highly skilled, well-respected technical-manual translators do not always (if ever) understand the equipment for which they're translating manuals [and cannot, therefore, be argued to understand the original texts in any fundamentally deep sense] -- and must be "understanding" in a shallower, probably more "linguistic" sense (one perhaps more susceptible to current state-of-the-art computational treatment).

Now as to how translation is performed in practice. One thing to realize here is that, at least outside the U.S.
[i.e., where translation is taken seriously and where almost all of it is done], NO HUMAN performs "unrestricted translation" -- i.e., human translators are trained in (and ONLY considered competent in) a FEW AREAS. Particularly in technical translation, humans are trained in a limited number of related fields, and are considered QUITE INCOMPETENT outside those fields.

Another thing to realize is that essentially ALL TRANSLATIONS ARE POST-EDITED. I refer here not to stylistic editing, but to editing by a second translator of superior skill and experience, who NECESSARILY refers to the original document when revising his subordinate's translation. The claim that MT is unacceptable IF/BECAUSE the results must be post-edited falls to the objection that HT would be unacceptable by the identical argument. Obviously, HT is not considered unacceptable for this reason -- and therefore, neither should MT. All arguments for acceptability then devolve upon the question of HOW MUCH revision is necessary, and HOW LONG it takes.

Happily, this is where we can leave the territory of pontifical pronouncements (typically uttered by the un- or ill-informed), and begin to move into the territory of facts and replicable experiments. Not entirely, of course, since THERE IS NO SUCH THING AS A PERFECT TRANSLATION and, worse, NO ONE CAN DEFINE WHAT CONSTITUTES A GOOD TRANSLATION. Nevertheless, professional post-editors are regularly saddled with the burden of making operational decisions about these matters ("Is this sufficiently good that the customer is likely to understand the text? Is it worth my [company's] time to improve it further?"). Thus we can use their decisions (reflected, e.g., in post-editing time requirements) to determine the feasibility of MT in a more scientific manner; to wit: what are the post-editing requirements of MT vs. HT?
And in order to assess the economic viability of MT, one must add: taking all expenses into account, is MT cost-effective [i.e., is HT + human revision more or less expensive than MT + human revision]?

Re: these last points, our experimental data to date indicate that (1) the absolute post-editing requirements (i.e., something like "number of changes required per sentence") for MT are increased w.r.t. HT [this is no surprise to anyone]; (2) paradoxically, post-editing time requirements of MT are REDUCED w.r.t. HT [surprise!]; and (3) the overall costs of MT (including revision) are LESS than those for HT (including revision) -- a significant finding. We have run two major experiments to date [with our funding agency collecting the data, not the project staff], BOTH of which produced these results; the more recent one naturally produced better results than the earlier one, and we foresee further improvements in the near future. Our finding (2) above, which SEEMS inconsistent with finding (1), is explainable with reference to the sociology of post-editing when the original translator is known to be human, and when he will see the results (which probably should, and almost always does, happen). Further details will appear in the literature.

So why haven't you heard about this, if it's such good news? Well, you just did! More to the point, we have been concentrating on producing this system more than on writing papers about it [though I have been presenting papers at COLING and ACL conferences], and publishing delays are part of the problem [one reason for having conferences]. But more papers are in the works, and the secret will be out soon enough.