[net.ai] Machine Translation - a very short tutorial

LRC.Slocum@UTEXAS-20.ARPA@sri-unix.UUCP (08/30/83)

From:  Jonathan Slocum <LRC.Slocum@UTEXAS-20.ARPA>

Before proclaiming the impossibility of automatic [i.e., computer]
translation of human languages, it's perhaps instructive to know
something about how human translation IS done -- and is not done -- at
least in places where it's taken seriously.  It is also useful,
knowing this, to propose a few definitions of what may be counted as
"translation" and -- more to the point -- "useful translation."
Abbreviations: MT = Machine Translation; HT = Human Translation.

To start with, the claim that "a real translator reads and understands
a text, and then generates [the text] in the [target] language" is
empty.  First, NO ONE really has anything like a good idea of HOW
humans translate, even though there are schools that "teach
translation."  Second, all available evidence indicates that (point #1
notwithstanding) different humans do it differently.  Third, it can
be shown (viz., simultaneous interpreters) that nothing as complicated
as "understanding" need take place in all situations.  Fourth, 
although the contention that "there generally aren't 1-1
correspondences between words, phrases..."  sounds reasonable, it is
in fact false an amazing proportion of the time, for languages with
similar derivational histories (e.g., German & English, to say nothing
of the Romance languages).  Fifth, it can be shown that highly
skilled, well-respected technical-manual translators do not always (if
ever) understand the equipment for which they're translating manuals
[and cannot, therefore, be argued to understand the original texts in 
any fundamentally deep sense] -- and must be "understanding" in a
shallower, probably more "linguistic" sense (one perhaps more
susceptible to current state-of-the-art computational treatment).

Now as to how translation is performed in practice.  One thing to
realize here is that, at least outside the U.S. [i.e., where
translation is taken seriously and where almost all of it is done], NO
HUMAN performs "unrestricted translation" -- i.e., human translators
are trained in (and ONLY considered competent in) a FEW AREAS.
Particularly in technical translation, humans are trained in a limited
number of related fields, and are considered QUITE INCOMPETENT outside
those fields.  Another thing to realize is that essentially ALL
TRANSLATIONS ARE POST-EDITED.  I refer here not to stylistic editing,
but to editing by a second translator of superior skill and
experience, who NECESSARILY refers to the original document when
revising his subordinate's translation.  The claim that MT is
unacceptable IF/BECAUSE the results must be post-edited falls to the
objection that HT would be unacceptable by the identical argument.
Obviously, HT is not considered unacceptable for this reason -- and
therefore, neither should MT.  All arguments for acceptability then
devolve upon the question of HOW MUCH revision is necessary, and HOW
LONG it takes.

Happily, this is where we can leave the territory of pontifical
pronouncements (typically uttered by the un- or ill-informed), and
begin to move into the territory of facts and replicable experiments.
Not entirely, of course, since THERE IS NO SUCH THING AS A PERFECT
TRANSLATION and, worse, NO ONE CAN DEFINE WHAT CONSTITUTES A GOOD
TRANSLATION.  Nevertheless, professional post-editors are regularly
saddled with the burden of making operational decisions about these
matters ("Is this sufficiently good that the customer is likely to 
understand the text?  Is it worth my [company's] time to improve it
further?").  Thus we can use their decisions (reflected, e.g., in
post-editing time requirements) to determine the feasibility of MT in
a more scientific manner; to wit: what are the post-editing
requirements of MT vs. HT?  And in order to assess the economic
viability of MT, one must add: taking all expenses into account, is MT
cost-effective [i.e., is HT + human revision more or less expensive
than MT + human revision]?
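
The cost question above reduces to simple arithmetic, roughly sketched
here in Python.  The function names and every figure are placeholders
invented for illustration; they are NOT data from any experiment, and
a real accounting would include overhead this sketch ignores.

```python
def total_cost(translation_cost, revision_hours, reviser_hourly_rate):
    """Cost of a revised translation: first draft plus post-editing time."""
    return translation_cost + revision_hours * reviser_hourly_rate


def mt_is_cost_effective(ht_total, mt_total):
    """MT is cost-effective when its total, revision included, is below HT's."""
    return mt_total < ht_total


# Placeholder figures, invented for illustration only (not experimental data).
ht_total = total_cost(translation_cost=100.0, revision_hours=2.0,
                      reviser_hourly_rate=20.0)
mt_total = total_cost(translation_cost=30.0, revision_hours=1.5,
                      reviser_hourly_rate=20.0)

print("HT + revision:", ht_total)
print("MT + revision:", mt_total)
print("MT cost-effective:", mt_is_cost_effective(ht_total, mt_total))
```

The point of the comparison is that revision is charged to BOTH sides;
only the totals are compared, not the raw quality of the first draft.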

Re: these last points, our experimental data to date indicate that (1)
the absolute post-editing requirements (i.e., something like "number
of changes required per sentence") for MT are increased w.r.t. HT
[this is no surprise to anyone]; (2) paradoxically, post-editing time
requirements for MT are REDUCED w.r.t. HT [surprise!]; and (3) the
overall costs of MT (including revision) are LESS than those for HT
(including revision) -- a significant finding.

We have run two major experiments to date [with our funding agency
collecting the data, not the project staff], BOTH of which produced
these results; the more recent one naturally produced better results
than the earlier one, and we foresee further improvements in the near
future.  Our finding (2) above, which SEEMS inconsistent with finding
(1), is explainable with reference to the sociology of post-editing
when the original translator is known to be human, and when he will
see the results (which probably should, and almost always does,
happen).  Further details will appear in the literature.

So why haven't you heard about this, if it's such good news?  Well,
you just did!  More to the point, we have been concentrating on
producing this system more than on writing papers about it [though I
have been presenting papers at COLING and ACL conferences], and
publishing delays are part of the problem [one reason for having
conferences].  But more papers are in the works, and the secret will
be out soon enough.