[sci.skeptic] Natural Language Processing, Topological Grammars

larsbc@ncrsecp.Copenhagen.NCR.dk (Lars Ballieu Christensen) (09/08/89)

I am posting this for a friend, who does not have direct access
to the net. Please reply to this newsgroup or directly to me, and
I will forward it.

"As a special-subject student in Computer Science and Danish
Language at the University of Roskilde, I have been studying the
various natural language grammars used in automatic parsing
systems and have come across, among others, the following
problem:

The concrete subject of the investigation is the Danish part of
the Eurotra project, which is based on the established
topological grammar of Danish developed by P. Diderichsen [1].

The problem can be stated as follows: Is the lack of theoretical
linguistic description in the grammars used for automatic NL
systems a necessary consequence of pursuing the primary goal (=
providing an applicable system), or is it possible, perhaps in
the longer term, to provide better NL systems by developing
grammars that are built with a higher degree of respect for the
theoretical linguistic description?

Generally, when developing automatic NL systems, the goal seems
to be to build parsers that produce the most correct analysis of
the language structures at each level of the specific system.

One consequence of this strategy, at least for grammars of
Danish, is that the theoretical basis of the language often seems
to be neglected, even though the NL grammars have primarily been
built/specified by theoretical linguists. Instead, ad hoc
grammars are built to meet the demands for correctness, the
commercial restrictions, and the requirement that they can be
implemented, i.e. computerized.

Viewpoints and comments on this subject will be highly
appreciated. Also, I would be glad to share my experiences with
anyone who wishes.

Thanks in advance

Henrik Sternberg Jensen, 
Department of Computer Science,
University of Roskilde, 
DK-4000 Roskilde

References:

[1]  Diderichsen, P., Elementær Dansk Grammatik, Gyldendal,
     København, 1979.

[2]  Rue, H., Diderichsen på Prolog, SAML 12, University of
     Copenhagen, 1986.

[3]  Togeby, O., Parsing Danish Text in Eurotra, Nordic Journal
     of Linguistics, vol. 11, no. 1-2, Universitetsforlaget AS,
     Oslo, 1988."

-- 
Lars Ballieu Christensen	     Email: Lars.B.Christensen@Copenhagen.NCR.DK
NCR Systems Engineering	Copenhagen   Phone: +45 38 33 00 22 
Contract Development, Svanevej 14,   Fax:   +45 31 10 23 62 
DK-2400 Copenhagen NV, Denmark 	     "Music is your only friend - till the end"

lee@uhccux.uhcc.hawaii.edu (Greg Lee) (09/08/89)

From article <1933@ncrsecp.Copenhagen.NCR.dk>, by larsbc@ncrsecp.Copenhagen.NCR.dk (Lars Ballieu Christensen):

)... is it possible, maybe on a
)longer term basis, to provide better NL systems by developing
)grammars, which are built with higher degree of respect to the
)theoretical linguistic description?

I don't think it's possible, unless the long term basis is long
enough to permit adequate theoretical linguistic descriptions
to be discovered.  Say a century.  One might reasonably
expect linguists' descriptions now to provide a good account of
the facts of a language, or a fact in a group of languages,
and to give a convenient terminology for describing facts.
But that's not `theoretical' in the usual sense.

)Generally when developing automatic NL systems, the goal seems to
)be building parsers, which form the most correct analysis of the
)language structures on each level of the specific system.

But who knows what "the most correct analysis" is?  Should verbs in
English go with their objects in a `verb phrase', or with their
subjects, or with neither?  Intonation sometimes suggests the second
choice.  Tradition and some not altogether conclusive arguments about
possible idioms suggest the first.  My colleague Stan Starosta has a
well-developed theory that makes the third choice.  In this and other
instances, it seems to me that fashion or convenience more than theory
dictates what turns up in current descriptions.  Some syntacticians are
enthralled with binary branching trees, but for any principled reason?
Not so far as I know.
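
The three choices can be made concrete with a small sketch.  The
example sentence, the node labels (including the placeholder label
"X"), and the helper names below are assumptions made up for
illustration.  In particular, the flat tree only stands in for the
"neither" option and is not meant as a rendering of Starosta's
theory.

```python
# Three candidate bracketings of one English sentence.
# Trees are nested tuples: (label, child, child, ...); leaves are strings.
# The sentence and all labels are illustrative assumptions.

# Choice 1: the verb groups with its object in a verb phrase.
VERB_WITH_OBJECT = ("S", ("NP", "Kim"), ("VP", ("V", "saw"), ("NP", "Lee")))

# Choice 2: the verb groups with its subject ("X" is a placeholder label).
VERB_WITH_SUBJECT = ("S", ("X", ("NP", "Kim"), ("V", "saw")), ("NP", "Lee"))

# Choice 3: the verb groups with neither; all three are sisters.
FLAT = ("S", ("NP", "Kim"), ("V", "saw"), ("NP", "Lee"))


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:  # tree[0] is the node label
        words.extend(leaves(child))
    return words


def bracket(tree):
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(c) for c in tree[1:]) + "]"


for analysis in (VERB_WITH_OBJECT, VERB_WITH_SUBJECT, FLAT):
    print(bracket(analysis))
# [S [NP Kim] [VP [V saw] [NP Lee]]]
# [S [X [NP Kim] [V saw]] [NP Lee]]
# [S [NP Kim] [V saw] [NP Lee]]
```

All three trees yield the same word string, so the string alone
cannot tell a parser which analysis is "the most correct" one.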

)This strategy causes, at least for grammars for Danish language,
)that the NL grammars, which primarily have been built/specified
)by theoretical language scientists, that the theoretical basis of
)the language often seems to be neglected. Instead, ad hoc
)grammars are built in order to meet the demands for correctness,

But that's what linguists are doing, too.

			Greg, lee@uhccux.uhcc.hawaii.edu