[comp.lang.c] formal language descriptions

henry@utzoo.uucp (Henry Spencer) (08/07/88)

In article <1151@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>                                   The problem with formal definitions like
>>the PL/I definition and the Revised Algol 68 Report is that they are
>>hideously unreadable...
>
>As far being `unreadable' or less `accessible,' that's too easy a target. As
>long as we refuse to correct the problems that exists, they will continue.

Quite true.  As long as we refuse to admit that unreadability is a major
problem that greatly reduces the usefulness of formal standards, they will
continue to see little use.  Even if the authors of a formal definition don't
fall into the trap of "lapidary" writing (paring out every non-essential
word that might help the reader make sense out of the formalisms -- a style
which is endemic in mathematics), the result is seldom something that an
uninitiated reader can tackle without help.  Compare this to the appendix
of K&R, or even the X3J11 drafts, which can be heavy going at times but
*can* be understood without formal background.

The attitude that "anyone but an illiterate peasant ought to be able to
read formal definitions" is part of the problem, not part of the solution.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

smryan@garth.UUCP (Steven Ryan) (08/09/88)

To all those who have written I know about SIGPLAN. Both issues. Writing
a context free syntax for C is simple.

The hard part is always been the context sensitive syntax and the semantics.
Although a type 0 or type 1 grammar can specify these, an excessive number of
productions are devoted to shuttling the cursor so that the results are
unreadable. Because of that, most definitions have resorted to English.

When the English is concise and controlled, the effect is pretty acceptable,
as in the 16 paged Algol 60 report. But when pointers and cast and abundance
of types are presented, precise English becomes muddled. To keep the definition
within a few thousand pages, a lot of hand waving goes on. For example, Ada
does not specify recursive rules on identifier identification. Rather,
they state that if an ambiguity exists, the name cannot be identified. Sorry,
folks, but the choose operator is hardly total recursive, and, as we might
remember from the good old days of calculus, it is frequently not even
recursive.

A formal context sensitive grammar is a (at least) partially recursive
predicate. So any implementation can be considerred a formal definition,
but the language used in the definition is at least as complex as the
language being defined. While any definition ultimately rests on faith,
I would prefer a definition technique as simple as possible.

Semantics are another hard part. Again most formal semantics definitions
are at least as complex as the language as defined.

>>As far being `unreadable' or less `accessible,' that's too easy a target. As
>>long as we refuse to correct the problems that exists, they will continue.
>
>Quite true.  As long as we refuse to admit that unreadability is a major
>problem that greatly reduces the usefulness of formal standards, they will
>continue to see little use.

The easy retort is that no formal language is readable or accessible. That
includes production systems, PL/I, C, Algol, or a hexidecimal dump.

>                             Even if the authors of a formal definition don't
>fall into the trap of "lapidary" writing (paring out every non-essential
>word that might help the reader make sense out of the formalisms -- a style
>which is endemic in mathematics), the result is seldom something that an
>uninitiated reader can tackle without help.

What uninitiated reader can tackle C?

The trap is to include all the irrelevant detail. Then implementors begin
thinking this is requirement and build what what may have been an irrelevant
comment into the compiler and thereby formalise somebody's whim. Do all
member of union HAVE to have the same offset? Is that part of definition
or is it one of the non-essential words that might help the reader make sense
out of the formalisms?

I don't know what your mathematic experience is, but I do know the concise
and formalism in math is not the math. It's just a way to keep people
honest. It forces the writer to state his conditions and reasoning explicitly
and openly so that other people can critically examine it for flaws.

Come to think of it, maybe there is a reason why nobody likes formal language
definitions.

>                                             Compare this to the appendix
>of K&R, or even the X3J11 drafts, which can be heavy going at times but
>*can* be understood without formal background.

And misunderstood. Read this group. Read your own postings. The people
who don't hedge their answers are the ones who get ripped to pieces.
 
>The attitude that "anyone but an illiterate peasant ought to be able to
>read formal definitions" is part of the problem, not part of the solution.

I'd hate to have an illiterate peasant at a keyboard, although perhaps that why
Unix smells. An unwillingness to face that this is an intrinsically difficult
occupation is part of the problem, not part of the solution.

Additionally, not everybody need read the formal definition. There's a lot
finite math floating around in programming, but how many of us have read
Principia Mathematica? I certainly have no objections to people writing
tutorials and overviews and reference manuals and everything else. I do
object these being purported as THE definition of a language. A language
definition should be complete and concise, sitting on a dusty shelf somewhere.
It is the contract between users and compilers on what is or is not acceptable
and what it must mean. 

>MSDOS is not dead, it just     |   Unix still twitches as well.
>smells that way.               |

By the way, there's a Sampler of Formal Definition in an old, old Computing
Survey. The closing remarks remain relevant.

henry@utzoo.uucp (Henry Spencer) (08/11/88)

In article <1187@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>Quite true.  As long as we refuse to admit that unreadability is a major
>>problem that greatly reduces the usefulness of formal standards, they will
>>continue to see little use.
>
>The easy retort is that no formal language is readable or accessible. That
>includes production systems, PL/I, C, Algol, or a hexidecimal dump.

Precisely, which is why none of these methods should be used for a document
that one hopes will be widely read.  Even within such languages, however,
there are gradations of familiarity and accessibility, and the techniques
used in formal definitions are right over at the unfamiliar/inaccessible
extreme.

>>                             Even if the authors of a formal definition don't
>>fall into the trap of "lapidary" writing (paring out every non-essential
>>word that might help the reader make sense out of the formalisms -- a style
>>which is endemic in mathematics), the result is seldom something that an
>>uninitiated reader can tackle without help.
>
>What uninitiated reader can tackle C?

I know a number of people who learned C straight from the reference manual,
back in the days when there basically was no alternative.  Please don't
nit-pick; when I say "uninitiated", I'm not talking about an eight-year-old,
but about a reasonably competent programmer.  Such a person can read a
language definition written in English, but most such people aren't
familiar enough with formal-definition methods to answer a question like
"do I have to make sure that function parameters are not aliased?" from
a formal definition without days or even weeks of study.

>I don't know what your mathematic experience is, but I do know the concise
>and formalism in math is not the math. It's just a way to keep people
>honest...

It is also a booby-trap which has been loudly criticized, even by prominent
mathematicians, for making the literature of the field useless to anyone
but narrow specialists.  Concise formalisms are necessary but not sufficient.

>>... the X3J11 drafts, which can be heavy going at times but
>>*can* be understood without formal background.
>
>And misunderstood. Read this group. Read your own postings...

Formal definitions are not at all immune to many kinds of misunderstandings.
The Peano axioms, no, but a real language definition, yes:  it shares the
problems of massive detail and (unless accompanied by detailed explanatory
material) inadequate explanation and cross-referencing.  God may find formal
definitions of programming languages to be clear and simple; humans do not,
at least not without studying them repeatedly and at length.

>>The attitude that "anyone but an illiterate peasant ought to be able to
>>read formal definitions" is part of the problem, not part of the solution.
>
>I'd hate to have an illiterate peasant at a keyboard, although perhaps that why
>Unix smells. An unwillingness to face that this is an intrinsically difficult
>occupation is part of the problem, not part of the solution.

Intrinsically difficult, maybe.  But it is made much worse by people who
write their language definitions to be read by God, not humans.  Such
people do tend to appeal to the intrinsic difficulty of the problem to
excuse their inability, or unwillingness, to make concessions to the
limitations of their readers.  This is precisely the attitude that I was
criticizing.

>Additionally, not everybody need read the formal definition...  A language
>definition should be complete and concise, sitting on a dusty shelf somewhere.
>It is the contract between users and compilers on what is or is not acceptable
>and what it must mean. 

In other words, it is the sort of document that should be readable to the
users and the compiler writers, so they can understand the terms of the
contract without needing a lawyer to (mis)interpret it.

I would, actually, support a style of language definition in which there
was a formal definition accompanied by a *complete* informal exposition
of the definition, which had been carefully checked for consistency with
the formal one so that it could be used as a reference by the hoi-polloi,
and which was understood to be the partner of the formal definition rather
than an unwanted child of it.  (This means, for example, that a discrepancy
between the two is not automatically resolved in favor of the formal one.)
But that is hard work, and the people who do formal definitions usually
can't be bothered making their gleaming creations understandable to the
unwashed masses who have to use them.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

smryan@garth.UUCP (Steven Ryan) (08/13/88)

>nit-pick; when I say "uninitiated", I'm not talking about an eight-year-old,
>but about a reasonably competent programmer.  Such a person can read a
>language definition written in English, but most such people aren't

Again, look at everything else here. Well, perhaps we can read K+R, but
do we agree on what we read?
 
>It is also a booby-trap which has been loudly criticized, even by prominent
>mathematicians, for making the literature of the field useless to anyone
>but narrow specialists.  Concise formalisms are necessary but not sufficient.

Because it is hard, therefore it should not be done.
It is hard to comment a program, therefore it shouldn't be done.

No direct response to assertion that is done to keep the writer honest and
perhaps that's why language designer avoid it.
 
>>And misunderstood. Read this group. Read your own postings...
>
>Formal definitions are not at all immune to many kinds of misunderstandings.
>The Peano axioms, no, but a real language definition, yes:  it shares the

Reminds me of the stories of end of last century. An intrinsically difficult
subject remains difficult no matter how it is presented. But formalisation
makes it possible to argue rationally over some point. The question can be
resolved by examining the arguments within the context of a public
definition.

Instead it's a question of let's run it on Sun or Vax or ... and see what
they give us.

>>Unix smells. An unwillingness to face that this is an intrinsically difficult
>>occupation is part of the problem, not part of the solution.
>
>Intrinsically difficult, maybe.  But it is made much worse by people who
>write their language definitions to be read by God, not humans.  Such

It is also made worse by people who write their language definitions for
eight year old illiterates instead of programmers.

By the way, had anybody every gotten PIC or EQN or TBL or even TROFF to
work right on the first try?

>I would, actually, support a style of language definition in which there
>was a formal definition accompanied by a *complete* informal exposition
>of the definition, which had been carefully checked for consistency with

Have you seen The Turing Programming Language? Its with a tutorial, an
informal reference, then context free syntax, context sensitive contraints,
and finally the formal semantics.

The problem however with maintaining complete consistency assumes the
same information is available in both forms. How many bookshelves would
be required for a complete and consistent C definition written in
accessible English?

henry@utzoo.uucp (Henry Spencer) (08/20/88)

In article <1214@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>It is also a booby-trap which has been loudly criticized, even by prominent
>>mathematicians, for making the literature of the field useless to anyone
>>but narrow specialists.  Concise formalisms are necessary but not sufficient.
>
>Because it is hard, therefore it should not be done.
>It is hard to comment a program, therefore it shouldn't be done.

Please re-read the last sentence of what I wrote.  It should be done, *but*
it is not the whole job.

>No direct response to assertion that is done to keep the writer honest and
>perhaps that's why language designer avoid it.

No direct response to assertion that making one's work useful requires doing
more than just a formal definition.

>... But formalisation
>makes it possible to argue rationally over some point. The question can be
>resolved by examining the arguments within the context of a public
>definition.

This assumes that the public definition is complete, correct, and self-
consistent.  That is not a trivial assumption, indeed it is a large and
somewhat dubious one, especially when considering existing (relatively
large and warty) languages like C.
-- 
Intel CPUs are not defective,  |     Henry Spencer at U of Toronto Zoology
they just act that way.        | uunet!attcan!utzoo!henry henry@zoo.toronto.edu