[comp.sys.apple] The C language

jperry@UNIX.SRI.COM (John Perry) (05/27/87)

     I have actually written about twice as much C code as Pascal or
Modula-2 code and so my criticism of C is not made solely at an "airy"
theoretical level.  
     You say that "If C is so terrible, then how do you explain so many
commercial software developers switching to it?".  I thought I answered
that in my previous memo i.e. that some languages have dirty details
that make them so desirable in a practical sense that programmers
are willing to overlook even the grossest of structural defects.
     Turing and/or Modula-2 have failed to hold sway in the commercial
world for the very simple reason that they arrived after both C and
Ada and simply haven't had sufficient people to toot the horn for them.
That's all it is --- C got there first and, like the inertia of FORTRAN
programmers to learning a different language, C programmers don't want
to change either (and I know plenty of them who HATE C but just don't
want to make the effort to get out of their bind).  Why?  Well, as you
put it, C is "good enough" --- the typical sunk-cost outlook --- another
tool has to be 100% better to justify a changeover.
     C, when examined closely, is so full of inconsistencies and 
unnecessarily cryptic syntactical finesses that, and I will reiterate
my practical observation, even good C programmers can be seen
consulting a C manual even after several years of practice!  I hate to
get bogged down in another "features" discussion but it seems I'm being
drawn in to it --- so here goes.
     First, inconsistencies in parameters of library calls.  Why is it
that, in an fprintf statement, the file is the first parameter but in
a putc statement it is the SECOND parameter?  And why, oh why, does C
have to be different than the rest of the world by making the destination
string be the FIRST parameter in string subroutines such as strcat?
Oddly enough, in the original Kernighan and Ritchie C book, their version
of strcpy has the destination string as the second parameter on page 100
--- why the change to the counterintuitive in commercial C?
     Second, inconsistencies in rules for structured data types.  Can
anybody give me a good reason why arrays cannot be assigned in assignment
statements as can structures?  Another artificiality that one must keep
in mind.
     Third, C's allowance of mixing pointer and array types leads to
abominably unreadable code.  In some UNIX sources I have seen such
mixtures leading to oddities like a[3] meaning "the third value after
where the pointer to a is currently pointing" rather than simply the
fourth value of the array.  Try reading this kind of stuff and keeping
your sanity.
     Fourth, the simple C data declaration:

          int   *pi;

gives you no clue whether pi points to a SCALAR integer or an ARRAY
of integers.  This kind of ambiguity does not exist in Modula-2
or Pascal where the declaration of pi would tell us whether we have
a pointer to a scalar or structured data type.
     Fifth, C's rules for initializing character arrays, especially
"ragged" arrays of variable length characters, seem to differ from
implementation to implementation.  
     Sixth, C allows an unreadable degree of programmer cleverness
as in such code segments as:

     strcpy(s,t)
     char *s,*t;
 {
    while(*s++ = *t++);}

Which relies on the artifact that a string terminator happens to be an
ASCII zero and that, since the value of an assignment statement is the
value of the left hand side, the while will terminate when NULL is
encountered.  C moguls actually ENCOURAGE this kind of "idiomatic"
expression e.g. Kernighan and Ritchie -- "Although this may seem cryptic
at first sight, the NOTATIONAL CONVENIENCE (my emphasis) is considerable,
and the idiom should be mastered ...".  
     By notational convenience they mean THAT IT CAN BE TYPED IN
QUICKLY!!!  And, of course, the "idiom should be mastered" if one is
to enter the pantheon known as "C cognoscenti" --- God forbid if
one's C code looks like that of a Pascal programmer!!
     I could rant on and on about the poor human engineering of the C
language (how many times have you gotten caught on  if a = c instead
of if a == c  or  *p++ versus (*p)++ ??) and get even further bogged
down in the "features" quagmire.  But to what end??  
     The "bottom line" of my complaints about C is that it is a
poorly engineered language which CANNOT be improved by continually
adding "features".  Its features are counterintuitive, poorly
human engineered in that they invite error, and literally beckon
the kind of cryptographic, "clever" code which seems to be ENCOURAGED
by its leading proponents.  On the other hand, I feel that the
basic skeleton structure of a language like Modula-2 is so sound that
the addition of precious few features and a couple minor language
changes would create a nearly ideal programming tool.
     But, it'll probably never happen because the artificialities
and ambiguities of C create the sort of mystique about the "difficulties
of programming" that programmers love.  Next to chess players, their
egos are the most insufferable.




                                                   John Perry

gwyn@BRL.ARPA (Doug Gwyn, VLD/VMB) (05/28/87)

> From: John Perry <jperry@sri-unix.arpa>
>      I have actually written about twice as much C code as Pascal or
> Modula-2 code and so my criticism of C is not made solely at an "airy"
> theoretical level.  

That doesn't follow at all.  Your criticisms amount to complaints that C
does not meet some ideal model you have in mind for a programming language.
As will be shown, your understanding of C is superficial, and evidently
hampered by your prejudices.  I'm not objecting to that in itself, but to
the fact that you are making recommendations based on such views.

>      You say that "If C is so terrible, then how do you explain so many
> commercial software developers switching to it?".  I thought I answered
> that in my previous memo i.e. that some languages have dirty details
> that make them so desirable in a practical sense that programmers
> are willing to overlook even the grossest of structural defects.

I see.  It's not utility you seek, but unusable structural "perfection".
The things that make C a better language for practical work you call
"dirty details", as if attention to such matters weren't fundamental.
We're about to investigate what you apparently identify as these "gross
structural defects" (of course your favorite language couldn't possibly
have any defects, could it -- they'd just be "minor imperfections").

> ...  Well, as you
> put it, C is "good enough" --- the typical sunk-cost outlook --- another
> tool has to be 100% better to justify a changeover.

That's not an accurate assessment.  It is a waste of time to design yet
another programming language that has comparable facilities to existing
ones, unless it is clearly superior in several significant respects
(which has not been demonstrated in this debate).  One sufficiently good
language of each general type is plenty.  Programming languages are TOOLS,
not ENDS IN THEMSELVES.  (Leaving aside educational considerations.)

>      C, when examined closely, is so full of inconsistencies and 
> unnecessarily cryptic syntactical finesses that, and I will reiterate
> my practical observation, even good C programmers can be seen
> consulting a C manual even after several years of practice!

The only thing I've seen people whom I would consider to be good C
programmers look up (infrequently) about the language is the precedence
of operators in expressions, in cases involving combinations of bitwise,
shift, conditional, and assignment operators.  (Some programmers would
simply use extra parentheses to be sure, rather than look up the
precedence.)  Most programming languages don't have this problem because
they're not capable of supporting such expressions in the first place.

I will admit, as will any good C programmer, that the bitwise operators
would have been more conveniently given higher precedence than the
equality operators.  We're used to it, though, and again many languages
don't have that problem because they force more cumbersome expression
in the first place.
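
The precedence point can be made concrete with a small sketch.  It is
written in ANSI C rather than the K&R style quoted in this thread, and
the names FLAG, as_parsed, and as_intended are made up for illustration.
The first function spells out how an unparenthesized `x & FLAG == 0` is
grouped; the second is what was most likely meant.

```c
#define FLAG 0x04    /* hypothetical bit mask, for illustration only */

/* What "x & FLAG == 0" means to the compiler: == binds tighter than &,
   so the comparison happens first and the result is x & 0, always 0. */
int as_parsed(int x)
{
    return x & (FLAG == 0);
}

/* What the programmer almost certainly meant: is the FLAG bit clear? */
int as_intended(int x)
{
    return (x & FLAG) == 0;
}
```

This is the case where extra parentheses cost nothing and save a trip
to the manual.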

Occasionally one will look up the order of arguments to one of the vast
number of infrequently-used functions in the C library.  Generally one
needs to check the functional specification anyway at that point.  That
would apply equally to any language.  (Keyword parameters are only
useful if you remember everything about the interface except parameter
order; they might be a useful addition to C some day if a good syntax
could be devised (name=value already has another meaning).)

>      First, inconsistencies in parameters of library calls.  Why is it
> that, in an fprintf statement, the file is the first parameter but in
> a putc statement it is the SECOND parameter?

Because putc(c,f) was considered to be the natural way to order "put
character to file".  The printf() family have to support variable
parameter lists, and that consideration forced the parameters to come
last.  At least C HAS variable parameter lists!
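
A minimal sketch of the two orderings side by side, in ANSI C.  The
function name write_record and its output format are assumptions made
for the example; only fprintf() and putc() come from the discussion.

```c
#include <stdio.h>

/* Write "n=<n>" followed by a newline to f; returns 0 on success. */
int write_record(FILE *f, int n)
{
    /* fprintf: the stream comes first, since the variable-length
       argument list has to come last */
    if (fprintf(f, "n=%d", n) < 0)
        return -1;

    /* putc: character first, stream second ("put c to f") */
    if (putc('\n', f) == EOF)
        return -1;

    return 0;
}
```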

>  And why, oh why, does C
> have to be different than the rest of the world by making the destination
> string be the FIRST parameter in string subroutines such as strcat?

I tried explaining this by analogy with dst:=src, but you seem to have
missed the point.  It could have been done either way, just so long as
the str*() functions are consistent in this (which they are).  Another
response would be, your concept of the "rest of the world" is seriously
limited in scope.  Why isn't Modula-2 just like Fortran in everything it
does, hmm?  Isn't that what the "rest of the world" does?
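
The dst:=src analogy can be sketched as follows, in ANSI C.  The helper
name join and its capacity check are assumptions for illustration; the
point is only that strcpy() and strcat() both take the destination
first, so each call reads like an assignment.

```c
#include <string.h>

/* Build "<a><b>" in out, which has room for cap chars.
   Returns out, or NULL if the result would not fit. */
char *join(char *out, size_t cap, const char *a, const char *b)
{
    if (strlen(a) + strlen(b) + 1 > cap)
        return NULL;

    strcpy(out, a);    /* reads like:  out  = a  */
    strcat(out, b);    /* reads like:  out += b  */
    return out;
}
```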

> Oddly enough, in the original Kernighan and Ritchie C book, their version
> of strcpy has the destination string as the second parameter on page 100
> --- why the change to the counterintuitive in commercial C?

I knew already you were a C illiterate; this proves it.  K&R's strcpy()
is compatible with the one in the standard C library.  In fact, they
explain why the parameter order is the way it is, in much the way I did.
Or maybe you don't read English either?

Again, what you consider "intuitive" is not necessarily universal.  If
I had no prior experience with dst:=src or MOV DST,SRC then I admit I
would have a slight inclination toward strcpy(src,dst), and I normally
design function interfaces with input parameters followed by output
parameters, but the other way around doesn't bother me either.  Why
is this so important?  Do you program by trying to GUESS how things
work instead of LEARNING how things are used?  Maybe it's a good thing
that C does bite the sloppy craftsman; it might lead to improved work
habits.

>      Second, inconsistencies in rules for structured data types.  Can
> anybody give me a good reason why arrays cannot be assigned in assignment
> statements as can structures?  Another artificiality that one must keep
> in mind.

Yes: the name of an array in C is a pointer to the first element of the
array.  You CAN assign the name of an array to an appropriate pointer
variable.  There simply is no syntax for specifying assignment of the
entire array.  The memcpy() function can of course be used for this when
it is necessary to copy an array into another.  In C, that is an
infrequent situation, because pointers to objects are so easy to use
that whenever possible experienced C programmers will copy pointers,
not the objects they point to.  After all, that's more efficient and C
was specifically designed for use in efficient systems programming.
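
A short sketch of the three alternatives mentioned here, in ANSI C.
The names N, struct vec, copy_array, copy_struct, and sum are invented
for the example and do not come from any library.

```c
#include <string.h>

#define N 4    /* illustrative array length */

struct vec { int v[N]; };    /* an array wrapped in a structure */

/* Arrays: there is no assignment syntax, so copy the bytes explicitly. */
void copy_array(int dst[N], const int src[N])
{
    memcpy(dst, src, N * sizeof src[0]);
}

/* Structures: plain assignment copies every member, array included. */
void copy_struct(struct vec *dst, const struct vec *src)
{
    *dst = *src;
}

/* The common idiom: pass a pointer around instead of copying anything. */
int sum(const int *p, int n)
{
    int total = 0;

    while (n-- > 0)
        total += *p++;
    return total;
}
```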

C's identification of array names with pointers has been extremely
useful.  It unfortunately had the side-effect of making arrays themselves
"second-class" data objects with respect to some operations, but then
virtually all languages do this.  (You cannot meaningfully divide scalars
by arrays, for example.)

>      Third, C's allowance of mixing pointer and array types leads to
> abominably unreadable code.  In some UNIX sources I have seen such
> mixtures leading to oddities like a[3] meaning "the third value after
> where the pointer to a is currently pointing" rather than simply the
> fourth value of the array.  Try reading this kind of stuff and keeping
> your sanity.

Experienced C programmers virtually never access the nth element like
this for n other than -1, 0, or 1.  Try reading the equivalent operation
(with a pointer!) in other languages and see if you're saner.  The
reason for the parenthetical qualification in the previous sentence is
that pointers are often the natural way to access objects in C, unlike
Pascal for instance where array indices are easier to use in practically
all cases.  C use of array indexing would look much like Pascal's, but
often that is not the natural way to effectively exploit C's facilities.
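
As an illustration of indexing a moving pointer with only -1, 0, and 1,
here is a hedged sketch in ANSI C.  The functions is_peak and
count_peaks are hypothetical; they are not taken from any UNIX source.

```c
/* Nonzero if *p is strictly greater than both of its neighbours.
   The caller must guarantee that p[-1] and p[1] exist. */
int is_peak(const int *p)
{
    return p[0] > p[-1] && p[0] > p[1];
}

/* Count interior peaks in a[0..n-1] by walking a pointer along it. */
int count_peaks(const int *a, int n)
{
    const int *p;
    int count = 0;

    if (n < 3)
        return 0;    /* no interior elements to examine */

    for (p = a + 1; p < a + n - 1; p++)
        if (is_peak(p))
            count++;
    return count;
}
```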

>      Fourth, the simple C data declaration:
> 
>           int   *pi;
> 
> gives you no clue whether pi points to a SCALAR integer or an ARRAY
> of integers.  This kind of ambiguity does not exist in Modula-2
> or Pascal where the declaration of pi would tell us whether we have
> a pointer to a scalar or structured data type.

In your example, `pi' is a pointer to an (int) object, just as the
declaration says.  The object might be one of several organized as an
array, or it might not.  A pointer to an array would be declared as
	int (*pi)[];
where the array dimension may optionally be specified (in such cases,
the only dimensions that must be specified in C are the ones necessary
to determine how to locate an array entry).

I realize you probably intended the point addressed by the second
sentence in my paragraph above.  Why is that a problem?  It follows
naturally from the very language structure that makes x[y] == *(x+y);
again your complaint seems to be that C uses different abstractions
from the ones you're happy with.  That is simply a matter of
upbringing and personal taste, unless you can demonstrate objectively
that one alternative is superior on all counts of real-world value,
which I doubt very much you can do.
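
The two declarations can be compared in a small ANSI C sketch.  The
dimension N and the function names are assumptions chosen for the
example.

```c
#define N 3    /* illustrative dimension */

/* pi points to one int, which may or may not be an array element.
   The caller must know that pi[1] exists. */
int second_via_pi(int *pi)
{
    return *(pi + 1);    /* identical in meaning to pi[1] */
}

/* pa points to a whole array of N ints; the type records the dimension. */
int second_via_pa(int (*pa)[N])
{
    return (*pa)[1];
}
```

Given `int a[N]`, the first is called as second_via_pi(a) and the
second as second_via_pa(&a); both reach the same element.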

>      Fifth, C's rules for initializing character arrays, especially
> "ragged" arrays of variable length characters, seem to differ from
> implementation to implementation.  

You want standards, we're about to publish one.  There's no ambiguity
in the standard language specification about this.  The only possible
ambiguity in current non-buggy implementations would concern whether
or not to diagnose the following as an error:
	char s[2] = "xy";
(since the initializer seems to require space for its NUL terminator).
This is a rather rare situation.  Otherwise, the rules are quite clear.
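
For what it is worth, the ANSI C standard that followed accepts this
initializer and simply stores no terminator when there is no room for
one (C++ rejects it, and some C compilers warn).  A sketch under that
assumption, with invented function names:

```c
#include <string.h>

/* Returns 1 if the two-element array holds exactly 'x','y'. */
int exact_fit(void)
{
    char s[2] = "xy";    /* valid C: no room for '\0', so none is stored */

    return sizeof s == 2 && s[0] == 'x' && s[1] == 'y';
}

/* Letting the compiler size the array leaves room for the terminator. */
int sized_by_initializer(void)
{
    char t[] = "xy";     /* three elements: 'x', 'y', '\0' */

    return sizeof t == 3 && strlen(t) == 2;
}
```

Note that s in the first function is not a string, so it must not be
passed to strlen() or printed with %s.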

>      Sixth, C allows an unreadable degree of programmer cleverness
> as in such code segments as:
>      strcpy(s,t)
>      char *s,*t;
>  {
>     while(*s++ = *t++);}
> 
> Which relies on the artifact that a string terminator happens to be an
> ASCII zero and that, since the value of an assignment statement is the
> value of the left hand side, the while will terminate when NULL is
> encountered.  C moguls actually ENCOURAGE this kind of "idiomatic"
> expression e.g. Kernighan and Ritchie -- "Although this may seem cryptic
> at first sight, the NOTATIONAL CONVENIENCE (my emphasis) is considerable,
> and the idiom should be mastered ...".  

First of all, use the right whitespace, comments, etc., as in the
reference you gave but did not understand:

	strcpy(s, t)	/* copy t to s; pointer version 3 */
	char *s, *t;
	{
		while (*s++ = *t++)
			;
	}

Next, note that C strings are terminated by 0-valued (char)s; this is
true no matter what the target character set -- ASCII has nothing to do
with it.  It does not "happen" by accident but is a key feature of C
character strings directly supported by the language (one can define
count+data style strings etc. himself if desired).

By the way, nobody I know of (including K&R) recommends using NULL as
a symbol for anything in C other than a null pointer, which is not the
same as the string terminator (invariably written as '\0' or simply 0).

The rest of the quotation, which you elided from your biased summary,
is "..., if for no other reason than that you will see it frequently
in C programs."  This is a perfectly valid point for a text teaching
C to make, and the implication is that one might want to consider
whether one should write code like this oneself.  As a matter of fact,
many experienced C programmers (including myself) would show the
comparison against the 0-terminator explicitly.  We would probably
change other details of the strcpy() implementation, for which K&R
showed four different versions, of which you selected the one that is
specifically intended to help explain one of the more obscure commonly-
encountered idioms.  Note that K&R do NOT use that style in subsequent
examples.  This hardly constitutes "encouragement".
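
One way the explicit comparison might look, as a sketch in ANSI C.
This is not K&R's text; the name copy_string is chosen to avoid
clashing with the library's strcpy(), and returning the destination is
an assumption borrowed from the library version.

```c
/* Copy the string t, including its terminator, to s; returns s. */
char *copy_string(char *s, const char *t)
{
    char *start = s;

    while ((*s = *t) != '\0') {    /* stop once the terminator is copied */
        s++;
        t++;
    }
    return start;
}
```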

>      By notational convenience they mean THAT IT CAN BE TYPED IN
> QUICKLY!!!  And, of course, the "idiom should be mastered" if one is
> to enter the pantheon known as "C cognoscenti" --- God forbid if
> one's C code looks like that of a Pascal programmer!!

I'm not sure what a "Pascal programmer"'s code looks like.  (I thought
it varied from programmer to programmer, depending on training,
experience, thoughtfulness, organizational standards, etc.  Funny that
C should be considered immune from these sources of variation.)
Certainly my C code doesn't look like the picture you try to paint of
code of "C cognoscenti" (which I take to mean "those who know what
they're doing when it comes to C" -- is that meant to be derogatory?).

>      I could rant on and on about the poor human engineering of the C
> language (how many times have you gotten caught on  if a = c instead
> of if a == c

Never.

>  or  *p++ versus (*p)++ ??)

Never.
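
For readers less sure of the difference, a minimal ANSI C sketch of the
two expressions.  The function names are invented for the example; each
returns how far the pointer moved and reports the value fetched.

```c
/* *p++ : fetch the value, then advance the POINTER.
   Returns the index p ends at; a[] is left unchanged. */
int star_p_plusplus(int a[], int *fetched)
{
    int *p = a;

    *fetched = *p++;      /* same as *(p++) */
    return (int)(p - a);
}

/* (*p)++ : the pointer stays put; the VALUE it points to is incremented.
   Returns the index p ends at; a[0] is one larger afterwards. */
int paren_star_p_plusplus(int a[], int *fetched)
{
    int *p = a;

    *fetched = (*p)++;
    return (int)(p - a);
}
```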

> and get even further bogged
> down in the "features" quagmire.  But to what end??  

C deliberately has few "features", in the sense of unnecessary frills.
It does have "features" in the sense of useful facilities for doing
certain types of jobs effectively.  It is by no means one of the "big"
languages (PL/I, Ada, etc.)

> ...  On the other hand, I feel that the
> basic skeleton structure of a language like Modula-2 is so sound that
> the addition of precious few features and a couple minor language
> changes would create a nearly ideal programming tool.

Yes, one gathers you believe that.  Unlike you, I'm not about to
criticize a language for attributes I don't fully appreciate or
because it isn't just like my favorite language.

>      But, it'll probably never happen because the artificialities
> and ambiguities of C create the sort of mystique about the "difficulties
> of programming" that programmers love.  Next to chess players, their
> egos are the most insufferable.

How long have you had this complex?

Seriously, C was designed by software developers for use by software
developers in a particular unregulated type of development environment.
If the environment had been like those favored by structured design
methodologies (which were barely in their infancy when C was designed),
C very likely would have "package"-like facilities rather than the more
simple (and more universally supported by linkers) linkage it provides.
C was never intended to be a Beginner's All-purpose Symbolic Instruction
Code, nor a teaching tool (Pascal), nor something intended to be read by
business managers (COBOL), nor a replacement for myriads of specialized
languages for embedded systems (Ada).  The real reason professional
programmers are jumping on the C bandwagon is that C was designed for use
by professional programmers; they find it an effective tool for developing
applications.  This is no more a "mystique" than a carpenter's preference
for tools that an unskilled layman would have a hard time using properly.

If YOU PERSONALLY don't happen to like C, you don't have to use it.
However, it is morally and ethically WRONG to make recommendations to
people who might be looking for advice when you can't do so fairly.
In this context, fairness would require that you evaluate the languages
in comparable terms -- to state that one of them has serious structural
flaws while the other is nearly perfect is simply not true and might
mislead the innocent (of which I'm fortunately not one).

samples@RENOIR.BERKELEY.EDU (A. Dain Samples) (05/28/87)

This debate/argument over language characteristics -- which is
better/best, which is dirty/clean -- is very much like two engineers
engaged in building a three-mile bridge arguing over the brand of
blue-print paper they use.

To say that languages survive and are used because they were there
first and are therefore entrenched is only trivially true.  There are
two points to be made, each directed at two different users of
programming languages.

The first class of users are those that enjoy programming: they enjoy
the puzzles to be solved, they enjoy seeing ``Hello, world!'' come up
on a screen when programmed in an about-to-be-learned language, they
experience satisfaction when any set of programming sentences have been
successfully turned into a working program.  These people then often
confuse cause-and-effect, or means-and-ends: ``I enjoyed that!  And I
was programming in language X, therefore language X must be pretty good
to give me that kind of enjoyment.''  That is why we see religious
arguments over any and every programming language, from FORTRAN, to C,
to APL, to FORTH, to SNOBOL, to Algol, to Pascal, to LISP, to C++, to
... need I go on?

Let us not ignore the effect of the language, however.  Some languages
are more fun to program in than others, and I agree with those who say
C is a ``fun'' language: it certainly is a never ending source of
surprise, it provides plenty of opportunity for puzzle-solving, and it
is a clever implementation of some pretty basic programming principles.  

That is, I think it is fun until I try to get a serious, large project
completed.  Then frustration sets in.  But this brings me to the second
class of programmer: those trying to bring a serious, large programming
effort to successful completion.  The primary reason that such people
use all of the ``bad'' languages, and the reason that people using the
``better'' languages don't seem to do any better than the users of
``bad'' languages is

	THE PROBLEMS THAT PROGRAMMING LANGUAGES SOLVE ARE ORTHOGONAL TO
	THE PROBLEMS THAT MUST BE SOLVED IN LARGE PROGRAMMING
	PROJECTS.

In other words

	THE PROGRAMMING LANGUAGE USED DOESN'T MATTER IN LARGE
	PROGRAMMING PROJECTS.

Obviously, these are overstatements designed to make a point; in
reality, the problems are almost orthogonal, and language choice
usually doesn't matter.  Again, in other words, the features of a
particular programming language do not attack the problems of large
systems.  That is why the Shuttle software is written in FORTRAN (there
was a CACM article in the last couple of years on this).  This is why
large financial programs are still written in COBOL.  And that is why
DARPA is investing millions of dollars STILL, after all these years, in
how to develop large software systems (it seems there is this
project the President wants to do that requires several million
[billion?] lines of code).

Yes, translation and investment costs come into play, but only as a
second order effect.

The primary problems to be solved in any large system are communication
and consistency.  But not JUST between programming units, but also
between programmers, programming teams, managers, managers' bosses,
users, applications engineers, salesmen, etc. etc. etc.  And I
guarantee you, that solving these problems for PEOPLE is far worse than
solving them for PROGRAMS!  So, the choice of a language is swamped by
the other problems to be solved, and it really doesn't make that much
difference in the global scope of things.  It makes some difference,
granted, but not nearly as much as the lowly programmer having to work
with the language might think.

Develop a language that solves the programming-in-the-large problems, and
THEN we can have a meaningful, resolvable argument!

Summary: ANY programming language offers its form of fun, and has its
adherents.  But any arguments about one language being better than
another make sense ONLY when we are talking about programming-in-the-
small (and even then they are religious arguments, as well they should
be).  But when it comes to serious, large, important programming
projects, the decision of which language to use is not made until so
many other more important decisions have been made, that it almost
doesn't matter.

I mean, come on, do bridge engineers start a design meeting by saying,
``Well, what brand of blue-print paper shall we use on this project?''

In the spirit of spirited debate,

Dain

gwyn@brl-smoke.UUCP (05/29/87)

In article <8705281733.AA27424@renoir.Berkeley.EDU> samples@RENOIR.BERKELEY.EDU (A. Dain Samples) writes:
>Develop a language that solves the programming-in-the-large problems, and
>THEN we can have a meaningful, resolvable argument!

Thanks for a good article.
There is a widespread misconception that the proper programming
(or meta-programming) language would solve all our problems.
In fact, problems are solved by people thinking about them and
getting good ideas, not by mechanical methods (except for a few
especially boring classes of problems, or as AIDS for people
solving problems).
Programming language design can assist or hinder development of
proper computer solutions, but as you observe that's not the
really hard part.

sloan@pnet02.CTS.COM (Steve Smythe) (05/31/87)

Hear, hear... C is a wonderful language, but people should be aware of the fact
that because of C's wonderful "features", programmers have to adapt to these
practices..

Since there are so many of us that seem to have Apples and C compilers, why
not start up a group?  I'm in for posting tons of source code for those who
are looking for portable code...

Sloan

UUCP: {akgua!crash, hplabs!hp-sdd!crash}!gryphon!pnet02!sloan
INET: sloan@pnet02.CTS.COM

ranger@ecsvax.UUCP (Rick N. Fincher) (06/01/87)

I'd like to interrupt here to thank Doug for his Posix postings to the net.
It's people willing to do things like this that make the net so useful.
I was just getting ready to sit down and write code to do what Doug's
programs do (ie read Prodos directories in "C") when he posted his
programs.  This will be a great help to a lot of folks and it comes at
an ideal time.  With Apple pushing C as the development language for the
//gs, it is a good time to set some standards.

Thanks Doug

Rick Fincher
ranger@ecsvax

bdw@peaks.UUCP (06/01/87)

In article <575@gryphon.CTS.COM>, sloan@pnet02.CTS.COM (Steve Smythe) writes:
> Hear, hear... C is a wonderful language, but people should be aware of the fact
> that because of C's wonderful "features", programmers have to adapt to these
> practices..
> 
> Since there are so many of us that seem to have Apples and C compilers, why
> not start up a group?  I'm in for posting tons of source code for those who
> are looking for portable code...
> 
> Sloan

There are lots of unix-like tools which could make the PRODOS, and especially,
the DOS environment more useful and efficient to developers and hackers using
C if they (the tools) existed. However, why not share this stuff through
comp.sys.apple and/or comp.sources?

		Bruce
		as hao!boulder!peaks!bdw