[comp.sys.amiga] Leo's ANSI C Flame

rmeyers@tle.dec.com (Randy Meyers 381-2743 ZKO2-3/N30) (06/30/88)

Leo recently posted an admitted flame about certain optimizations in
Lattice C V4.0 and the general state of ANSI C.  I believe that Leo
may have been misinformed about some of these subjects.

Leo's posting began by complaining about an optimization in the latest
Lattice C compiler.  Lattice C V4.0 will compile x = strlen("abcdefg")
into an instruction that moves seven to x.

Leo complains:

>	You realize, of course, that this kind of optimization falls flat on
>its face if I somehow manage to change the contents of the memory that
>contains "abcdefg".  I could stuff a \0 where the 'd' is, and the program
>would not notice.

You are correct.  However, such a program is clearly poorly written.  Think,
Leo!  Would you really want to support such a program?  Would you be proud
that you had written it?  Do you think that it helps program clarity if
any constant in the program may have its value changed?

Kernighan and Ritchie never guaranteed the string constants were modifiable.
It was an accident of early implementations that string constants could
be modified, and a very few programmers came to rely on it (probably
again initially by accident).  Note that there is no reason to have
modifiable string constants in the language.  Any program that takes
advantage of modifiable string constants can be rewritten to use:

	static char modifiable[] = "abcdefg";

and take no extra time, no extra space, make it clear to the reader that
the value is not a guaranteed constant, and just be better written.
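
As a minimal sketch of that rewrite (the helper names current_length and truncate_at_d are made up for illustration and appear nowhere in the original discussion), a named array can be modified freely, and strlen then has to look at the actual contents:

```c
#include <stddef.h>
#include <string.h>

/* A named, writable array initialized from a literal.  Unlike the
 * literal itself, modifying this array is well defined. */
static char modifiable[] = "abcdefg";

/* Hypothetical helper: report the length of the array's current contents. */
size_t current_length(void)
{
    return strlen(modifiable);
}

/* Hypothetical helper: store a '\0' where the 'd' is (index 3) and
 * return the length that strlen now reports. */
size_t truncate_at_d(void)
{
    modifiable[3] = '\0';
    return strlen(modifiable);
}
```

Calling current_length() before and after truncate_at_d() should show the length changing, which is exactly the behavior a program cannot portably expect from a string literal.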

By the way, the ANSI standard does NOT require that string constants
be read only.  It says that "If a program attempts to modify a string
literal..., the results are undefined."  This is standard jargon for
saying that some implementations may write-lock constants; some may not.
Your program isn't portable if it depends on this (mis-)feature.  If you
care, only buy compilers from people who agree with your opinions.

>Further, the type returned by strlen() is not *guaranteed* to be an int.
>I could have written one that returns a short; where would that leave you?

Under ANSI C (and Lattice, if it follows the ANSI rules on such things), the
above strlen optimization is only legal if the user is using the real strlen.
The way that the compiler "knows" you are using the real strlen as opposed
to some strlen that you wrote is by the declaration of strlen.  The rule
boils down to "if you got the magic definition of strlen from the proper
include file, the compiler is free to know lots of extra information
about the function and to perform additional optimizations.   If you
provide your own definition of strlen, the compiler must use it and
play dumb."

This magic that happens when you include the proper include file is
this:  ANSI permits a standard include file to contain macros that
are synonyms for standard functions in addition to the normal extern
declarations for the functions.  These macros might generate inline
code instead of calling a function (many pre-ANSI versions of C use
this trick for getc and getchar) or might call a builtin function
that the compiler supports as an extension.

For example, string.h might include:

extern int strlen(const char *);	/* Required by ANSI */
#define strlen(s) _STRLEN(s)		/* Optional, permitted by ANSI */

What the first line does is declare the strlen function that has
always been a part of C.  ANSI C requires every implementation of
C have in its library a function called strlen that does what you
expect.  (The only change here is ANSI C also specifies the
argument type of the function.)

The second line is optional.  It defines a macro to be expanded
when a normal call to strlen is found.  However, the macro does
not do its work by calling the library routine strlen, it uses
the compiler extension _STRLEN to do the work.  The _STRLEN can
either try and determine the result at compile-time, try and generate
code in-line to compute the result, or just give up and call
the library routine.

Note that this is pretty invisible to the programmer.  He only gets
the special _STRLEN function if he includes the proper .h file.  If
the programmer includes the file but does not want to take advantage
of the builtin _STRLEN, he can do:

	#ifdef strlen
	#undef strlen
	#endif

and not be bothered by it.  Even if he doesn't do the #undef, he can
call the library by writing the call as:

	x = (strlen)("abcdefg");

since macros that take arguments do not expand if the next token after
the macro name is not an open parenthesis.  A programmer can even take
the address of the function without worry because of the same rule:

	f = strlen;		/* get a pointer to strlen */

Note that the ANSI standard requires that a programmer be able to
avoid this fancy builtin stuff through the methods I stated.  Although,
in general, programmers need not try to get around this stuff.  Except
for very bizarre programs (like programs that assume that constants
aren't, but are compiled with compilers that assume constants are),
everything works the same.
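
The macro rules described above can be sketched with a made-up function and macro pair.  Everything named my_strlen here is hypothetical; it stands in for the strlen/_STRLEN arrangement so that the example does not redefine a real library name.  The counter only exists to show which form actually reached the function:

```c
#include <stddef.h>
#include <string.h>

/* Counts how often the real function (not the macro) runs. */
static int function_calls = 0;

/* Hypothetical library routine.  The parentheses around the name keep
 * a function-like macro of the same name from expanding here. */
size_t (my_strlen)(const char *s)
{
    function_calls++;
    return strlen(s);
}

/* Hypothetical "builtin" form; only meaningful for literals and arrays. */
#define my_strlen(s) (sizeof(s) - 1)

/* Returns the number of real function calls made, or -1 on a wrong length. */
int demo_macro_vs_function(void)
{
    int before = function_calls;
    size_t (*f)(const char *) = my_strlen;  /* no '(' follows: the function */
    size_t a = my_strlen("abcdefg");        /* macro: no call is made */
    size_t b = (my_strlen)("abcdefg");      /* ')' follows the name: function */
    size_t c = f("abcdefg");                /* call through the pointer */

    if (a != 7 || b != 7 || c != 7)
        return -1;
    return function_calls - before;
}

#undef my_strlen

/* After the #undef, a plain call reaches the function again. */
int demo_after_undef(void)
{
    int before = function_calls;
    (void)my_strlen("abcdefg");
    return function_calls - before;
}
```

Under these assumptions demo_macro_vs_function() counts two real calls (the parenthesized call and the call through the pointer), and demo_after_undef() counts one.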

Part of your argument is the assumption that a programmer is free
to provide his own versions of any standard routine.  This assumption
is in error.  It sometimes works, and it sometimes doesn't.  The ANSI
standard does not really change traditional practice here.

You can provide your own routines if you do not include the standard
header file declaring the function and you make your replacement a
static (non-global) routine.  This has always been true--ANSI doesn't
change it.

What the draft ANSI C standard says about making your own extern function
or variable with the same name as a standard one is "If a program defines
an external identifier with the same name as a reserved external identifier,
even in a semantically equivalent form, the behavior is undefined."  Again,
this is standards jargon saying that it may work, or it may not.  If you
care, only give your money to a compiler writer whose prejudices match
yours.

This is not a change in traditional C practice, although a lot of
misguided people think that this was formally permitted because it does
work much of the time.  Ok, let's assume that you want to write your own
version of strlen that returns a short (assume sizeof (short) is 2)
instead of the unsigned int that the standard strlen returns (assume
sizeof (unsigned int) is four).  You write some test programs, they all work
fine.  Now you write a program that uses your short strlen and calls
printf.  Guess what, unknown to you, the version of printf that comes
with your compiler calls strlen on string arguments in order to determine
the size of buffers it needs.  Assume that printf now gets horribly wrong
answers from your strlen because it picks up two bytes of garbage along
with the two bytes of result.
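
The size mismatch can be modeled in portable C.  This is only a simulation under stated assumptions: the real mechanism depends on the compiler's calling convention, and the function name read_mismatched_result and the "slot" standing for the place where the result is returned are invented for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Models a caller that expects a 4-byte result reading a slot in which
 * the callee stored only 2 bytes.  'garbage' stands for whatever the
 * slot happened to contain beforehand. */
uint32_t read_mismatched_result(uint16_t short_result, uint32_t garbage)
{
    uint32_t slot = garbage;

    /* The callee writes its 2-byte result... */
    memcpy(&slot, &short_result, sizeof short_result);

    /* ...and the caller reads all 4 bytes, leftovers included. */
    return slot;
}
```

With a short result of 7 and a nonzero garbage pattern, the value the caller sees is not 7 on either byte order; which two bytes are garbage depends on the machine.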

Maybe you luck out.  Maybe printf doesn't call strlen.  But you can
probably break just about EVERY C implementation by randomly changing
some of the library functions out from underneath it.  (Does printf
depend on puts? calloc? write? ferror? stdout? fprintf?)  Try
it on your favorite C implementation.  Call up the developer.  Tell
him what you find.  You'll probably get some reply like, "Gosh,
you're right.  If you want to rewrite puts, you should also rewrite
printf as well.  Have you looked into buying the source for the
library?  It will make your job easier."

The ANSI standard includes that bit about "semantically equivalent"
to cover two other facts of life.  First, you may think you have
provided a "plug-compatible" version of the routine, but failed in
some needed nuance.  For example, some implementations of malloc
have the property that if you allocate a chunk of memory, free it,
and reallocate it, the original data you stuffed into the memory
will still be there.  I have heard of code that makes use of this
"feature."  Suppose that your malloc doesn't do this, but your
compiler's version of printf requires it.  The other fact of life
is that sometimes several C library functions will end up in the same
module.  Assume that if the linker brings in calloc from the library,
the entry point for malloc is dragged in as well.  If you wanted
to replace malloc with your own routine, but wanted to use the
standard calloc, you will get multiple definitions of malloc
when you link.

All of the above is a fact of life today WITHOUT the ANSI Standard.  The
ANSI Standard actually improves the situation somewhat.  The ANSI
standard does "reserve" the traditional C library names, but it limits
the standard functions to only depend on other standard functions or
to names that begin with underscore.

When I first got my Amiga and Lattice C V3.10, one of the first programs
I tried to build was Wecker's VT100.  It compiled and loaded without errors,
but it would die horribly just after starting.  I eventually tracked
down the bug.  The Lattice fopen function called another (new to V3.10)
Lattice function called dopen.  Wecker had a dopen function in his program
that did something entirely different.  When fopen called dopen, and
entered the Wecker version, not the Lattice version, the program would
die.

This is a problem that has always haunted C: no one said you couldn't have
some standard library routine call some non-standard entry point.
The problem doesn't turn up too often because most standard library
functions can be written using only calls to other standard functions
or to system specific functions with really weird names (_WRITE,
SYS$QIO, $#%&*OUT...).  But occasionally the problem occurs.  Under
the ANSI standard, the problem is outlawed.  If Lattice C had been
standard conforming, the VT100 program would have worked.

So, the ANSI standard doesn't make the situation any worse when it
comes to you writing replacements for standard functions, and it
makes the situation better when it comes to making sure that
standard functions don't tromp all over your functions.

>	You further realize, of course, that no respectable programmer would
>ever write:
> 
>	strlen ("abcdefg");
> 
>	But would instead use (if he really *had* to):
> 
>	sizeof ("abcdefg") - 1;
> 
>	If the code is written by d*psh*ts, it is *not* the responsibility
>of the compiler vendor to save their butts.

Leo, write a macro that takes two arguments.  The first argument is
the name of a struct that has two members, len and ptr.  The
second argument to the macro is a pointer to a string.  The macro does
two things: it sets the len member to the length of the string
and the ptr member to the address of the string.  Here's my
answer:

	#define DESC(d, string) (d.len = strlen(string), d.ptr = string)

I actually had to use a similar macro recently.  Look at what happens
when I make a call of the form DESC(d, "abcdefg").  The point here is
that there is no such thing as an optimization for a d*psh*t case.
Experience has shown time and time again that optimizations for what
looks like stupid code are valuable.  Stupid code comes up because
people use macros, because the compiler itself may generate it, or
because powerful optimizations may reduce complex code to a simple
case.  For example:

	register char *p;

	p = "abcdefg";

	/* 100,000 lines of code that don't modify p */

	DESC(d, p);

A reasonably good compiler will prove that p's value has not been changed
since the initial assignment, and will transform the call into
DESC(d, "abcdefg"). With Lattice's strlen optimization, this will boil
down into two moves, instead of a function call and two moves.
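
For concreteness, here is a small sketch of the two calls discussed above.  The struct layout and the function names describe_literal and describe_pointer are assumptions made for the example; the macro has the same shape as the one above, with extra parentheses around its arguments:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor type with the two members described above. */
struct desc {
    size_t      len;
    const char *ptr;
};

/* Note that 'string' is evaluated twice. */
#define DESC(d, string) ((d).len = strlen(string), (d).ptr = (string))

struct desc describe_literal(void)
{
    struct desc d;
    DESC(d, "abcdefg");   /* strlen of a literal: a candidate for folding */
    return d;
}

struct desc describe_pointer(void)
{
    const char *p = "abcdefg";
    struct desc d;
    DESC(d, p);           /* foldable only if p is proven unchanged */
    return d;
}
```

Both functions should produce a length of 7; whether any call to strlen remains in the generated code is up to the compiler and is not visible from the source.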

>Bloated code is, by and large, the responsibility of the guy who *wrote*
>it.  And if the programmer in question doesn't realize this, then s/he
>has no business writing code for public consumption.
 
As shown above, bloated code is sometimes written by nobody--it just sort
of exists in the code written by the best of us.  If an automatic tool,
like an optimizing compiler, can get rid of it painlessly, it is a great
idea.

>'volatile' is a Good Thing.  Function prototypes are a Good Thing.

I agree.

>#pragma is of questionable value (largely because no one has adequately
>explained to me what it *does*!).

Simple:  pragma is a standard approved way to add extensions to the
language without adding new reserved words.  For example, Lattice uses
it in their standard headers in order to call ROM Kernel routines
directly without going through the stubs.  pragma is intrinsically
non-standard:  the ANSI standard states that it exists, mentions
some of the things that it can be used for, and leaves it alone.
Every compiler is free to develop pragmas and use any syntax that
they want after the word pragma.  A programmer who uses a pragma
should enclose it in #if--#endif:

	#if LATTICE
	#pragma Delete(R0,R1)  /* Means delete source file to MANX */
	#endif

I made up the above example.  Lattice's pragmas don't look that way
and MANX, as far as I know, doesn't have pragmas.

>Enforced parenthetical grouping whether or not it's necessary is Stupid.

Expression control is necessary, but I don't like enforced parentheses
either.  I preferred it when the new unary plus operator controlled
expression evaluation.  However, France threatened to veto the ISO
standard for C unless they got parentheses.  The enforcement only makes
a difference when doing floating point, one's complement math, or checking
for integer overflows.  Since most C implementations (and C programs) use
two's complement integer math with no overflow detection, it isn't a big
thing.

>Making string constants read-only is Stupid.

The ANSI standard doesn't.

>Breaking all the string functions and giving them cryptic names is Stupid.

I agree totally.  But, I don't think that has happened.  The traditional
functions with traditional meanings are around.  Send me mail with what
you think is specifically wrong.

To sum up:  There is a lot of misinformation about ANSI C.  If someone
has told you that all your code will break under ANSI C, either you
are a very poor programmer (and your code breaks every time you move it)
or you are being misinformed.  (The latter is very easy:  the ANSI standard
is written in formal style using certain conventions that make it hard
to decipher.  I have come across lots of misinformation about what
the standard says.)

----------------------------------------
Randy Meyers, not representing Digital Equipment Corporation
	USENET:	{decwrl|decvax|decuac}!tle.dec.com!rmeyers
	ARPA:	rmeyers%tle.dec.com@decwrl.dec.com

tim@amdcad.AMD.COM (Tim Olson) (06/30/88)

In article <8806292138.AA22025@decwrl.dec.com> rmeyers@tle.dec.com (Randy Meyers 381-2743 ZKO2-3/N30) writes:
| Kernighan and Ritchie never guaranteed the string constants were modifiable.
| It was an accident of early implementations that string constants could
| be modified, and a very few programmers came to rely on it (probably
| again initially by accident).  Note that there is no reason to have
| modifiable string constants in the language.  Any program that takes
| advantage of modifiable string constants can be rewritten to use:

Well, K&R say that string constants are type "array of characters", and
there is no such read-only restriction on this type.  In fact, they went
out of their way to allow such manipulation, because they declared that
*all* string constants, even when written identically, are distinct. 
This allows programmers to do things like

	name = mktemp("tempXXXXXX");

Leo's argument that a compiler that optimized and assigned 15 to x for

	x = strlen("constant string");

would break if he modified the string at runtime is incorrect, because
there is no way to get a legal pointer to the string in the above
expression ("constant string" is distinct from any other string
constant).
-- 
	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)

rmeyers@tle.dec.com.UUCP (07/01/88)

Tim Olson, tim@delirun.amd.com, took exception to my statement that
Kernighan and Ritchie never guaranteed the string constants were
modifiable:

|Well, K&R say that string constants are type "array of characters", and
|there is no such read-only restriction on this type.

I don't find this argument too convincing.  K&R also says that the
constant 1L has type "long int."  That type doesn't have any read-only
restrictions.  Does that really imply anything?

|In fact, they went out of their way to allow such manipulation, because
|they declared that *all* string constants, even when written identically,
|are distinct.

That K&R promised that all string constants were distinct is not
sufficient to state that K&R promises that string constants can
be written.  It is fairly trivial to write a compiler that write-locks
string constants but makes them all distinct.

K&R never promise that string constants can be modified.  They never
state it, and they show no examples of it.  Personally, I find it
interesting that they always refer to the beasts as "string constants"
and the ANSI standard always refers to them as "string literals,"
a term less prejudicial about whether the string can be modified.
I think that the ANSI standard is more willing to discuss the sordid
uses of string "constants" than K&R.

The real problem is here that K&R is not a formal definition of C.  It
leaves a number of questions unanswered (the reason that the ANSI C
committee exists is not to either improve or "screw up" the language:
it exists simply to provide a formal definition for it).  For example,
does K&R allow you to convert a pointer to a function to a pointer
to an int?  Does the existence of that capability imply that you
can not have execute only code because you are free to write memory
through int pointers?

|Leo's argument that a compiler that optimized and assigned 15 to x for
| 
|	x = strlen("constant string");
| 
|would break if he modified the string at runtime is incorrect, because
|there is no way to get a legal pointer to the string in the above
|expression ("constant string" is distinct from any other string
|constant).

Good point!
 
----------------------------------------
Randy Meyers, not representing Digital Equipment Corporation
	USENET:	{decwrl|decvax|decuac}!tle.dec.com!rmeyers
	ARPA:	rmeyers%tle.dec.com@decwrl.dec.com
 

blandy@marduk.cs.cornell.edu (Jim Blandy) (07/01/88)

Please, PLEASE don't let's get into a C debate about modifying string
constants.  I subscribe to comp.lang.c, and it's painful enough listening
to them argue about the value of NULL.  If I didn't get useful information
out of that group, I'd unsubscribe in a second.  If you want to debate
this, please move the discussion there.

You should give the new K&R a thorough read-through; they answer this
particular question anyway.
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit."  - archie

ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) (07/02/88)

[ "I sense great frustration, sir."  "No shit, Sherlock." ]

	Foom!  A 301-line refutation of my ANSI flame!  (This article isn't
much shorter.)  What it boiled down to was that Randy Meyers was concerned
that there is a great deal of misinformation floating around about Antsy-C,
and wanted to correct it.

In article <8806292138.AA22025@decwrl.dec.com> rmeyers@tle.dec.com (Randy Meyers 381-2743 ZKO2-3/N30) writes:
>>	You realize, of course, that this kind of optimization falls flat on
>>its face if I somehow manage to change the contents of the memory that
>>contains "abcdefg".  I could stuff a \0 where the 'd' is, and the program
>>would not notice.
>
>You are correct.  However, such a program is clearly poorly written.  [ ... ]

	I never said it was a *good* example.  I was illustrating that some
"clever" programmers may be able to find their way to that string constant
and beat on it, thus breaking the optimization.  I'd never try anything like
that, largely because it isn't clever enough :-).

>Kernighan and Ritchie never guaranteed the string constants were modifiable.

	Excuse me?  Someone else made the point that you can say:

	file = mktemp ("/tmp/ReXXXXX");

	and mktemp() will go in and bash on the string constant (since all
it knows about is the pointer it was passed).  I have been led to believe
that ANSI will break this.

> [ enormous dissertation on strlen(), part of which was: ]
>
>extern int strlen(const char *);	/* Required by ANSI */
>		   ^^^^^
	What in the name of Zarquon is a *Pascal* keyword doing in a C
program?  Does this declaration mean I can only pass string constants to
strlen()?  Can't I pass pointer variables anymore?

	Coming back to strlen(), I was trying (unsuccessfully, I gather) to
point out that, IMHO, a C compiler really has no business looking at
function calls (which are outside its domain), and trying to figure out what
the programmer *really* wanted to call.  If I call strlen(), dammit, I
expect strlen() to be called.  I will turn any and all such optimizations
off, since it's too easy for the compiler to foul it up.

>	#define DESC(d, string) (d.len = strlen(string), d.ptr = string)
>
>I actually had to use a similar macro recently.  Look at what happens
>when I make a call of the form DESC(d, "abcdefg").  [ ... ]
>For example:
>
>	register char *p;
>
>	p = "abcdefg";
>
>	/* 100,000 lines of code that don't modify p */
>
>	DESC(d, p);
>
>A reasonably good compiler will prove that p's value has not been changed
>since the initial assignment, and will transform the call into
>DESC(d, "abcdefg"). With Lattice's strlen optimization, this will boil
>down into two moves, instead of a function call and two moves.
>
	Urp.  I believe the optimization is invalid in this case.  You have
declared a pointer to a string constant, and are passing the variable to the
macro.  The best the compiler can do is detect that p has not changed value,
and drop a constant pointer value (rather than actually pulling it out of p)
into the strlen() argument, and *call strlen()*.

	If the compiler were to resolve it to 'DESC (d, "abcdefg")' it would
be an error, since that would compile a *different* copy of the string
constant into memory, unrelated to the one p points to.  This is important,
since I may have made a copy of p somewhere, and used the copy to bash on
the constant, making the strlen() optimization invalid.

>>#pragma is of questionable value (largely because no one has adequately
>>explained to me what it *does*!).
>
>Every compiler is free to develop pragmas and use any syntax that
>they want after the word pragma.  [ ... ]

	Ah!  Okay, that was my problem.  I thought there was a syntax
required to follow it, like #define.  So anyone can do anything they want
with #pragma.  It could turn into a can of worms you know...

>	#if LATTICE
>	    ^^^^^^^
>	#pragma Delete(R0,R1)  /* Means delete source file to MANX */
>	#endif
>
	I thought ANSI disallowed pre-defined constants or symbols...

>>Enforced parenthetical grouping whether or not it's necessary is Stupid.
>
>Expression control is necessary, but I don't like enforced parentheses
>either.  I preferred it when the new unary plus operator controlled
>				  ^^^^^^^^^^^^^^
>expression evaluation.  [ ... ]

	Unary plus???!!?  What's *that* supposed to be for, and how can it
alter expression evaluation?

	My problem with enforced parentheses is that constructs such as:

#define	FOO(x)		((x) + 15)

	thing = FOO (i + 4);

	...Which compiles to...

	thing = ((i + 4) + 15);

	...Which means that the compiler first adds four to i, then adds 15
to it.  Wouldn't it be nicer if it just added 19?

>>Breaking all the string functions and giving them cryptic names is Stupid.
>
>I agree totally.  But, I don't think that has happened.  The traditional
>functions with traditional meanings are around.  [ ... ]

	Like index(), and rindex(), and strcpy(), and strcat(), and
strtok().....?

	It's funny.  When I was briefly exposed to System V UNIX, I noticed
that all the names of the string functions were "wrong".  Along comes ANSI,
and I find that the string functions have names remarkably similar to the
broken SYS-V names.

	In an attempt to sum up, these are the problems I have with Antsy-C.
They are mostly based on second-hand information from a knowledgeable person
who has his ear to comp.lang.c, and who has been using C since way before it
became a chic language:

	o ANSI did not implement binary constants, claiming 'no prior art'.
	o ANSI held up the finalization of AN-C by flirting with a keyword
	  named 'noalias', which bought you nothing if you weren't running
	  on a supercomputer.
	o Dennis M. Ritchie is not on the standards committee, and is also
	  strangely silent on comp.lang.c with regard to opinions on what
	  ANSI has done to his language.
	o ANSI has been focusing their efforts on making the job of the
	  compiler writer easier, since the committee is populated largely
	  by purveyors of compilers; and has done little or nothing to
	  enhance C's usefulness or versatility for the programmer.
	o ANSI-C looks a hell of a lot like Pascal.  You seem to have to
	  talk a lot before you can really say anything.
	o ANSI broke the names of a lot of well-known library functions and
	  include files.
	o ANSI hasn't really resolved a lot of the irritating questions
	  revolving around C, like what NULL *really* is, or how big a
	  'short' or a 'long' *really* are.
	o The attitude appears to be, "If you had anything truly valuable to
	  say, we would have put you on the committee."
	o ANSI acts like 'void' and 'enum' are a new thing.

	Based on what I've heard from a number of individuals, ANSI has a
*lot* of crocks in it that really don't need to be there.  My aforementioned
friend made the observation, "If I can't take a valid K&R program that
passes 'lint' and run it through a dumb filter program and get an ANSI
program out the other side, I'll consider ANSI to be a complete and utter
loss."

	I'm prepared to sit back and watch to see what happens.  In the
meantime, I'm keeping my K&R compiler, since it's worked great so far, and I
like to think I've written some pretty decent code with it.

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape	INET: well!ewhac@ucbvax.Berkeley.EDU
 \_ -_		Recumbent Bikes:	UUCP: pacbell > !{well,unicom}!ewhac
O----^o	      The Only Way To Fly.	      hplabs / (pronounced "AE-wack")
"Hmm, you're right.  Air is made up of suspended meat loaf."  -- Josh Siegel

peter@sugar.UUCP (Peter da Silva) (07/03/88)

Yes, Leo, ANSI contains much that is dain bramaged. But it's not that bad.
My personal bitch is with compiler writers that implement part of the draft
and surprise you.

In article <6427@well.UUCP>, ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
> program?  Does this declaration mean I can only pass string constants to
> strlen()?  Can't I pass pointer variables anymore?

No, it means that strlen() guarantees it won't modify its argument.
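
A short sketch of what the qualifier does and does not restrict.  The function count_chars is a made-up stand-in for strlen with the same parameter type; it is not taken from any real header:

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen with the same parameter type. */
size_t count_chars(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    /* s[0] = 'x';   would be rejected: s points to const char */
    return n;
}

size_t demo_const_argument(void)
{
    char buffer[] = "abcdefg";  /* writable array */
    char *p = buffer;           /* ordinary, non-const pointer variable */

    p[3] = '\0';                /* the caller may still modify its own data */

    /* A plain char * converts to const char * without a cast. */
    return count_chars(p) + count_chars("xy");
}
```

The const restricts only what the called function may do through its parameter; callers can pass literals, arrays, or pointer variables alike.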

> 	Coming back to strlen(), I was trying (unsuccessfully, I gather) to
> point out that, IMHO, a C compiler really has no business looking at
> function calls (which are outside its domain), and trying to figure out what
> the programmer *really* wanted to call.  If I call strlen(), dammit, I
> expect strlen() to be called.  I will turn any and all such optimizations
> off, since it's too easy for the compiler to foul it up.

Fine, look in your hypothetical ANSI include file. You'll find something like
this:

#define strlen BUILTIN_STRLEN

And it's BUILTIN_STRLEN that the compiler will turn into an inline. Actually,
I think the "correct" name of this function is "stclen()" anyway, since it
returns a count.

You don't like this, just #undef strlen.

> 	If the compiler were to resolve it to 'DESC (d, "abcdefg")' it would
> be an error, since that would compile a *different* copy of the string
> constant into memory, unrelated to the one p points to.

No, since ANSI allows the compiler to allocate one copy only of each string and
cons up multiple references to it. This, by the way, definitely would break
the cntrl() macro I quoted a little way back. I'm not sure I like this feature,
myself.

> 	Unary plus???!!?  What's *that* supposed to be for, and how can it
> alter expression evaluation?

The idea was that you'd say "+(expr)" and it would treat that expression as
atomic.

> 	My problem with enforced parentheses is that constructs such as:

> #define	FOO(x)		((x) + 15)

> 	thing = FOO (i + 4);

> 	...Which compiles to...

> 	thing = ((i + 4) + 15);

> 	...Which means that the compiler first adds four to i, then adds 15
> to it.  Wouldn't it be nicer if it just added 19?

It will. You are allowed to make optimisations that do not alter the value
of the expression. About the only place this actually makes a difference is
when you're dealing with floating point numbers, since floating point
arithmetic is not always associative.
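
A minimal sketch of why the grouping can matter for floating point, assuming IEEE 754 double arithmetic and no value-changing optimization flags.  The function names are invented for the example:

```c
/* Sum three values with the grouping forced one way or the other. */
double group_left(double a, double b, double c)
{
    return (a + b) + c;
}

double group_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With the arguments 1e20, -1e20 and 1.0, group_left should yield 1.0, while group_right should yield 0.0, because the 1.0 is lost when it is added to -1e20 first.  For integer arithmetic without overflow detection, such as the i + 4 + 15 case, both groupings give the same value, so the compiler remains free to add 19.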

> 	Like index(), and rindex(), and strcpy(), and strcat(), and
> strtok().....?

All but the good old index/rindex. I find it ironic that the usurper
strchr got usurped in turn with stpchr.

> My aforementioned
> friend made the observation, "If I can't take a valid K&R program that
> passes 'lint' and run it through a dumb filter program and get an ANSI
> program out the other side, I'll consider ANSI to be a complete and utter
> loss."

I believe that an ANSI compiler will accept a valid K&R program, no filter
needed. But I only have the '85 version of the draft, so don't bet money on
it.
> 
> 	I'm prepared to sit back and watch to see what happens.  In the
> meantime, I'm keeping my K&R compiler, since it's worked great so far, and I
> like to think I've written some pretty decent code with it.


-- 
-- `-_-' Peter (have you hugged your wolf today?) da Silva.
--   U   Mail to ...!uunet!sugar!peter, flames to /dev/null.
-- "Running DOS on a '386 is like driving an Indy car to the Stop-N-Go"

jesup@cbmvax.UUCP (Randell Jesup) (07/04/88)

In article <6427@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
>[ "I sense great frustration, sir."  "No shit, Sherlock." ]
>
>	Foom!  A 301-line refutation of my ANSI flame!  (This article isn't
>much shorter.)  What it boiled down to was that Randy Meyers was concerned
>that there is a great deal of misinformation floating around about Antsy-C,
>and wanted to correct it.

	And he's right.  There are some real silly things in ANSI C, but the
ones you're pointing out aren't them.  I pity people in hearing range when you
find out about trigraphs. :-)  It could have been worse, noalias bit the
big one finally (what, you don't know about noalias?  The new untested keyword
that required 4 pages to explain, and even the explanation wasn't interpreted
the same way by any two compiler writers, let alone users?  The thing that made
DMR (of K&R fame) say "Noalias must go.  That is non-negotiable."?)  :-)

	There might even be features you like, such as prototypes, or maybe
the volatile keyword (so you can access hardware safely).

>>You are correct.  However, such a program is clearly poorly written.  [ ... ]
>
>	I never said it was a *good* example.  I was illustrating that some
>"clever" programmers may be able to find their way to that string constant
>and beat on it, thus breaking the optimization.  I'd never try anything like
>that, largely because it isn't clever enough :-).

	It's also inherently non-portable.

>>Kernighan and Ritchie never guaranteed the string constants were modifiable.

>	Excuse me?  Someone else made the point that you can say:
>	file = mktemp ("/tmp/ReXXXXX");
>	and mktemp() will go in and bash on the string constant (since all
>it knows about is the pointer it was passed).  I have been led to believe
>that ANSI will break this.

	mktemp assumes you pass it a modifiable string.  Remember, ANSI doesn't
outlaw modifying string constants, just says the results are undefined.
That may be fine in a particular implementation, but it isn't guaranteed
to be portable, as a strictly-conforming program is.
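
	A sketch of the portable way to call something like mktemp.  The
helper fill_template below is a hypothetical stand-in (it is not the
library's mktemp), written only to show a function that scribbles on
its argument:

```c
#include <string.h>

/*
 * Hypothetical stand-in for mktemp(): overwrite the trailing X's of
 * 'tmpl' in place with decimal digits of 'id'.  Like mktemp(), it
 * writes through the pointer it is given, so the caller must pass
 * writable storage.
 */
char *fill_template(char *tmpl, unsigned id)
{
    size_t i = strlen(tmpl);

    while (i > 0 && tmpl[i - 1] == 'X') {
        tmpl[--i] = (char)('0' + id % 10);
        id /= 10;
    }
    return tmpl;
}
```

Calling it as "char name[] = "/tmp/ReXXXXX"; fill_template(name, 42);"
is fine everywhere, since the array is an ordinary writable copy.
Passing the string constant itself is the case the draft leaves open.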

>>extern int strlen(const char *);	/* Required by ANSI */
>>		   ^^^^^
>	What in the name of Zarquon is a *Pascal* keyword doing in a C
>program?  Does this declaration mean I can only pass string constants to
>strlen()?  Can't I pass pointer variables anymore?

	All that means is that strlen won't modify it.  It was worse:
for a while, strcpy was defined as:
	extern char *strcpy(noalias char *,noalias const char *);
Say that three times quickly. :-)
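
	To make the const point concrete, here is a sketch using a
hypothetical strlen clone (my_strlen is an illustrative name).  The
qualifier restricts what the function may do, not what the caller may
pass:

```c
#include <stddef.h>

/*
 * Hypothetical strlen clone.  'const' is a promise by the function
 * that it will not write through s; it says nothing about whether
 * the caller's string is itself a constant.
 */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Both my_strlen("abcdefg") and my_strlen(ptr), where ptr is a plain
char * variable, compile without complaint.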

>	Coming back to strlen(), I was trying (unsuccessfully, I gather) to
>point out that, IMHO, a C compiler really has no business looking at
>function calls (which are outside its domain), and trying to figure out what
>the programmer *really* wanted to call.  If I call strlen(), dammit, I
>expect strlen() to be called.  I will turn any and all such optimizations
>off, since it's too easy for the compiler to foul it up.

	So turn them off.  It's really easy.  In fact, they don't even turn on
unless you include "string.h", which has the #defines in it.
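
	I can't quote the exact Lattice #defines here, so treat this as a
sketch of what the ANSI draft itself allows rather than of Lattice's
header:  if a library function is also defined as a macro, putting the
name in parentheses (or using #undef) gets you the real function.
plain_strlen is just an illustrative wrapper name.

```c
#include <string.h>

/*
 * A parenthesized name cannot be a function-like macro invocation,
 * so this always calls the actual library function, even if
 * <string.h> also supplies a strlen() macro.
 */
size_t plain_strlen(const char *s)
{
    return (strlen)(s);
}
```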

>>	register char *p;
>>
>>	p = "abcdefg";
>>
>>	/* 100,000 lines of code that don't modify p */
>>
>>	DESC(d, p);

>	Urp.  I believe the optimization is invalid in this case.  You have
>declared a pointer to a string constant, and are passing the variable to the
>macro.  The best the compiler can do is detect that p has not changed value,
>and drop a constant pointer value (rather than actually pulling it out of p)
>into the strlen() argument, and *call strlen()*.

	You never "pass a variable to a macro".  It gets expanded before the
compiler sees it (effectively).  If the macro includes other macros, fine,
they get expanded too.
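
	A small sketch of what that expansion looks like.  MAKE_DESC and
struct desc are hypothetical, and not the DESC macro from the quoted
article:

```c
#include <string.h>

struct desc {               /* hypothetical string descriptor */
    size_t len;
    const char *ptr;
};

/* Hypothetical macro, not the DESC from the quoted article. */
#define MAKE_DESC(d, s) ((d).len = strlen(s), (d).ptr = (s))

size_t desc_len_of(const char *p)
{
    struct desc d;

    /* After preprocessing, the next line reads:
     *   ((d).len = strlen(p), (d).ptr = (p));
     * The compiler proper only ever sees the expanded text. */
    MAKE_DESC(d, p);
    return d.len;
}
```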

	strlen maybe is a bad example.  The real win here is with things like
strcpy and strcmp, where the inline code is MUCH faster than a function call.

>	If the compiler were to resolve it to 'DESC (d, "abcdefg")' it would
>be an error, since that would compile a *different* copy of the string
>constant into memory, unrelated to the one p points to.  This is important,
>since I may have made a copy of p somewhere, and used the copy to bash on
>the constant, making the strlen() optimization invalid.

	If you did, then the optimization IS invalid, and the compiler has to
assume it may be aliased.  But the example said that none of the intervening
lines used p.

	This was where noalias was to have come in.  If that had been
"char *noalias p = ..." (or was that "noalias char *p = ..."?), then you'd be
telling the compiler that there are no aliases for what p points to, so it
could assume nothing else except p references would bash the string.  I
think you can see why it got trashed (if you make an error, you get unexplained
wrong results from the program with no warnings; and the semantics are much
more complicated than I've said.)
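
	The aliasing worry, sketched with a writable array so the example
itself stays legal (the function name is only illustrative):

```c
#include <string.h>

size_t length_after_alias_write(void)
{
    char buf[] = "abcdefg";   /* writable array, not a literal */
    char *p = buf;
    char *q = p;              /* an alias for the same storage */

    q[3] = '\0';              /* write through the alias */
    return strlen(p);         /* the string p sees is now "abc" */
}
```

A compiler that folded strlen(p) to 7 without noticing the store
through q would get this wrong, which is why it has to be conservative
about aliases.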

>	Ah!  Okay, that was my problem.  I thought there was a syntax
>required to follow it, like #define.  So anyone can do anything they want
>with #pragma.  It could turn into a can of worms you know...

	Yeah, but you can strip all pragmas pretty easily.  It shouldn't
affect the correctness of the resulting program.  Pragmas are hints.

>>	#if LATTICE
>>	    ^^^^^^^
>>	#pragma Delete(R0,R1)  /* Means delete source file to MANX */
>>	#endif
>>
>	I thought ANSI disallowed pre-defined constants or symbols...

	LATTICE is defined in dos.h.

>>Expression control is necessary, but I don't like enforced parentheses
>>either.  I preferred it when the new unary plus operator controlled
>>				  ^^^^^^^^^^^^^^
>>expression evaluation.  [ ... ]
>
>	Unary plus???!!?  What's *that* supposed to be for, and how can it
>alter expression evaluation?

	To match unary minus.  Yeah, I know it sounds like a nop, and it is,
but for a while they hung forced evaluation order on it as well.  I think that
died the same time as noalias.

	I don't think the paren ordering of expressions is forced in all
cases.

>	thing = ((i + 4) + 15);
>
>	...Which means that the compiler first adds four to i, then adds 15
>to it.  Wouldn't it be nicer if it just added 19?

	I think that will remain the default for such cases.  Don't worry.
You have to try to get paren ordering.
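
	A sketch of why folding is harmless in the integer case.  With
unsigned arithmetic the two forms below give the same answer for every
input, wraparound included, so a compiler can emit either one.  (The
function names are illustrative; floating point is where grouping can
actually change the result.)

```c
/* The expression as written, with the parentheses. */
unsigned add_stepwise(unsigned i)
{
    return ((i + 4u) + 15u);
}

/* What an optimizer may emit instead: one add of the folded constant. */
unsigned add_folded(unsigned i)
{
    return i + 19u;
}
```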

>	It's funny.  When I was briefly exposed to System V UNIX, I noticed
>that all the names of the string functions were "wrong".  Along comes ANSI,
>and I find that the string functions have names remarkably similar to the
>broken SYS-V names.

	There are two camps out there:  Sys V, and Berkeley.  ANSI went with
Sys V, and no one is surprised.

>	o ANSI did not implement binary constants, claiming 'no prior art'.

	Lots of neat ideas had that happen.  Implement them and maybe they'll
get in the first revision. (ANSI C92?)

>	o ANSI held up the finalization of AN-C by flirting with a keyword
>	  named 'noalias', which bought you nothing if you weren't running
>	  on a supercomputer.

	But it lost you things...

>	o Dennis M. Ritchie is not on the standards committee, and is also
>	  strangely silent on comp.lang.c with regard to opinions on what
>	  ANSI has done to his language.

	See above quote.

>	o ANSI has been focusing their efforts on making the job of the
>	  compiler writer easier, since the committee is populated largely
>	  by purveyors of compilers; and has done little or nothing to
>	  enhance C's usefulness or versatility for the programmer.

	volatile and noalias and ... make it easier on the compiler writer?
You should hear what John Toebes says about ANSI and the work to implement it
(which, BTW, takes second place to improving the compiler at Lattice, unless
it's something they really want, like prototypes.)

>	o ANSI-C looks a hell of a lot like Pascal.  You seem to have to
>	  talk a lot before you can really say anything.

	Only if you use all those keywords.  In most cases, just ignore them.

>	o ANSI broke the names of a lot of well-known library functions and
>	  include files.

	Well known on which systems?  That's the problem, every system/OS/
compiler has its own set of "well-known" includes that don't match many (if
any) others.

>	o ANSI hasn't really resolved a lot of the irritating questions
>	  revolving around C, like what NULL *really* is, or how big a
>	  'short' or a 'long' *really* are.

	No, that's well known (except to a few souls who insist the world is
flat):  NULL is the constant 0, short is <= int which is <= long.  I think
there MAY be something saying short must be at least 16 bits, and long 32,
but I'm really not sure of that at all.
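
	For what it's worth, the minimums can be checked on any given
compiler through <limits.h>, where the ANSI draft puts them (SHRT_MAX
at least 32767, LONG_MAX at least 2147483647).  A sketch, with
illustrative function names:

```c
#include <limits.h>
#include <stddef.h>

/* short's range fits in int's, and int's fits in long's. */
int ranges_are_ordered(void)
{
    return SHRT_MAX <= INT_MAX && INT_MAX <= LONG_MAX;
}

/* At least 16 bits for short and int, at least 32 for long. */
int minimum_ranges_hold(void)
{
    return SHRT_MAX >= 32767 && INT_MAX >= 32767
        && LONG_MAX >= 2147483647L;
}

/* A null pointer compares equal to the constant 0. */
int null_pointer_equals_zero(void)
{
    char *p = NULL;

    return p == 0;
}
```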

>	o ANSI acts like 'void' and 'enum' are a new thing.

	Huh?  

>	Based on what I've heard from a number of individuals, ANSI has a
>*lot* of crocks in it that really don't need to be there.  My aforementioned
>friend made the observation, "If I can't take a valid K&R program that
>passes 'lint' and run it through a dumb filter program and get an ANSI
>program out the other side, I'll consider ANSI to be a complete and utter
>loss."

	K&R is ancient history.  Ever used a REAL K&R compatible compiler?
Complete with one name space for ALL structure members?

>	I'm prepared to sit back and watch to see what happens.  In the
>meantime, I'm keeping my K&R compiler, since it's worked great so far, and I
>like to think I've written some pretty decent code with it.

	I suspect you have an H&S compiler, not K&R.  And no one says everyone
MUST implement a full ANSI compiler:  Lattice has full ANSI on a back-burner,
and some of the ANSI stuff might be controlled by compiler switches.

	In conclusion:  ANSI has some good ideas (which Lattice has already
stolen, and Manx is working on stealing some), and some annoying/bad ideas.
I think overall there's more good stuff than bad.  (And a lot of the "bad"
is merely implementation defined, which means if you don't care about portable
code, you've got no problems (assuming the compiler does what you want with
such things).)

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

peter@sugar.UUCP (Peter da Silva) (07/04/88)

In article <4179@cbmvax.UUCP>, jesup@cbmvax.UUCP (Randell Jesup) writes:
> 	K&R is ancient history.  Ever used a REAL K&R compatible compiler?
> Complete with one name space for ALL structure members?

(Raises hand) "I have. I have.". I really wish that more people would act
as if compilers still did this... there are some cases in the Amiga includes
where overloading of structure member names (like, Flags) gets confusing.
It's not often that I get sg_flags confused with st_mode, but do I use
Activation or IDCMPFlags here? (yeh, I know, bad example. *everyone* knows
what IDCMPFlags are for. But you get the idea, I hope).
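
A sketch of the difference, using two made-up structures (these are not
the real Amiga includes). With per-structure member name spaces both can
have a member called Flags; under the old single name space the second
declaration would clash with the first, which is what pushed people
toward prefixes like sg_ and st_.

```c
/* Hypothetical structures, not taken from the Amiga headers. */
struct win_req {
    unsigned long Flags;      /* window flags */
};

struct gad_req {
    unsigned long Flags;      /* same member name, different structure */
};

unsigned long combined_flags(void)
{
    struct win_req w;
    struct gad_req g;

    w.Flags = 0x1UL;
    g.Flags = 0x2UL;
    /* Nothing in the source stops you mixing them up, which is the
     * readability complaint: both fields are just called Flags. */
    return w.Flags | g.Flags;
}
```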

Oh well, it could be worse. Anyone ever seen the Atari ST documentation? (no,
I'm not going to even bother with the IBM-PC's).

vkr@osupyr.mast.ohio-state.edu (Vidhyanath K. Rao) (07/05/88)

In article <2244@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> My personal bitch is with compiler writers that implement part of the draft
> and surprise you.
Can ANSI copyright its name and prevent people from advertising 99.44%
compatibility? I believe that this is done with TeX.
Shouldn't this discussion be moved to comp.languages.c or some such?

richard@gryphon.CTS.COM (Richard Sexton) (07/05/88)

In article <6427@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
>	o Dennis M. Ritchie is not on the standards committee, and is also
>	  strangely silent on comp.lang.c with regard to opinions on what
>	  ANSI has done to his language.

Well, dennis has actually emitted twice on the subject, to the best of my
recall. The posts were VERY subtle, and hilarious. The first one
basically said: ``noalias must go. this is not negotiable''. It went.
The second one pointed out that the ANSI committee had not made the language
easier to use or solve any of the things dennis perceived as problems. They
did what he did many years ago. Declare the rules and then bend them by
saying: ``these are the exceptions''.

>	o ANSI has been focusing their efforts on making the job of the
>	  compiler writer easier, since the committee is populated largely
>	  by purveyors of compilers;

So? This is what Intel did for the 80x86, and look what a rousing success
it is as a processor.


-- 
      If you were to flatten out Wales, it would be bigger than England.
richard@gryphon.CTS.COM                               {backbone}!gryphon!richard

peter@sugar.UUCP (Peter da Silva) (07/06/88)

In article <653@osupyr.mast.ohio-state.edu>, vkr@osupyr.mast.ohio-state.edu (Vidhyanath K. Rao) writes:
> In article <2244@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> > My personal bitch is with compiler writers that implement part of the draft
> > and surprise you.

> Can ANSI copyright its name and prevent people from advertising 99.44%
> compatibility? I believe that this is done with TeX.
> Shouldn't this discussion be moved to comp.languages.c or some such?

Yeh, probably.

Thing is, said compiler writer didn't advertise ANSI compatibility. They just
put the features in as part of what I surmise is a gradual improvement effort.
It could be they didn't even get them from ANSI. Oh well...

It's a bummer when they put in structure passing without function prototyping,
and you accidentally screw up and forget an & in passing a pointer to a
structure. If you had function prototyping it'd say "hey, bonehead, you said
the function took a pointer". If you didn't have structure passing it'd say
"hey, bonehead, you can't pass a structure to a function". Instead it just
happily takes the first couple of elements of the structure as a pointer and
scribbles on memory. Oh well, I only made that particular mistake half a
dozen times. I don't do it much any more.
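
A sketch of the mistake and of what a prototype buys you. The structure
and function names are made up for illustration:

```c
struct point {
    int x;
    int y;
};

/* Prototype: the compiler now knows the first argument is a pointer. */
void move_right(struct point *p, int dx);

void move_right(struct point *p, int dx)
{
    p->x += dx;
}

int demo_x_after_move(void)
{
    struct point pt = { 1, 2 };

    /* move_right(pt, 5);
     * With the prototype in scope the line above is rejected at
     * compile time.  With structure passing and no prototype it
     * compiles, and the callee treats the structure's first bytes
     * as an address. */
    move_right(&pt, 5);
    return pt.x;
}
```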

merlyn@rose3.rosemount.com (Brian Westley) (07/08/88)

[ edited for telecommunications ]
Leo complains about ANSI C:
>[my complaints] are mostly based on second-hand information from a
>knowledgeable person who has his ear to comp.lang.c, and who has been
>using C since way before it became a chic language:

Leo, most of your arguments are full of half-truths, or misinformation.

I suggest you learn more about the ANSI C proposal firsthand.
Or read comp.lang.c (you would have seen Dennis's reply to the latest
proposal, the demise of noalias, how strlen("foo") can be optimized to 3,
how to tell C to use your strlen() & not the library's, how ((x+3)+5)
can be optimized to (x+8) even though parentheses are respected (using the
as-if rule, if your compiler ignores integer overflow), and other goodies)

If you simply want to vent your spleen, followups to alt.flame.  If you have
real ANSI C questions, use comp.lang.c or (better) comp.std.c

Merlyn LeRoy

bts@sas.UUCP (Brian T. Schellenberger) (07/20/88)

[tried EMail; it bounced.]

Structure passing / returning / assigning and enums predate ANSI by many,
many years (I have a copy of a paper describing those features dated
November 15, 1978), so it isn't fair to criticize compiler-writers for 
introducing these ten-year-old features without the ANSI (less than
five-year-old) prototypes.
-- 
--Brian,                     __________________________________________________
  the man from              |Brian T. Schellenberger   ...!mcnc!rti!sas!bts
  Babble-On                 |104 Willoughby Lane     work: (919) 467-8000 x7783
____________________________|Cary, NC   27513        home: (919) 469-9389