[net.lang.c] C pet peeve

gnu (03/21/83)

Clearly if this person is writing programs with subscripts it's not
"really" written in C.  Real C programmers always use pointers, which
can't be checked without immense overhead or super intelligence in the
compilation environment.  (Since by definition a subscript is equivalent
to a pointer add then dereference, it's not clear to me that you could
check subscripting without checking pointers without violating the
language definition.)

I can see it now -- programs written with  *(p+i) in places where it
is known that subscript checking would fail, or in inner loops where speed
is important.  ("Daddy, why didn't he just write p[i]?"  "Obscure historical
reasons, Susie."  "But Daddy, the program stops working when I change it...")

	John Gilmore, Sun Microsystems

davy (03/23/83)

pur-ee!davy    Mar 22 10:06:00 1983


	While we're talking pet peeves, mine has always been the fact that
"return" (and "sizeof") do not require parentheses.  While I'm not saying
that they should require them, consider the following:

	main() 
	{
		int i;

		for (i=0; i < 5; i++)
			foo(i);
	}
	foo(n) 
	int n;
	{
		if (n == 3)
			return		/* <--- Note I forgot the ';' */
		printf("foo...%d\n", n);
	}

Now, the obvious intention of this program is to produce the output:

		foo...0
		foo...1
		foo...2
		foo...4

But, instead, it produces the output:

		foo...3

This is because (for those of you who can't figure it out) the "return", since
there is no semicolon, will return the "value" of the printf.  Thus, the 
compiler compiles foo() as:

		foo(n)
		int n;
		{
			if (n == 3) 
				return(printf("foo...%d\n", n));
		}

I wish the compiler would print something in this case (a return without a
semicolon followed immediately by a newline) like:

	warning: 'return' statement possibly misinterpreted

Anybody agree with me?

--Dave Curry
pur-ee!davy

leei (03/23/83)

	As long as pet peeves with C are the vogue, my personal pet peeve is
the fact that C's union construction necessarily introduces an extra level
of context.  I always wanted to be able to set up structures like:

struct foo {
    short type;
    union {
	int as_int;
	char *as_ptr;
    };
} bar;

and then be able to use it like:

	if ( bar.type == INTEGER )
	    printf("%d", bar.as_int);
	else
	    printf("%s", bar.as_ptr);

As it is now, I have to either name the union inside and specify the union
name in the path, or use #define's to avoid having to specify this spurious
node.

	Well, you say, too bad but what can we do?  It's actually pretty easy.
A friend of mine and I hacked Steve Johnson's PCC, which makes up pass 1 of the
UNIX C compiler (at least for 4.1, I'm not sure about the others).  We changed
the structure dereferencing so that it does a search down the structure tree
until it finds the element you asked for.  Admittedly, this is not the most
direct solution to the problem, since I would much rather be able to use an
overlay structure which simply overlays storage without adding another level
of context, but this was much easier to do (and somewhat more general).

	With our hack, we can do things like that seen below.

struct a {
    int s;
    union {
	int b;
	char c;
	char *s;
    } o;
} glumph;

and 'glumph.s' refers to the outer level integer
    'glumph.o.s' refers to the inner level char pointer
    'glumph.b' refers to the \inner/ level integer 
    'glumph.c' refers to the \inner/ level char

	We now have a copy of our new ccom in my bin, which we both use for
our useless little hacks.  It should be pointed out that changes like this 
are COMPLETELY NON-PORTABLE and are not really recommended, even when they
work.  What we will do is use this for development and then clean up after
ourselves afterwards.  The point is, I don't like the way C handles this and
something can be done about it on a local level.  If anyone is stupid enough,
I can mail them a diff on cgram.y, which is where we made the change.

	I'm really not too happy about this, it's just too much of a hack, and
I would much prefer to implement the 'overlay' construct, which would take the
form of a union with no context level change, but it would HAVE to be inside
a struct, and it appears that it would take quite a bit of substantial hacking
at cgram.y.  Good luck if you want to try it.

						Lee Iverson
						princeton!leei

mash (03/24/83)

As various folks have mentioned, it is difficult to check C subscripts.
In fact, it is worse than has been mentioned: there may well be only two
rational design points for languages of the C/PASCAL/FORTRAN/ALGOL... level:

1) (like C) use a language that models typical machines directly,
with little extra overhead, and fairly unconstrained semantics, i.e.,
we all know pointers are addresses, and expect no protection.
OR
2) Design a language to be compile-time checkable from day one,
with a) highly-constrained pointer semantics, b) either dope vectors/
descriptors for any objects (like arrays) passed by reference, or
array-size conformance required of functions (thus forbidding
variably-sized arguments).
In case 2, given an optimizing compiler that does serious dataflow
analysis (i.e., like IBM FORTRAN IV(H)), it is possible to optimize away
many of the otherwise necessary subscript checks.
However, much care is needed in design of language semantics or this becomes
excruciatingly difficult (excruciating because safety usually implies
numerous checks that are actually unnecessary).  For example, in PL/I:
DCL X(10);              DCL X(10);                DCL X(10);
DO I = 1 TO 10;         DO I = 1 TO 10;           CALL SUBR(I);
    X(I) = X(I)+1;         CALL SUBR(I);          I = 1;
END;                       X(I) = X(I) + 1;       CALL SUBY;
                        END;                      X(I) = 1;
The left case needs no subscript checking; the 2nd case needs 1 subscript check
for the assignment statement, because SUBR may have modified I.  (It probably
didn't, but call-by-reference makes it very difficult to know what's
happening at the point of invocation -- here, C's default call-by-value
only is a great help: at least when you see funct(&x) you expect that x
might be changed.) Even worse, in the 3rd case, the X(I) above also needs
a check, because safety requires that you assume that once you give away the
address of anything (as in SUBR), that it may be saved somewhere and
the value modified in any subroutine call. Same issue arises in some FORTRANs.
Solutions to the problem for typical languages require complex
interprocedural analysis, fancy linkers, or complex compilation/binding systems.

What's the moral? This is not an argument against checking for
(subscript-in-range, undefined variables, pointer usage), but an observation
that doing checking well requires considerable language design thought,
or acceptance of considerable overhead in space and time.

I personally think that one should either a) stick with something whose semantics
is fairly straightforward, like C, or b) go to a much higher level where
subscript-checking mostly disappears into higher-level aggregate operations,
i.e., go to APL or SETL, etc.
-mashey

donchin (03/24/83)

uiucdcs!donchin    Mar 23 22:51:00 1983

My pet peeve is the EOL terminator for printf commands.  Sure I
understand that it is possible to end up printing whole sections of
your program, but the error will be obvious at run-time and being able
to write
	printf("
		This is a title for a list of variables
	%d	%d	%d	%d	%d	%d",
	a,b,c,d,e,f);
would be very convenient and, I think, much more legible
than
	printf ("\n\t\tThis is a title for a list of variables\n\
		%d\t%d\t%d\t%d\t%d\t%d",a,b,c,d,e,f);

tim (03/25/83)

About the C EOL terminator for strings:

It should not be possible to have strings overlap lines, as you
suggest. This would be hard to read, and has the potential to cause
some really frustrating errors. The real problem is that C has no
string operators. If you could use, say, the + operator to concatenate
strings, then no one would ever have to split printf format strings.

There is no reason for C not to have these. True, there are functions,
but infix operators are far more readable than prefix function calls
in most applications where you do a lot with strings (even for a
Lisp hacker like myself); also, the compiler cannot evaluate constant
string expressions at compile time if functions are used.

Tim Maroney

randals (03/27/83)

In a recent article from:

	Bill Lee
	lee@utexas-11
	...!ucbvax!nbires!ut-ngp!lee
	...!eagle!ut-ngp!lee

he indicates that the way to get unions without the additional
level of indirection is to do something like:

	#define as_int o.as_int1;
	#define as_ptr o.as_ptr1;

	struct foo {
		short type;
		union {
			int as_int1;
			char *as_ptr1;
		} o;
	} bar;

DON'T APPEND the semicolon to the definition!!  If you refer
to bar.as_int, you will get (expanded) "bar.o.as_int1;", which
will undoubtedly botch the compile (an extra semicolon).

Remember, #define's are *not* "C" code... they are simply
straight substitutions!  Make them look like this:

	#define as_int o.as_int1
	#define as_ptr o.as_ptr1

Randal L. ("sometimes below C-level") Schwartz
Tektronix Engineering Computing Systems (the UNIX folks)
Wilsonville, Oregon, USA

UUCP:	...!XXX!teklabs!tekecs!randals (ignore return address)
	(where XXX is one of: aat cbosg chico decvax harpo ihnss
	lbl-unix ogcvax pur-ee reed ssc-vax ucbvax zehntel)
CSNET:	tekecs!randals @ tektronix
ARPA:	tekecs!randals.tektronix @ rand-relay

lee (03/29/83)

Ho-hum. If anyone cares, my previous example about hiding structure
levels using defines was incorrect. I inadvertently added semi-colons
to the end of my defines. As everyone knows, defines are not ended by
semi-colons (except in unusual cases). Thanks to everyone that pointed
this out to me.

zrm (04/03/83)

The difficulty many people find in mastering C, especially without
assembly language experience, is a strong argument for coding systems in
more than one language. Thus the nitty-gritty, like malloc or device
drivers, could be written in C, and the actual algorithms that are the
basis of the program could be coded in, say, Modula 2.

This project management method requires careful planning, so that you
don't wind up recoding modules in C just because you really need some
bit twiddling hack and can't afford to make a call to get it done for
you.

There are other ways of improving C programming technology. This past
January I gave a winter term seminar on a design for doing Flavor-like
object oriented programming in C. With such an environment, the methods
that maintain arrays can be as protective as they wish about the bounds
of that array.

If anyone out there would like to establish correspondence with me about
advanced programming technology in C, please drop me a line either at

	genradbolton!ccc!zrm	or
	zrm@mit-mc

Cheers,
Zig

p500vax:pat (04/09/83)

My favorite is 

foo( x, y )
	char	*x,y;
{}

when y was meant to be a character pointer.