gnu (03/21/83)
Clearly if this person is writing programs with subscripts it's not "really" written in C. Real C programmers always use pointers, which can't be checked without immense overhead or super intelligence in the compilation environment. (Since by definition a subscript is equivalent to a pointer add followed by a dereference, it's not clear to me that you could check subscripting without also checking pointers, unless you violate the language definition.) I can see it now -- programs written with *(p+i) in places where it is known that subscript checking would fail, or in inner loops where speed is important. ("Daddy, why didn't he just write p[i]?" "Obscure historical reasons, Susie." "But Daddy, the program stops working when I change it...")

John Gilmore, Sun Microsystems
davy (03/23/83)
#R:linus:-1644100:pur-ee:15500014:000:1021
pur-ee!davy    Mar 22 10:06:00 1983

While we're talking pet peeves, mine has always been the fact that "return" (and "sizeof") do not require parentheses. While I'm not saying that they should require them, consider the following:

    main()
    {
        int i;

        for (i=0; i < 5; i++)
            foo(i);
    }

    foo(n)
    int n;
    {
        if (n == 3)
            return        /* <--- Note I forgot the ';' */

        printf("foo...%d\n", n);
    }

Now, the obvious intention of this program is to produce the output:

    foo...0
    foo...1
    foo...2
    foo...4

But, instead, it produces the output:

    foo...3

This is because (for those of you who can't figure it out) the "return", since there is no semicolon, will return the "value" of the printf. Thus, the compiler compiles foo() as:

    foo(n)
    int n;
    {
        if (n == 3)
            return(printf("foo...%d\n", n));
    }

I wish the compiler would print something in this case (a return without a semicolon followed immediately by a newline) like:

    warning: 'return' statement possibly misinterpreted

Anybody agree with me?

--Dave Curry
pur-ee!davy
leei (03/23/83)
As long as pet peeves with C are the vogue, my personal pet peeve is the fact that C's union construction necessarily introduces an extra level of context. I always wanted to be able to set up structures like:

    struct foo {
        short type;
        union {
            int as_int;
            char *as_ptr;
        };
    } bar;

and then be able to use it like:

    if ( bar.type == INTEGER )
        printf("%d", bar.as_int);
    else
        printf("%s", bar.as_ptr);

As it is now, I have to either name the union inside and specify the union name in the path, or use #define's to avoid having to specify this spurious node.

Well, you say, too bad but what can we do? It's actually pretty easy. A friend of mine and I hacked Steve Johnson's PCC, which makes up pass 1 of the UNIX C compiler (at least for 4.1, I'm not sure about the others). We changed the structure dereferencing so that it does a search down the structure tree until it finds the element you asked for. Admittedly, this is not the most direct solution to the problem, since I would much rather be able to use an overlay structure which simply overlays storage without adding another level of context, but this was much easier to do (and somewhat more general). With our hack, we can do things like that seen below.

    struct a {
        int s;
        union {
            int b;
            char c;
            char *s;
        } o;
    } glumph;

and

    'glumph.s'   refers to the outer level integer
    'glumph.o.s' refers to the inner level char pointer
    'glumph.b'   refers to the \inner/ level integer
    'glumph.c'   refers to the \inner/ level char

We now have a copy of our new ccom in my bin, which we both use for our useless little hacks. It should be pointed out that changes like this are COMPLETELY NON-PORTABLE and are not really recommended, even when they work. What we will do is use this for development and then clean up after ourselves afterwards. The point is, I don't like the way C handles this and something can be done about it on a local level. If anyone is stupid enough, I can mail them a diff on cgram.y, which is where we made the change.

I'm really not too happy about this; it's just too much of a hack. I would much prefer to implement the 'overlay' construct, which would take the form of a union with no context level change. It would HAVE to be inside a struct, though, and it appears that it would take quite a bit of substantial hacking at cgram.y. Good luck if you want to try it.

Lee Iverson
princeton!leei
mash (03/24/83)
As various folks have mentioned, it is difficult to check C subscripts. In fact, it is worse than has been mentioned: there may well be only two rational design points for languages of the C/PASCAL/FORTRAN/ALGOL... level:

1) (like C) use a language that models typical machines directly, with little extra overhead, and fairly unconstrained semantics, i.e., we all know pointers are addresses, and expect no protection.

OR

2) Design a language to be compile-time checkable from day one, with
   a) highly-constrained pointer semantics,
   b) either dope vectors/descriptors for any objects (like arrays) passed by reference, or array-size conformance required of functions (thus forbidding variably-sized arguments).

In case 2, given an optimizing compiler that does serious dataflow analysis (i.e., like IBM FORTRAN IV(H)), it is possible to optimize away many of the otherwise necessary subscript checks. However, much care is needed in design of language semantics or this becomes excruciatingly difficult (excruciating because safety usually implies numerous checks that are actually unnecessary). For example, in PL/I:

    DCL X(10);              DCL X(10);              DCL X(10);
    DO I = 1 TO 10;         DO I = 1 TO 10;         CALL SUBR(I);
      X(I) = X(I)+1;          CALL SUBR(I);         I = 1;
    END;                      X(I) = X(I) + 1;      CALL SUBY;
                            END;                    X(I) = 1;

The left case needs no subscript checking; the 2nd case needs 1 subscript check for the assignment statement, because SUBR may have modified I. (It probably didn't, but call-by-reference makes it very difficult to know what's happening at the point of invocation -- here, C's default call-by-value only is a great help: at least when you see funct(&x) you expect that x might be changed.) Even worse, in the 3rd case, the X(I) above also needs a check, because safety requires that you assume that once you give away the address of anything (as in SUBR), it may be saved somewhere and the value modified in any subroutine call. The same issue arises in some FORTRANs.

Solutions to the problem for typical languages require complex interprocedural analysis, fancy linkers, or complex compilation/binding systems.

What's the moral? This is not an argument against checking for (subscript-in-range, undefined variables, pointer usage), but an observation that doing checking well requires considerable language design thought, or acceptance of considerable overhead in space and time. I personally think that one should either a) stick with something whose semantics is fairly straightforward, like C, or b) go to a much higher level where subscript-checking mostly disappears into higher-level aggregate operations, i.e., go to APL or SETL, etc.

-mashey
donchin (03/24/83)
#R:linus:-1644100:uiucdcs:27600016:000:459
uiucdcs!donchin    Mar 23 22:51:00 1983

My pet peeve is the EOL terminator for printf commands. Sure I understand that it is possible to end up printing whole sections of your program, but the error will be obvious at run-time and being able to write

    printf("
            This is a title for a list of variables
    %d      %d      %d      %d      %d      %d", a,b,c,d,e,f)

would be very convenient and, I think, much more legible than

    printf ("\n\t\tThis is a title for a list of variables\n\
    %d\t%d\t%d\t%d\t%d\t%d",a,b,c,d,e,f)
tim (03/25/83)
About the C EOL terminator for strings: It should not be possible to have strings overlap lines, as you suggest. This would be hard to read, and has the potential to cause some really frustrating errors. The real problem is that C has no string operators. If you could use, say, the + operator to concatenate strings, then no one would ever have to split printf format strings. There is no reason for C not to have these. True, there are functions, but infix operators are far more readable than prefix function calls in most applications where you do a lot with strings (even for a Lisp hacker like myself); also, the compiler cannot evaluate constant string expressions at compile time if functions are used. Tim Maroney
randals (03/27/83)
In a recent article from:

    Bill Lee
    lee@utexas-11
    ...!ucbvax!nbires!ut-ngp!lee
    ...!eagle!ut-ngp!lee

he indicates that the way to get unions without the additional level of indirection is to do something like:

    #define as_int o.as_int1;
    #define as_ptr o.as_ptr1;

    struct foo {
        short type;
        union {
            int as_int1;
            char *as_ptr1;
        } o;
    } bar;

DON'T APPEND the semicolon to the definition!! If you refer to bar.as_int, you will get (expanded) "bar.o.as_int1;", which will undoubtedly botch the compile (an extra semicolon). Remember, #define's are *not* "C" code... they are simply straight substitutions! Make them look like this:

    #define as_int o.as_int1
    #define as_ptr o.as_ptr1

Randal L. ("sometimes below C-level") Schwartz
Tektronix Engineering Computing Systems (the UNIX folks)
Wilsonville, Oregon, USA

UUCP:  ...!XXX!teklabs!tekecs!randals (ignore return address)
       (where XXX is one of: aat cbosg chico decvax harpo ihnss
       lbl-unix ogcvax pur-ee reed ssc-vax ucbvax zehntel)
CSNET: tekecs!randals @ tektronix
ARPA:  tekecs!randals.tektronix @ rand-relay
lee (03/29/83)
Ho-hum. If anyone cares, my previous example about hiding structure levels using defines was incorrect. I inadvertently added semi-colons to the end of my defines. As everyone knows, defines are not ended by semi-colons (except in unusual cases). Thanks to everyone that pointed this out to me.
zrm (04/03/83)
The difficulty many people find in mastering C, especially without assembly language experience, is a strong argument for coding systems in more than one language. Thus the nitty gritty, like malloc or device drivers, could be written in C, and the actual algorithms that are the basis of the program could be coded in, say, Modula 2. This project management method requires careful planning, so that you don't wind up recoding modules in C just because you really need some bit twiddling hack and can't afford to make a call to get it done for you.

There are other ways of improving C programming technology. This past January I gave a winter term seminar on a design for doing Flavor-like object oriented programming in C. With such an environment, the methods that maintain arrays can be as protective as they wish about the bounds of that array.

If anyone out there would like to establish correspondence with me about advanced programming technology in C, please drop me a line either at genradbolton!ccc!zrm or zrm@mit-mc

Cheers,
Zig
p500vax:pat (04/09/83)
My favorite is

    foo( x, y )
    char *x,y;
    {}

when y was meant to be a character pointer.