gnu@sun.uucp (John Gilmore) (03/19/84)
If we do ANYTHING about this, we should do it right. This means that we can't embed "zero byte means end" into the language -- a character "string" is simply an array of chars, and don't you forget it. In any case, we can't do array comparison, since an array is by definition converted to a pointer to its first element every time it's used -- there's no way to refer to the whole array except in sizeof. When I want to pass an array as an argument, I typically enclose it in a struct. The flexibility of pointer==array==pointer is a big win but it has some minor losses like this. Personally I think that avoiding structure comparison "because it's hard" is a copout. It's already "hard" for the C programmer who has to compare each member of the structure individually. For the compiler, 99% of the cases are without holes and it's easy and efficient. I don't propose comparing unions, or following pointers. The comparison s1 == s2 should be equivalent, except for storage reference pattern, to: ( (s1.memb1 == s2.memb1) && (s1.memb2 == s2.memb2) && ...)
tim@unc.UUCP (Tim Maroney) (03/20/84)
> If we do ANYTHING about this, we should do it right. This means > that we can't embed "zero byte means end" into the language -- a > character "string" is simply an array of chars, and don't you > forget it. In any case, we can't do array comparison, since an > array is by definition converted to a pointer to its first > element every time it's used -- there's no way to refer to the > whole array except in sizeof. When I want to pass an array as an > argument, I typically enclose it in a struct. The flexibility of > pointer==array==pointer is a big win but it has some minor losses > like this. From the C Reference Manual, Sec. 2.5: "The compiler places a null byte \0 at the end of each string so that programs which scan the string can find its end." Thus "zero byte means end" is already written into the language. It is true that right now, array-typed identifiers are automatically converted to pointers in expressions. However, the type of the identifier is still "array". (C Ref. Man, 14.3: "Every time an identifier of array type appears in an expression, it is converted into a pointer to the first member of the array.") However, I can see no strong objection to violating this for expressions of new operators that we add to the language, which was my first suggestion. It would be just as easy, but not backward compatible, to change the rules and use the existing operators, saying that arrays are not converted to pointers when they are the operands of the comparison operators; instead, the compiler will produce code to compare the arrays elementwise. This avoids adding new elementwise comparison operators, but it means that old code which compares array identifiers to something would have to be changed to include type casts to turn the array into a pointer. In either case, there are two possible ways that you could determine where the end of an array is. If the identifier is a fixed-size array, then you know how many elements there are at compile time, and you can just SOB or AOBLSS or whatever to do a fixed number of comparisons. If it is an array that was declared with a null constant expression, it can be assumed that the array is null-terminated. Thus to compare strings you would store their addresses in variables declared, for instance "char StringVar[]". To compare complex numbers, you would declare the complex variables like: float complexVar[2]; or, more cleanly, typedef float complex[2]; complex complexVar; and the compiler would generate the array comparison code when you compared two identifiers of type "complex". This is a side effect of providing array comparison that I hadn't realized at first; I should note that this only works like you want it to for equality comparison, not ordering. The idea of structure comparison also points to adding array comparison. It's certain that if you are doing a memberwise comparison of structures with array-typed fields, you don't want to compare the addresses of the arrays: they will always be different if the structs are not the same block of storage (which you should always check for first anyway if it could happen because it is so much faster when it's true). > I don't propose comparing unions, or following pointers. The > comparison s1 == s2 should be equivalent, except for storage > reference pattern, to: > > ( (s1.memb1 == s2.memb1) && (s1.memb2 == s2.memb2) && ...) I agree with you, although it might be nice to be able to specify that a pointer should be followed -- that comes perilously close to creeping featurism, though, and I have already listed some problems with it in my previous article, so I'll just shut up about it.... However, it is interesting to note that if the semantics of array comparison were changed as I suggested, then you would get the proper comparison of array-typed structure members. -- Tim Maroney, The Censored Hacker mcnc!unc!tim (USENET), tim.unc@csnet-relay (ARPA) All opinions expressed herein are completely my own, so don't go assuming that anyone else at UNC feels the same way.
rcd@opus.UUCP (03/21/84)
<> > Personally I think that avoiding structure comparison "because it's hard" > is a copout. It's already "hard" for the C programmer who has to compare > each member of the structure individually. For the compiler, 99% of the > cases are without holes and it's easy and efficient. > > I don't propose comparing unions, or following pointers. I'm sorry I ever slipped and understated the situation! When I said that comparing structures is "hard" I meant that it's hard enough to be impractical. Unions are perhaps the worst aspect, and you CAN'T avoid unions if you are going to compare structures, because a structure can contain a union. Filling in a little bit of the reasoning: The elements of a union can be of differing size. If you compare two instances of a struct-containing-union, you have to know how much of the union to compare, to avoid a spurious mismatch on the unused trailing part of a short field. To do that, you have to know which field of the union is the current one, which in turn would mean that code would be required at EVERY assignment to a field of the union (plus or minus some very messy optimization) to keep track of the current field. This isn't even remotely practical with C's undiscriminated union. [OK, at least 5% of you are ready to suggest allowing "comparisons of structures unless they contain unions". Please don't. First, this is pretty clearly a clunky special case we don't need - but it also makes a mess of a certain sort of data-hiding: struct junk { int a,b,c,d; mystery xyzzy; } where "mystery" is a typedef from elsewhere. The "junk" type shouldn't have to care what's in type "mystery"; it just needs one. (Data abstraction and all that apple pie...) However, the semantics of objects of type "struct junk" - in particular, whether they can be compared - would depend on whether "mystery" was, or contained, a union.] -- {hao,ucbvax,allegra}!nbires!rcd
rcd@opus.UUCP (03/21/84)
<> > In either case, there are two possible ways that you could determine where > the end of an array is. If the identifier is a fixed-size array, then you > know how many elements there are at compile time, and you can just SOB or > AOBLSS or whatever to do a fixed number of comparisons. If it is an array > that was declared with a null constant expression, it can be assumed that > the array is null-terminated... Two problems with this. First, what does "null terminated" mean? Fine for arrays of char, maybe even arrays of pointers if you stretch it. (AAACCK! I hear the ol' null-pointer portability discussion starting up again. Forget I said anything about pointers!) But what about arrays of int - what's a "null" for that type? Is this comparison going to be a special kluge for chars only? Second problem - you introduce a very funny nonuniformity of behavior: if the dimension is known, you get a comparison of a fixed number of elements; if unknown, a delimiter-controlled comparison. I can think of situations where changing a declaration or two could really make things go haywire. -- {hao,ucbvax,allegra}!nbires!rcd
neal@denelcor.UUCP (Neal Weidenhofer) (03/23/84)
************************************************************************** >You missed his point; to correctly compare two structures a byte-for-byte >comparsion should *not* be done. The bytes which are not part of structure >members, but which are just padding to put structure members on the right >boundary, *must* not be compared or two structures which "should" be equal >won't compare equal. Yes but it's no harder for the compiler than it is for the programmer except for unions (see more below). >Yes, the comparison would be nice, but it's not trivial to implement and >won't compile into a simple comparison. Also, < and > are flatly impossible; >what about > > struct complex { > double realpart; > double imagpart; > }; > >Any result of < and > would be misleading as the complex numbers aren't an >ordered field. Granted that '<' and '>' would be misleading in certain cases, I suggest lexicographic comparison. It would still be useful in many cases, for arrays as well as structures. If a programmer insisted on using it where it was meaningless (e.g., complex numbers) it would be no worse than many features "C" and several other languages have (GIGO). >I'm sorry I ever slipped and understated the situation! When I said that >comparing structures is "hard" I meant that it's hard enough to be >impractical. Unions are perhaps the worst aspect, and you CAN'T avoid >unions if you are going to compare structures, because a structure can >contain a union. Filling in a little bit of the reasoning: The elements >of a union can be of differing size. If you compare two instances of a >struct-containing-union, you have to know how much of the union to compare, >to avoid a spurious mismatch on the unused trailing part of a short field. >To do that, you have to know which field of the union is the current one, >which in turn would mean that code would be required at EVERY assignment to >a field of the union (plus or minus some very messy optimization) to keep >track of the current field. This isn't even remotely practical with C's >undiscriminated union. Another solution would be to clear the unused parts of the union wherever a store is done. In many cases (admittedly not all) this would be very cheap. >[OK, at least 5% of you are ready to suggest allowing "comparisons of >structures unless they contain unions". Please don't. First, this is >pretty clearly a clunky special case we don't need - but it also makes a >mess of a certain sort of data-hiding: > struct junk { > int a,b,c,d; > mystery xyzzy; > } >where "mystery" is a typedef from elsewhere. The "junk" type shouldn't >have to care what's in type "mystery"; it just needs one. (Data >abstraction and all that apple pie...) However, the semantics of objects >of type "struct junk" - in particular, whether they can be compared - would >depend on whether "mystery" was, or contained, a union.] Your point is well taken but I wonder if you realize that you've also made it IMPOSSIBLE for the programmer to compare two structures of type "junk". It's in just such cases that structure comparison would be the most useful. I agree that there are some very hard problems but I think we need to give it more thought and conversation before we dismiss it as impractical. The point I see is that structures NEED to be compared sometimes, is it better for the compiler to make it easy for the programmer or not? Except for the problem with unions, the compiler has as much information about the structure as the programmer does--often more. Regards, Neal Weidenhofer Denelcor, Inc. <hao|csu-cs|brl-bmd>!denelcor!neal
jas@druxy.UUCP (ShanklandJA) (03/23/84)
For goodness' sake! When you need to do structure or array comparison, how hard is it to write a little macro that does the job? Under most circumstances, I can't see it taking more than 5 minutes. (Yes, people sometimes use structures with dozens -- or hundreds of fields. But how often? How often do such structures need to be compared in their entirety? Is it worth hacking up the language to accomodate such cases? The advantages of the macro approach are: (1) you get to interpret "equality" any way you want: dictionary order for some fields, numerical equality for others, etc.; pointers must point to one and the same object to be equal, or pointers to equal-valued objects are considered equal (and of course, you get to decide what makes the pointed-to objects "equal"); (2) you stop creeping featurism in its tracks. One objection I foresee is that people will say that the compiler could generate more efficient comparison code directly, say by emitting a single block-compare instruction to compare several truly adjacent fields ("truly adjacent" means no gap for alignment), whereas if you compare each field separately in a macro, the compiler will most likely emit separate compare instructions for each field. But I suggest the solution to that is to use a compiler that will optimize such code sequences into a single block-compare wherever possible. Granted, I'm not at all sure that a C compiler that optimizes to that extent exists anywhere, on any machine; but just wait.... Keep C small! Jim Shankland ..!ihnp4!druxy!jas
hom@hocda.UUCP (H.MORRIS) (03/24/84)
Having a compiler that recognizes sequences like if(x.y.z.elt1 == t.y.z.elt1 && x.y.z.elt2 == t.y.z.elt2 ...) and generates a single block compare if appropriate instead of implementing struct compares *because we want to keep C simple* would seem to me to defeate the purpose.
gwyn@brl-vgr.ARPA (Doug Gwyn ) (03/25/84)
I think the real problem with struct comparison is that there is no "natural" order relation among structs of the same kind, except for exact equality. This is in contrast to real numbers where there is a natural order defined. (For an illustration, consider a "complex number" struct. Ask the nearest mathematician how to determine which of two complex numbers is the "smallest". If he gives you any algorithm at all, it will probably involve the moduli of the complex numbers. How is the compiler going to know all this?) I have the same objection to overloading of operators (as in Ada). Just because one can use the same symbol (e.g. "+") to combine two objects in each of several different classes by no means implies that it is the "same" operation in the different classes, or even a unique extension. (Example: a Boolean "+" could be a "union"; it could also be a "Boolean sum". Which is "correct"?)
tbray@mprvaxa.UUCP (Tim Bray) (03/25/84)
x <-- eat me This response is provoked by Tim Maroney's suggestion that built-in comparison of of arrays could be implemented in the case where the length is not known by assuming a null element terminator. ************WRONG************* The whole null-termination concept is a misfeature (yes, the costs outweigh the benefits) and its use should be avoided in built-in whatevers. Other submittors have pointed out the problems which make structure/array built-ins impractical to implement. There is, however, one way to sidestep these problems: Use a descriptor architecture. For strings, the descriptor encodes the address and length (that's right, no more explaining to people that you need 13 bytes to store a 12-character string - and you can now embed null bytes in strings!) You could also pass arrays around by descriptor and the receiving code would not need to know in advance the size, but could determine it unambiguously. However, existing UNIX(tm) software is so heavily married to the concept of null termination that it ain't gonna happen. And look, if wrestling with the UNIX string subroutines is the worst programming problem you have, stop complaining... you could be composing DCB's for IEBGENER. Tim Bray
donn@sdchema.UUCP (03/26/84)
This debate is over the wrong issues.
What I hear some people saying is, 'Wow! My machine has a whiz-bang
block compare instruction! If only my compiler could generate it, my
code would be SO much faster!' People on the one side of the fence
seem to think that the only way to get this into the compiler is to
make a built-in language construct like structure comparison, while
people on the other side are willing to live with a function call if it
means keeping the syntax of C safe from questionable additions.
How about meeting halfway? The Berkeley folks often use a
post-processing 'sed' script which takes nasty things like splx() and
ntohl() and produces in-line code. While I'm sure we all groan at the
grossness of this kind of intervention, why not take the grossness away
and let the compiler have access to a 'library' of in-line assembly
language functions?
A macro-like syntax for these functions could be developed so that
there would be a standard way of adding to the library when desired.
No C compiler would be required to implement the functions in-line;
thus no C syntax or semantics would change. Any compiler that didn't
implement in-line functions would simply cause programs to use the
standard C library.
Thus we might see pseudo-C definitions like:
/*
* strlen( s ) -- Return length of s. VAX inline code.
*/
inline int strlen( s ) {
register r
movl s,r
locc $0,$0x7fffffff,(r) /* Search for null char in s */
subl2 r,r0 /* Subtract s from addr of end of s */
}
(The intent of the 'register' declaration would be to ask the compiler
to make a register available, by using one that is free or by saving
one on the stack and restoring it later.)
This would make string manipulating 'functions' just as fast as the
proposed string operators, but the idea is more general and C's syntax
doesn't suffer. Structure comparisons would be done quickly by a
simple bcmp(), which might cause an in-line 'cmpc3' instruction to be
emitted on a VAX or a subroutine call on a PDP11. (Or a tight loop --
the arrangement is flexible enough that anything you could code
yourself in assembler, you could make available to the compiler to
improve your C code.) No smarts are needed in the compiler to spot
loops that could be compiled into exotic instructions.
I was hoping to hear that this sort of thing was possible with PCC 2
when I went to Steve (?) Johnson's talk at Uniforum, since there was
something in the abstract that looked like it might be it. Nothing
came up in the talk and things were too busy afterwards for me to get
to buttonhole him in the hall. Is anyone out there familiar enough
with PCC 2 to say whether in-line assembler functions are present?
Too much wishful thinking,
Donn Seeley UCSD Chemistry Dept. ucbvax!sdcsvax!sdchema!donn
hom@hocda.UUCP (H.MORRIS) (03/27/84)
One should not confuse the issues of implementing >, <, etc for structs with that of implementing simple equality (and !=). I categorically think that > and < should NOT be implemented for structures. But I think since we have assignment, it makes good sense to define == and != for structures.
gwyn@brl-vgr.ARPA (Doug Gwyn ) (03/27/84)
String descriptors and null-terminated strings are two competing approaches to string design. Each has strong and weak points. However, as a user of both I would hate to see the user forced to use descriptor-based strings in C. Terminated-arrays are much easier to deal with in C code in the majority of cases.
lwall@sdcrdcf.UUCP (Larry Wall) (03/27/84)
In article <2831@brl-vgr.ARPA> gwyn@brl-vgr.ARPA (Doug Gwyn ) writes: >... >I have the same objection to overloading of operators (as in Ada). >Just because one can use the same symbol (e.g. "+") to combine two >objects in each of several different classes by no means implies >that it is the "same" operation in the different classes, or even a >unique extension. (Example: a Boolean "+" could be a "union"; it >could also be a "Boolean sum". Which is "correct"?) Just what kind of an idealistic totalitarian are you? By this kind of argument we can rid the world of freedom forever. Just because a freedom can be abused doesn't mean we should do away with it. GIVE ME OVERLOADING, OR GIVE ME DEATH! Seriously, there are some very good arguments in favor of overloading. 1) Almost all languages do it surreptitiously already (e.g. what is the type of "+" in C?). 2) The human mind thrives in the presence of overloading and other context-dependent activity. Consult any dictionary for examples. 3) It helps keep the namespace uncluttered. No need to make up silly names just because you want to do the same (or similar) thing to a different type of object. "Martha, I want to can-open this can." 4) It facilitates "generic" abstractions. What a Boolean "+" might mean depends on where I get the definition from, now doesn't it? Anyone who uses an operation with semantics they don't know is asking for it anyway, regardless of whether the semantics come from a "package" or are built into the language. If this is indeed the case, why not release language designers from having to predict ahead of time all the (reasonable) uses that someone might want to put an operator to, and give some modicum of freedom to package designers instead. I assure you, if the package designers misuse the freedom, people will tend not to use those packages. Thank goodness Ada isn't "perfect". Just a little more perfectible. Larry Wall {allegra,burdvax,cbosgd,hplabs,ihnp4,sdcsvax}!sdcrdcf!lwall (I'm glad my organization doesn't make me put a stupid disclaimer down here, which no one ever reads anyway, for reasons obvious to all but a few bosses... After all, when's the last time you saw an ORGANIZATION have an opinion on the net?)
scw@cepu.UUCP (03/27/84)
Sounds like a *GASP* PL/I dope vector to me. -- Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology) uucp: { {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcsvax!bmcg}!cepu!scw ARPA: cepu!scw@ucla-locus location: N 34 06'37" W 118 25'43"
dwr@ccieng6.UUCP ( Donald Wallace Rouse II) (03/29/84)
The version of C that I use allows compile-time initialization of the non-union elements of non-automatic structures, as: static struct s { int e1; char e2; union { int e31; double e32; } e3; } x = { 1, 'c' }; If C special-cases initialization in this way, there is no reason it can't special-case comparisons of structures containing unions. I agree that comparisons other than == != are probably not too useful. Don't worry about holes; let the compiler do it. If the compiler can keep track of holes during initialization, there is no reason that it can't keep track of them during comparison and generate appropriate code to skip them. On a slightly related topic, my favorite extension to C would be structure and array constants on the right-hand side of assignment expressions, as: int x () { struct s x; int i [5]; if (expr) x = { 4, 'x' }; else i = { 1, 2, 3, 4, 5 }; } D2
rcd@opus.UUCP (03/30/84)
<> Boy, this would-be feature doesn't want to die... >>Any result of < and > would be misleading as the complex numbers aren't an >>ordered field. > >Granted that '<' and '>' would be misleading in certain cases, I suggest >lexicographic comparison. It would still be useful in many cases, for >arrays as well as structures... Arrays might or might not be regarded as having an ordering property. Structures, both philosophically and in general usage, are almost always regarded as cartesian products of their components, hence inherently unordered. What's the point of defining ordering when the abstraction modeled doesn't have the property? Is it really so useful? >>(incredibly verbose explanation (mine) of why comparison is very costly >>if struct contains a union - see referenced article) >Another solution would be to clear the unused parts of the union wherever >a store is done. In many cases (admittedly not all) this would be very >cheap. This is just not so. It implies that any store into a field (or any subfield) of a union requires clearing the unused tail for that field, which can give you a 2:1 increase in the cost of a simple store for some very common cases. Moreover, I claim there's no implementation which is reasonable and completely correct, because you can generate pointers to the fields of the union (and this includes actual union-field parameters sent with &) and store into the union indirectly. (I'm beginning to wonder whether this is a contest to see if I tire of constructing counterexamples to proposals!) >>(example in which a struct type "junk" may or may not be comparable >>depending on whether it contains a union) >Your point is well taken but I wonder if you realize that you've also >made it IMPOSSIBLE for the programmer to compare two structures of type >"junk"... Yes, I realize that completely. I even like it. I like language rules that say things are either possible or impossible under all circumstances much better than I like rules that say that some things are possible sometimes. Special cases make the language harder to learn and use, and make compilers larger, slower, harder to write and harder to make correct. >I agree that there are some very hard problems but I think we need to give >it more thought and conversation before we dismiss it as impractical. The >point I see is that structures NEED to be compared sometimes, is it better >for the compiler to make it easy for the programmer or not?... And the point that I see is that, given the desire to achieve a not-very- large language, what should go in and what should stay out? Even more important, since the discussion started around changes to C, another question is whether a change such as structure comparison is useful enough (ignoring implementability for the moment) to justify changing the language. Remember that even for an upward-compatible change you must change existing compilers. -- {hao,ucbvax,allegra}!nbires!rcd
rcd@opus.UUCP (04/01/84)
<> > Except for > the problem with unions, the compiler has as much information about the > structure as the programmer does--often more. Actually, this isn't true either. If you think about situations in which you want to compare structures, it is probably true a fair amount of the time that the comparisons should be done in a particular order. That is, you will want the effect of if ((a.f1==b.f1) && (a.f2==b.f2) && ...) where the compiler would not necessarily choose to compare f1 before f2. The point is that the programmer may well have more information - he may be able to decide which fields mismatch more often and arrange to compare them first. [An aside: I would conjecture that the efficiency gains from having the compiler use special comparison instructions would probably be less overall than the efficiency gains from having the programmer specify the comparisons in optimal order.] -- {hao,ucbvax,allegra}!nbires!rcd