pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (03/14/90)
In article <1623@argus.UUCP> ken@argus.UUCP (Kenneth Ng) writes: In article <1990Mar12.175613.12082@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes: : >once thought of doing an implementation where NULL != 0, but after further : >reading convinced myself the implementation would be invalid. : Not invalid, no; I think there are one or two such. If I remember correctly, either the Pr1me or Data General C did represent the null pointer with a bit pattern that was not all 0s. I'm confused, is a non zero NULL pointer valid or not? The internal representation of a null pointer (not NULL -- 'NULL' is just a popular macro) can be anything at all, as long as it does not coincide with the representation of any valid pointer. I remember however that some past edition of the X/open guide mandated the use the NULL macro and not the constant 0, precisely to make supporting non binary 0s null pointers easier. Bah. The NULL macro can well be defined as something else then constant 0, e.g. the address of an entity that is guaranteed not used elsewhere. This is another of those K&R points that have been vastly ignored by almost everybody; contrarily to some Holy Commandament, a lot of (UCB) people assumed the world was a VAX. The null pointer issue has been debated many times in comp.lang.c; I think it is in the frequently asked questions document, and somebody (Spencer or Torek) have a writeup that discusses it to death (e.g. the fallacy of using 0L as null pointer on machines where 'sizeof (int) < sizeof (char *)'). What about C++? Well, there are some nice ways to have portable null pointers without using the constant 0, that IMNHO is not as descriptive as it should be. The first is to use #define nil(TYPE) ((TYPE) 0) and the second is to use (Algol 68 like) extern const char empty; /* Definition elsewhere */ static const char *nil = ∅ and the third is static const char *nil = 0; As to the latter, it assumes that the null pointer of type 'char *' converts to the null pointer of type 'anytype *' in the appropriate context. Well, let's hope :-). Note that is the second and third example we are using 'char *' because we are guaranteed that any pointer cast to 'char *' and back to its original type will be unchanged. This may be inefficient on some machines (e.g. Pr1me) for which 'char *' has a particular representation. Let me mention another two subtle Classic C points, that have received less attention and whose definition is different in C++ 2.0: 1) In Classic C 'sizeof (char)' was not necessarily 1. In particular, on some machines and some implementations (PDP-10 C?) 'sizeof' units were bits. This is of course very wise for certain machines that have 36 bit words and 7 bit bytes. The world is not a VAX. Now we have it guaranteed that 'sizeof' units are characters, and I would have preferred a uniform guarantee about bits instead (yeah, I know this would limit the maximum size of an object to a fraction of address space on many machines). 2) In Classic C bitfields could be either int or unsigned at the whim of the implementation, and any specification to the contrary could be ignored. In C++ we are now guaranteed that the type of a bit field is either unsigned or int precisely as we declare it. This I like, because the Classic C rule was very unorthogonal, and made bitfields essentially useless without some regrettable limitations and contortions in their use. -- Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jeffa@hpmwtd.HP.COM (Jeff Aguilera) (03/16/90)
The following pointer nightmares really bother me. I am a fairly defensive programmer. So when I could not determine whether strlen() was portably defined for a null argument, I rolled my own, just to thwart unwanted surprises: ptrdiff_t myStrlen(const char* p) { register const char* q = p; if (q) while (*q) ++q; return q-p; } Alas! This is not portable. From Harbision and Steele, "C: A Reference Manual", 2nd ed, p. 166: "Given two pointers p and q of the same type, the difference p-q is an integer k such that adding k to q yields p. The type of the difference may be either int or long, depending upon the implementation. The result is well defined and portable only if the two pointers point to objects in the same array, or at least are aligned as if they did. If either of the pointers is null, the result is undefined." Undefined? Even if both are null? Does this statement really mean that char* p = 0; if (p-p != 0) ; can dump core? I bet ANSI is to blame, for their pointer paranoia. Here the solution is simple: ptrdiff_t myStrlen(const char* p) { register const char* q = p; if (q) while (*q) ++q; else return 0; return q-p; } Although it has `too many notes.' The other nightmare is implementing an efficient and portable memmove, or, equivalently, whether a given pointer aliases a member of a known array. Suppose char* p points to a malloc'ed arena of length N. Thus, p[0] through p[N-1] are valid lvalues, and generating p[N] is guaranteed not to dump core, as long as it is not accessed. How can we portably determine whether there exists a k, 0<=k<N, such that p+k==q? Mathematically, we are groping for the following algorithm: int alias(const char* q, const char* p, int N) { int k=q-p; return 0<=k && k<N; } But this won't even work on DOS, since the pointer difference ignores the segment portion of an address. So how about int alias(const char* q, const char* p, int N) { int k=q-p; return p+k==q; } Right? No. The only portable solution that I can think of is O(N) time complexity: int alias(const char* q, const char* p, int N) { for (register int k=0; k<N; ++k) if (p+k==q) return 1; return 0; } I want to vomit. ----- jeffa
gary@dgcad.SV.DG.COM (Gary Bridgewater) (03/16/90)
In article <PCG.90Mar14085521@rupert.cs.aber.ac.uk> pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes: >If I remember correctly, either the Pr1me or Data General C did >represent the null pointer with a bit pattern that was not all 0s. The DG MV series has ring structured memory addresses (for all you MULTICS fans) in which the second, third and fourth bits of a (16 bit) word address or the first three bits of a byte address encoded the ring number (user ring 4-7). So the address of the zeroth word in the user default ring (7) would be 0x70000000 and the address of the zeroth byte would be 0xE0000000. HOWEVER, NULL is still defined as 0. Note that this has the (optional adjective) side effect of causing dereferences of NULL addresses to trap. Amazing how much existing (running?) code would core dump when ported to an MV for just this reason. (This is NOT a VAX.) Please don't remember us as a company with funny NULLs. (We did port C++ to it - not to completly ignore the group subject.) -- Gary Bridgewater, Data General Corporation, Sunnyvale California gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary The impossible we understand right away - the obvious takes a little longer.
bart@videovax.tv.tek.com (Bart Massey) (03/17/90)
In article <1520025@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM (Jeff Aguilera) writes: > > The following pointer nightmares really bother me. > > I am a fairly defensive programmer. So when I could not determine > whether strlen() was portably defined for a null argument, I rolled > my own, just to thwart unwanted surprises: > > ptrdiff_t myStrlen(const char* p) > { > register const char* q = p; > if (q) while (*q) ++q; > return q-p; > } > > Alas! This is not portable. It's not terribly useful, either. There's really no reason to write your own implementation of strlen() for this, since the system strlen() may well be written in assembly and blindingly faster than yours. Instead, why don't you try /* warning: evaluates S twice */ #define myStrlen(S) ((S) ? strlen((S)) : 0) or if you really want to be able to write myStrlen(s++) ptrdiff_t myStrlen(const char *p) { if( !p ) return 0; return strlen( p ); } or even, if on your machine the function call to strlen() is ludicrously expensive, and you just can't stand it ptrdiff_t myStrlen(const char *p) { register const char *q; if( !p ) return 0; for(q = p; *q; q++) /*do nothing*/; return q-p; } which really doesn't generate any more instructions than your nonportable version in any case, and is *much more* efficient in the case that p == 0! > The other nightmare is implementing an efficient and portable > memmove, or, equivalently, whether a given pointer aliases a > member of a known array. > > Suppose char* p points to a malloc'ed arena of length N. Thus, > p[0] through p[N-1] are valid lvalues, and generating p[N] is > guaranteed not to dump core, as long as it is not accessed. > How can we portably determine whether there exists a k, 0<=k<N, > such that p+k==q? Mathematically, we are groping for the > following algorithm: > > int alias(const char* q, const char* p, int N) > { > int k=q-p; > return 0<=k && k<N; > } > ... > Right? No. The only portable solution that I can think of > is O(N) time complexity: > > int alias(const char* q, const char* p, int N) > { > for (register int k=0; k<N; ++k) if (p+k==q) return 1; > return 0; > } I don't think so. Even your O(N) solution is non-portable: if p and q are in different "segments" on a segmented machine, they are allowed to compare *equal* even if they point to different objects, since the result of the comparison is undefined [ANSI C Draft 12-7-88 3.3.8,3.3.9]! In short, ANSI C restricts pointer comparisons to (a) comparisons between pointers to a single "object", and (b) comparisons with the pointer constant 0, which indicates an invalid location. Thus, there is no way to write a legal portable "alias" function in ANSI C. But this is just fine -- there was no way to write an even vaguely portable general purpose alias function in K&R C, either! Note that the results of such comparisons are "undefined" by the spec -- the comparisons are not *illegal*. Thus, if you are writing memmove() for a traditional machine with a linear address space, you would probably write a variant of your first function anyway, "knowing" that your compiler will do the right thing in this case. All you have proved is that memmove() implementations are non-portable, which everybody already knew -- that's why it's in the standard ANSI C library. IMHO, the restrictions of the ANSI draft are a very nice compromise between implementability of C on machines with bizarre pointers, and standard C pointer semantics on traditional machines. > I want to vomit. [Insert some obvious wisecracks here... :-) :-) :-)] Bart Massey ..tektronix!videovax.tv.tek.com!bart ..tektronix!reed.bitnet!bart
bart@videovax.tv.tek.com (Bart Massey) (03/23/90)
In private mail to me, someone writes: > In article <5763@videovax.tv.tek.com> you write: > >In article <1520025@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM (Jeff Aguilera) writes: > >>The only portable solution that I can think of is O(N) time complexity: > >> for (register int k=0; k<N; ++k) if (p+k==q) return 1; > > >I don't think so. Even your O(N) solution is non-portable: if p and q are > >in different "segments" on a segmented machine, they are allowed to compare > >*equal* even if they point to different objects, since the result of the > >comparison is undefined [ANSI C Draft 12-7-88 3.3.8,3.3.9]! > > Read section 3.3.9 again. The *relational* operators require the operands to > be in the same array, but the *equality* operators do not. On a segmented > machine, the algorithm must work, even it has to do extra work to normalize > the pointers and generate a 32-bit compare. > > "If two pointers ... compare equal, they ... both point to the same object." You know, he's right -- I blatantly misread the ANSI draft. My apologies to Jeff, and to the rest of the net. While I in no way endorse the above-referenced solution (:-), it is correct... Bart Massey ..tektronix!videovax.tv.tek.com!bart ..tektronix!reed.bitnet!bart