[comp.lang.c++] references to dereferenced null pointers, etc...

pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (03/14/90)

In article <1623@argus.UUCP> ken@argus.UUCP (Kenneth Ng) writes:
   In article <1990Mar12.175613.12082@utzoo.uucp>, henry@utzoo.uucp
   (Henry Spencer) writes:

   : >once thought of doing an implementation where NULL != 0, but after further
   : >reading convinced myself the implementation would be invalid.
   : Not invalid, no; I think there are one or two such.

If I remember correctly, either the Pr1me or Data General C did
represent the null pointer with a bit pattern that was not all 0s.

   I'm confused, is a non zero NULL pointer valid or not?

The internal representation of a null pointer (not NULL -- 'NULL' is just
a popular macro) can be anything at all, as long as it does not coincide
with the representation of any valid pointer.

I remember however that some past edition of the X/open guide mandated
the use the NULL macro and not the constant 0, precisely to make
supporting non binary 0s null pointers easier. Bah. The NULL macro
can well be defined as something else then constant 0, e.g. the address
of an entity that is guaranteed not used elsewhere.

This is another of those K&R points that have been vastly ignored by
almost everybody; contrarily to some Holy Commandament, a lot of (UCB)
people assumed the world was a VAX. The null pointer issue has been
debated many times in comp.lang.c; I think it is in the frequently asked
questions document, and somebody (Spencer or Torek) have a writeup that
discusses it to death (e.g. the fallacy of using 0L as null pointer on
machines where 'sizeof (int) < sizeof (char *)').

What about C++? Well, there are some nice ways to have portable
null pointers without using the constant 0, that IMNHO is not as
descriptive as it should be. The first is to use

	#define nil(TYPE)	((TYPE) 0)

and the second is to use (Algol 68 like)

	extern const char	empty;		/* Definition elsewhere	*/
	static const char	*nil = &empty;

and the third is

	static const char	*nil = 0;

As to the latter, it assumes that the null pointer of type 'char *'
converts to the null pointer of type 'anytype *' in the appropriate
context. Well, let's hope :-). Note that is the second and third
example we are using 'char *' because we are guaranteed that any pointer
cast to 'char *' and back to its original type will be unchanged. This
may be inefficient on some machines (e.g. Pr1me) for which 'char *'
has a particular representation.

Let me mention another two subtle Classic C points, that have received
less attention and whose definition is different in C++ 2.0:

1) In Classic C 'sizeof (char)' was not necessarily 1. In particular, on
some machines and some implementations (PDP-10 C?) 'sizeof' units were
bits. This is of course very wise for certain machines that have 36 bit
words and 7 bit bytes. The world is not a VAX. Now we have it guaranteed
that 'sizeof' units are characters, and I would have preferred a uniform
guarantee about bits instead (yeah, I know this would limit the maximum
size of an object to a fraction of address space on many machines).

2) In Classic C bitfields could be either int or unsigned at the whim of
the implementation, and any specification to the contrary could be
ignored. In C++ we are now guaranteed that the type of a bit field is
either unsigned or int precisely as we declare it. This I like, because
the Classic C rule was very unorthogonal, and made bitfields essentially
useless without some regrettable limitations and contortions in their
use.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jeffa@hpmwtd.HP.COM (Jeff Aguilera) (03/16/90)

The following pointer nightmares really bother me.

I am a fairly defensive programmer.  So when I could not determine 
whether strlen() was portably defined for a null argument, I rolled 
my own, just to thwart unwanted surprises:

    ptrdiff_t myStrlen(const char* p)
    {
        register const char* q = p;
        if (q) while (*q) ++q;
        return q-p;
    }

Alas! This is not portable.  From Harbision and Steele, "C: A
Reference Manual", 2nd ed, p. 166:

    "Given two pointers p and q of the same type, the 
    difference p-q is an integer k such that adding k
    to q yields p.  The type of the difference may be
    either int or long, depending upon the implementation.
    The result is well defined and portable only if the
    two pointers point to objects in the same array,
    or at least are aligned as if they did.  If either
    of the pointers is null, the result is undefined."

Undefined?  Even if both are null?  Does this statement really
mean that

    char* p = 0;
    if (p-p != 0) ;

can dump core?  I bet ANSI is to blame, for their pointer paranoia.
Here the solution is simple:

    ptrdiff_t myStrlen(const char* p)
    {
        register const char* q = p;
        if (q) while (*q) ++q; else return 0;
        return q-p;
    }

Although it has `too many notes.'

The other nightmare is implementing an efficient and portable
memmove, or, equivalently, whether a given pointer aliases a
member of a known array.

Suppose char* p points to a malloc'ed arena of length N.  Thus,
p[0] through p[N-1] are valid lvalues, and generating p[N] is 
guaranteed not to dump core, as long as it is not accessed.  
How can we portably determine whether there exists a k, 0<=k<N, 
such that p+k==q?  Mathematically, we are groping for the 
following algorithm:

    int alias(const char* q, const char* p, int N)
    {
        int k=q-p;
        return 0<=k && k<N;
    }

But this won't even work on DOS, since the pointer difference 
ignores the segment portion of an address.  So how about

    int alias(const char* q, const char* p, int N)
    {
        int k=q-p;
        return p+k==q;
    }

Right?  No.  The only portable solution that I can think of
is O(N) time complexity:

    int alias(const char* q, const char* p, int N)
    {
        for (register int k=0; k<N; ++k) if (p+k==q) return 1;
        return 0;
    }

I want to vomit.  
-----
jeffa

gary@dgcad.SV.DG.COM (Gary Bridgewater) (03/16/90)

In article <PCG.90Mar14085521@rupert.cs.aber.ac.uk> pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) writes:
>If I remember correctly, either the Pr1me or Data General C did
>represent the null pointer with a bit pattern that was not all 0s.

The DG MV series has ring structured memory addresses (for all you MULTICS fans)
in which the second, third and fourth bits of a (16 bit) word address or the
first three bits of a byte address encoded the ring number (user ring 4-7).
So the address of the zeroth word in the user default ring (7) would be
0x70000000 and the address of the zeroth byte would be 0xE0000000.

    HOWEVER, NULL is still defined as 0.

Note that this has the (optional adjective) side effect of causing dereferences
of NULL addresses to trap.  Amazing how much existing (running?) code would
core dump when ported to an MV for just this reason. (This is NOT a VAX.)

Please don't remember us as a company with funny NULLs.
(We did port C++ to it - not to completly ignore the group subject.)
-- 
Gary Bridgewater, Data General Corporation, Sunnyvale California
gary@sv.dg.com or {amdahl,aeras,amdcad}!dgcad!gary
The impossible we understand right away - the obvious takes a little longer.

bart@videovax.tv.tek.com (Bart Massey) (03/17/90)

In article <1520025@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM (Jeff Aguilera) writes:
> 
> The following pointer nightmares really bother me.
> 
> I am a fairly defensive programmer.  So when I could not determine 
> whether strlen() was portably defined for a null argument, I rolled 
> my own, just to thwart unwanted surprises:
> 
>     ptrdiff_t myStrlen(const char* p)
>     {
>         register const char* q = p;
>         if (q) while (*q) ++q;
>         return q-p;
>     }
> 
> Alas! This is not portable.

It's not terribly useful, either.  There's really no reason to write your
own implementation of strlen() for this, since the system strlen() may well
be written in assembly and blindingly faster than yours.  Instead, why don't
you try

	/* warning: evaluates S twice */
	#define myStrlen(S)  ((S) ? strlen((S)) : 0)

or if you really want to be able to write myStrlen(s++)

	ptrdiff_t myStrlen(const char *p)
	{
	    if( !p )
	        return 0;
	    return strlen( p );
	}

or even, if on your machine the function call to strlen() is ludicrously
expensive, and you just can't stand it

	ptrdiff_t myStrlen(const char *p)
	{
	    register const char *q;
	   
	    if( !p )
	        return 0;
	    for(q = p; *q; q++)
	        /*do nothing*/;
	    return q-p;
	}

which really doesn't generate any more instructions than your nonportable
version in any case, and is *much more* efficient in the case that p == 0!

> The other nightmare is implementing an efficient and portable
> memmove, or, equivalently, whether a given pointer aliases a
> member of a known array.
> 
> Suppose char* p points to a malloc'ed arena of length N.  Thus,
> p[0] through p[N-1] are valid lvalues, and generating p[N] is 
> guaranteed not to dump core, as long as it is not accessed.  
> How can we portably determine whether there exists a k, 0<=k<N, 
> such that p+k==q?  Mathematically, we are groping for the 
> following algorithm:
> 
>     int alias(const char* q, const char* p, int N)
>     {
>         int k=q-p;
>         return 0<=k && k<N;
>     }
> ...
> Right?  No.  The only portable solution that I can think of
> is O(N) time complexity:
> 
>     int alias(const char* q, const char* p, int N)
>     {
>         for (register int k=0; k<N; ++k) if (p+k==q) return 1;
>         return 0;
>     }

I don't think so.  Even your O(N) solution is non-portable: if p and q are
in different "segments" on a segmented machine, they are allowed to compare
*equal* even if they point to different objects, since the result of the
comparison is undefined [ANSI C Draft 12-7-88 3.3.8,3.3.9]!  In short, ANSI
C restricts pointer comparisons to (a) comparisons between pointers to a
single "object", and (b) comparisons with the pointer constant 0, which
indicates an invalid location.

Thus, there is no way to write a legal portable "alias" function in ANSI C.
But this is just fine -- there was no way to write an even vaguely portable
general purpose alias function in K&R C, either!

Note that the results of such comparisons are "undefined" by the spec -- the
comparisons are not *illegal*.  Thus, if you are writing memmove() for a
traditional machine with a linear address space, you would probably write a
variant of your first function anyway, "knowing" that your compiler will do
the right thing in this case.  All you have proved is that memmove()
implementations are non-portable, which everybody already knew -- that's why
it's in the standard ANSI C library.

IMHO, the restrictions of the ANSI draft are a very nice compromise between
implementability of C on machines with bizarre pointers, and standard C pointer
semantics on traditional machines.

> I want to vomit.  

[Insert some obvious wisecracks here... :-) :-) :-)]

					Bart Massey
					..tektronix!videovax.tv.tek.com!bart
					..tektronix!reed.bitnet!bart

bart@videovax.tv.tek.com (Bart Massey) (03/23/90)

In private mail to me, someone writes:
 > In article <5763@videovax.tv.tek.com> you write:
 > >In article <1520025@hpmwjaa.HP.COM> jeffa@hpmwtd.HP.COM (Jeff Aguilera) writes:
 > >>The only portable solution that I can think of is O(N) time complexity:
 > >>         for (register int k=0; k<N; ++k) if (p+k==q) return 1;
 >
 > >I don't think so.  Even your O(N) solution is non-portable: if p and q are
 > >in different "segments" on a segmented machine, they are allowed to compare
 > >*equal* even if they point to different objects, since the result of the
 > >comparison is undefined [ANSI C Draft 12-7-88 3.3.8,3.3.9]!
 >
 > Read section 3.3.9 again.  The *relational* operators require the operands to
 > be in the same array, but the *equality* operators do not.  On a segmented
 > machine, the algorithm must work, even it has to do extra work to normalize
 > the pointers and generate a 32-bit compare.
 > 
 > "If two pointers ... compare equal, they ... both point to the same object."

You know, he's right -- I blatantly misread the ANSI draft.  My apologies to
Jeff, and to the rest of the net.  While I in no way endorse the
above-referenced solution (:-), it is correct...

					Bart Massey
					..tektronix!videovax.tv.tek.com!bart
					..tektronix!reed.bitnet!bart