[comp.lang.c] Pointer Comparison and Portability

nw@vaxine.UUCP (02/11/87)

Consider the following C function:

        same_char (p, q)
        char *p;
        char *q;
        {
                return (p == q);
        }

Does this function only return a non-zero value when p and q point
to the same physical character?  This may seem like a silly question,
but I haven't found an iron-clad answer in K & R yet.   I quote:

K & R page 98 (emphasis mine):

        If p and q point to members of the SAME ARRAY, then relations
        like <, >=, etc., work properly.

                p < q

        is true, for example, if p points to an earlier member of the array
        than does q.  The relations == and != also work.

It goes on to say that all bets are off for comparisons of pointers between
different arrays.  Thus, if p and q pointed inside different character arrays,
it would appear that comparison between those pointers, even for equality,
is undefined.

Obviously (?) the intent of the warning was that one should not assume
anything about the ordering of distinct objects in memory.  Unfortunately,
I don't see any guarantee that every object has a unique address as
defined by the pointer comparison operation.

Imagine a hypothetical segmented machine where every C object were
allocated in its own separate segment.  Essentially, my question
boils down to this:  On such a machine, must pointer comparison
be implemented by comparing both segment base and offset within
segment, or is it allowable to simply compare offset within segment?

I certainly wouldn't advocate such an implementation, but I'm disturbed that
I can't seem to rule it out.

Neil Webber     Automatix Inc (or current resident)     Billerica MA
                        {decvax,allegra}!encore!vaxine!nw

pinkas@mipos3.UUCP (02/13/87)

In article <416@vaxine.UUCP> nw@vaxine.UUCP (Neil Webber) writes:
>Consider the following C function:
>
>        same_char (p, q)
>        char *p;
>        char *q;
>        {
>                return (p == q);
>        }
>
>Does this function only return a non-zero value when p and q point
>to the same physical character?  This may seem like a silly question,
>but I haven't found an iron-clad answer in K & R yet.   I quote:
...
>It goes on to say that all bets are off for comparisons of pointers between
>different arrays.  Thus, if p and q pointed inside different character arrays,
>it would appear that comparison between those pointers, even for equality,
>is undefined.
...

True.  Consider the 80x86 architecture and many of the existing C compilers
for it.  Many of them try to optimize in memory models with large data
segments by using as small an offset as possible for the first element of
the array.  (This is a very small optimization, but it does save some
comparisons if the array is known to be smaller than a certain size.)  For
example:

	char p[16], q[16];

might let p = 0x1000:0000 and q = 0x1001:0000.  If your C compiler does not
do the necessary conversion to common segments, or a similar calculation,
the comparison will fail.  Many early C compilers for the 8086 treated
pointers as ints for comparison and long (segmented) pointers as longs.
Thus with the above example, the pointers would be considered to point to
objects 64k apart.  The proper calculation is (seg * segsize + offset).  If
this calculation is used, the above arrays are 16 bytes apart (on the 8086
at least), and p[16] is at the same location as q[0].
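
To make the arithmetic concrete, here is a minimal sketch using the
example values above.  The struct farptr type and the function names are
invented for illustration; they model a segment:offset pair in portable C
and are not any compiler's actual far-pointer representation.

```c
#include <assert.h>

/* Hypothetical model of an 8086 segment:offset pointer. */
struct farptr {
    unsigned int seg;   /* 16-bit segment value */
    unsigned int off;   /* 16-bit offset within the segment */
};

/* Physical address: consecutive segments start 16 bytes apart. */
unsigned long linear(struct farptr fp)
{
    return (unsigned long)fp.seg * 16UL + (unsigned long)fp.off;
}

/* The naive view: the same two words read as one 32-bit long. */
unsigned long as_long(struct farptr fp)
{
    return ((unsigned long)fp.seg << 16) | (unsigned long)fp.off;
}

/* Distances between p = 1000:0000 and q = 1001:0000. */
void check_distances(void)
{
    struct farptr p = { 0x1000, 0x0000 };
    struct farptr q = { 0x1001, 0x0000 };

    assert(as_long(q) - as_long(p) == 0x10000UL);  /* looks 64k apart */
    assert(linear(q) - linear(p) == 16UL);         /* really 16 bytes apart */
}
```

Read as longs the two arrays appear 64k apart, while the linear
calculation puts them 16 bytes apart, so 1000:0010 (p plus 16) and
1001:0000 (q) name the same byte.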

Considering that K&R specifically said that only pointers to the same array
should be compared, and the fact that many C compilers take K&R to be the
final word on everything, I would say that unless your compiler manual
states otherwise (or it works), you should avoid comparing pointers to
different arrays.

-Israel
-- 
User (n.): A programmer who will believe anything you tell him.

----------------------------------------------------------------------
UUCP:	{amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!pinkas
ARPA:	pinkas%mipos3.intel.com@relay.cs.net
CSNET:	pinkas%mipos3.intel.com

tps@sdchem.UUCP (02/15/87)

In article <454@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>In article <416@vaxine.UUCP> nw@vaxine.UUCP (Neil Webber) writes:
>>Consider the following C function:
>>
>>        same_char (p, q)
>>        char *p;
>>        char *q;
>>        {
>>                return (p == q);
>>        }
>>
>>Does this function only return a non-zero value when p and q point
>>to the same physical character?  This may seem like a silly question,
>>but I haven't found an iron-clad answer in K & R yet.   I quote:
>...
>True.  Consider the 80x86 architecture and many of the existing C compilers
>for it.  Many of them try to optimize in memory models with large data
>segments by using as small an offset as possible for the first element of
>the array.  (This is a very small optimization, but it does save some
>comparisons if the array is known to be smaller than a certain size.)  For
>example:
>
>	char p[16], q[16];
>...
>Considering that K&R specifically said that only pointers to the same array
>should be compared, and the fact that many C compilers take K&R to be the
>final word on everything, I would say that unless your compiler manual
>states otherwise (or it works), you should avoid comparing pointers to
>different arrays.
>-Israel

I would say that a function call which compared its two pointer arguments
would *have* to work, no matter how K&R are interpreted on this point, or
else strcpy wouldn't work.

Proof:
	Using your declaration as a basis, declare
		
		char p[16];
		char q[] =	"string q";

		strcpy( p, q );

	now (assuming strcpy() is a real function call) look at
	strcpy():
		
		char *
		strcpy( s1, s2 )
			char	*s1, *s2;
	
	If (s1 == s2) could be true even if s1 and s2 pointed to different
	areas, then strcpy would copy s2 on top of itself.

That is, even on a segmented architecture, when pointers get passed to a
subroutine, they MUST have distinct addresses -- how else can the subroutine
know what area of memory to access?

"<" and ">" are still another matter.

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD

guy@gorodish.UUCP (02/16/87)

>In article <454@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>>Considering that K&R specifically said that only pointers to the same array
>>should be compared,

To be precise, they said

	Pointer comparison is portable only when the pointers point
	to objects in the same array.

when discussing "<", ">", "<=", and ">=".  Under "Equality
operators", they say that they're "exactly analogous" to relational
operators.  It is unclear whether this was intended to mean that
*all* pointer comparisons are portable only when the pointers point
to objects in the same array, or just comparisons other than for
equality or inequality.  I would vote for the latter, since pointer
*equality* can be defined as meaning "the two pointers point to the
same object".

In fact, ANSI X3J11 has already voted for the latter.

Under "Relational operators", they say that "If the objects pointed
to are not members of the same array, the result is undefined."
(Note, however, that they also point out that "If P points to the last
member of an array object, the pointer expression P+1 is greater than
P, even though P+1 does not point to a member of the same array
object as P," which means that if P can be the last address in a
segment, you can't just naively compare offsets within a segment.)

Under "Equality operators", however, they say that "If two pointers
to objects or functions compare equal, they point to the same object
or function, respectively."
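
A small sketch of how the two quoted rules play out, written with an ANSI
prototype rather than the old-style definition from the original post.
The check_rules function and the array names a and b are made up for
illustration.  Only comparisons that the quoted text defines are used.

```c
#include <assert.h>

/* same_char from the original post, with an ANSI prototype. */
int same_char(char *p, char *q)
{
    return p == q;
}

void check_rules(void)
{
    char a[16], b[16];

    /* Equality: two pointers to the same object compare equal ... */
    assert(same_char(&a[3], a + 3));
    /* ... and pointers that compare equal point to the same object,
       so pointers to elements of two distinct arrays are unequal. */
    assert(!same_char(&a[0], &b[0]));

    /* Relational: defined within one array, including the P+1 case. */
    assert(&a[2] < &a[9]);
    assert(&a[15] + 1 > &a[15]);
}
```

Note that nothing here applies "<" or ">" to pointers into different
arrays; that is the case the "Relational operators" wording leaves
undefined.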

In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
>I would say that a function call which compared its two pointer arguments
>would *have* to work, no matter how K&R are interpreted on this point, or
>else strcpy wouldn't work.
>
>That is, even on a segmented architecture, when pointers get passed to a
>subroutine, they MUST have distinct addresses -- how else can the subroutine
>know what area of memory to access?

This isn't necessarily the case.  If one takes the first of the two
interpretations of K&R, two pointers that pointed to different
locations could compare equal.  This would be highly undesirable
(since people generally expect that two pointers will be equal iff
they point to the same object, which is presumably why ANSI ruled out
that interpretation), but would conform to the first interpretation
of K&R.

john@viper.UUCP (02/17/87)

In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
 >
 >That is, even on a segmented architecture, when pointers get passed to a
 >subroutine, they MUST have distinct addresses -- how else can the subroutine
 >know what area of memory to access?
 >

  Tom, on segmented architecture machines you -could- have two distinctly
different pointer values which point to the same memory space.  This is
what Israel Pinkas was (I think) trying to point out.

  In the example he gave:
	  char p[16], q[16];
some compilers would take this and generate array addresses that start
in two different memory segments (segments on 80x86 machines are 64k bytes
long, but the actual starting memory locations of two consecutive segments
are only 16 bytes away from each other...)

  This means the address of &p[0] could be 0001:0000 (segment:offset) and the
address of &q[0] could be 0002:0000.  Now, given that a compiler -could-
allocate and address the two arrays in this manner, the address for the
memory location &p[17] (incorrect, but technically a legal memory reference)
would be 0001:0010.  The byte of memory addressed by q[0] and p[17] would
be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would be
different.

  If the people writing the compiler take this into account, it will require
converting all pointers used in pointer comparisons to a normalized form
which maps the entire memory space in a linear (one value per memory byte)
fashion.  Unfortunately, this also will slow down (rather nastily) all
operations using pointer comparisons...  An undesirable side effect for
a "feature" not defined in K&R.
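
  As a rough sketch of what such a normalized form could look like, the
code below rewrites a segment:offset pair so that the offset is always
below 16.  The struct farptr type and the functions normalize and
far_equal are hypothetical names for illustration, not taken from any
real 80x86 compiler.

```c
#include <assert.h>

/* Hypothetical model of an 8086 segment:offset pointer. */
struct farptr {
    unsigned int seg;
    unsigned int off;
};

/* Fold the high 12 bits of the offset into the segment, leaving an
   offset in 0..15; the mask keeps the segment to 16 bits. */
struct farptr normalize(struct farptr fp)
{
    struct farptr n;

    n.seg = (fp.seg + (fp.off >> 4)) & 0xFFFFu;
    n.off = fp.off & 0xFu;
    return n;
}

/* Compare two pointers by their normalized forms. */
int far_equal(struct farptr a, struct farptr b)
{
    struct farptr na = normalize(a);
    struct farptr nb = normalize(b);

    return na.seg == nb.seg && na.off == nb.off;
}

void check_normalize(void)
{
    struct farptr a = { 0x0001, 0x0010 };
    struct farptr b = { 0x0002, 0x0000 };
    struct farptr c = { 0x0001, 0x0000 };

    assert(far_equal(a, b));    /* same byte, two spellings */
    assert(!far_equal(b, c));   /* 16 bytes apart */
}
```

The extra shift, add, and mask on every comparison is the slowdown
being described.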


  Getting back to the original starting question asked by Neil Webber:

>>Consider the following C function:
>>
>>        same_char (p, q)
>>        char *p;
>>        char *q;
>>        {
>>                return (p == q);
>>        }
>>
>>Does this function only return a non-zero value when p and q point
>>to the same physical character?

  The answer to the exact wording of the question is YES...  However....
saying that the function will ONLY return a non-zero value when the pointers
match is not the same as "Does the function ALWAYS return a non-zero value
when p and q point to the same physical character?"  The answer to the latter
question is NO.

  You can have two different pointer values which point to the same
physical character.  This is unusual, but it does happen in any instance
where you might run into a compiler which allocates memory in the manner
I mentioned above.  The "solution" that I've used is, in any program I
write where this might have an effect, I define a macro PTREQUAL(x,y).
On most machines I can define this to be:
	#define PTREQUAL(x,y)	((x) == (y))
On any machine I port this code to that gives me problems, I can redefine
PTREQUAL (or PTRLESS, PTRGREATER, etc) to reference a function which does
the math necessary for linear mapping of the memory space.
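
  One way the macro scheme might be laid out is sketched below.  The
switch HYPOTHETICAL_SEGMENTED and the functions ptr_equal and ptr_less
are invented names; on a problem machine they would have to be supplied
separately with whatever linear-mapping arithmetic that machine needs.

```c
#include <assert.h>

/* HYPOTHETICAL_SEGMENTED is a made-up switch, not a real compiler macro. */
#ifdef HYPOTHETICAL_SEGMENTED
int ptr_equal(const void *x, const void *y);   /* supplied per machine */
int ptr_less(const void *x, const void *y);
#define PTREQUAL(x, y)  ptr_equal((x), (y))
#define PTRLESS(x, y)   ptr_less((x), (y))
#else
/* Fully parenthesized so that arguments such as a + 1 expand safely. */
#define PTREQUAL(x, y)  ((x) == (y))
#define PTRLESS(x, y)   ((x) < (y))
#endif

/* The macros used on pointers into a single array. */
void check_macros(void)
{
    char buf[8];

    assert(PTREQUAL(buf + 3, &buf[3]));
    assert(!PTREQUAL(buf, buf + 1));
    assert(PTRLESS(buf, buf + 1));
}
```

The rest of the program only ever spells comparisons as PTREQUAL or
PTRLESS, so a port touches the macro definitions and nothing else.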
  
  As long as you are referencing addresses within the same structure/array
you will not have to worry about this and the "same_char()" function you
wrote will work all the time.

-------------
john@viper.UUCP <John L. Stanley>
Analyst/Consultant - DynaSoft Systems...

tps@sdchem.UUCP (02/19/87)

In article <541@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
> >
> >That is, even on a segmented architecture, when pointers get passed to a
> >subroutine, they MUST have distinct addresses -- how else can the subroutine
> >know what area of memory to access?
> >
>
>  Tom, on segmented architecture machines you -could- have two distinctly
>different pointer values which point to the same memory space...
>...
>  In the example he gave:
>	  char p[16], q[16];
>...
>  This means the address of &p[0] could be 0001:0000 (segment:offset) and the
>address of &q[0] could be 0002:0000.  Now, given that a compiler -could-
>allocate and address the two arrays in this manner, the address for the
>memory location &p[17] (incorrect, but technically a legal memory reference)
>would be 0001:0010.  The byte of memory addressed by q[0] and p[17] would
>be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would be
>different.

&p[17] is illegal (or at least undefined) in C, given the above definition.
If I compared this to any other pointer value (even inside p[]) I would not
expect to get anything well defined.

>  Getting back to the original starting question asked by Neil Webber:
>>>Consider the following C function:
>>>        same_char (p, q)
>>>        char *p;
>>>        char *q;
>>>        {
>>>                return (p == q);
>>>        }
>>>Does this function only return a non-zero value when p and q point
>>>to the same physical character?
>  The answer to the exact wording of the question is YES...  However....
>saying that the function will ONLY return a non-zero value when the pointers
>match is not the same as "Does the function ALWAYS return a non-zero value
>when p and q point to the same physical character?"  The answer to the latter
>question is NO.

Judging from your above comments, I assume that what you mean by this is
that if you pass &p[16] and &q[0] to same_char(), they might refer to the
same physical memory but have a different value.  Since there is no
guarantee in C that adjacent definitions wind up adjacent in memory, I would
say the result of same_char( &p[16], &q[0] ) would be undefined (by C).  It
would make no more sense to argue about what same_char() should return in
this case than if you had called
	
	char	*r =	p;
	same_char( r++, r++ )

So I would say that if you pass *defined* pointer values to same_char() it
will return 1 if and only if they refer to the same memory location, and
will return 0 otherwise.

|| Tom Stockfisch, UCSD Chemistry	tps%chem@sdcsvax.UCSD

Schauble@mit-multics.arpa (02/19/87)

Unfortunately, the Intel 8086 series provides another counter example.
On this machine, addresses are in the form of segment and offset.  The
Actual Address is 16*segment + offset.  This is usually written as
segment:offset.  Thus, the two pointers 0100:0010 and 0101:0000 point at
the same byte.  If compared, they will *NOT* compare equal.

So, if two pointers compare equal, they definitely point at the same
object.  However, the converse is not true.  Two pointers that do not
compare equal do NOT (necessarily) point at different objects.

Seems like when K & R says that pointer comparison is undefined except
when the two are pointers to the same array, it should be taken to mean
exactly that for all operators, including == and !=.

          Paul
          Schauble at MIT-Multics.arpa

drw@cullvax.UUCP (02/19/87)

john@viper.UUCP (John Stanley) writes:
>   Tom, on segmented architecture machines you -could- have two distinctly
> different pointer values which point to the same memory space.  This is
> what Israel Pinkas was (I think) trying to point out.
> 
>   In the example he gave:
> 	  char p[16], q[16];
> (Details of 8086 memory addressing omitted.)
> The byte of memory addressed by q[0] and p[17] would
> be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would be
> different.

Yes, but comparing &p[17] and &q[0] is comparing two addresses derived
from different allocations, and as the standard says, that is
implementation defined (i.e., may not work).  Note also that &p[17] is
a correct *pointer*, but attempting to fetch or store anything through
it is not.  &p[18] is not a correct pointer.

Dale
-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu

guy@gorodish.UUCP (02/19/87)

>Unfortunately, the Intel 8086 series provides another counter example.
>On this machine, addresses are in the form of segment and offset.  The
>Actual Address is 16*segment + offset.

Yes, but *if* you happen to construct two different long pointers that
point to the same address, that's just like double-mapping a location
with an MMU.

>So, if two pointers compare equal, they definitely point at the same
>object.  However, the converse is not true.  Two pointers that do not
>compare equal do NOT (necessarily) point at different objects.

Yes, but this is an escape hatch for the benefit of e.g. systems
that have to do double mapping.  It is *not* intended to render
comparison of pointers for equality useless except when comparing
pointers to elements in the same array.

>Seems like when K & R says that pointer comparison is undefined except
>when the two are pointers to the same array, it should be taken to mean
>exactly that for all operators, including == and !=.

Yes, but the ANSI C standard explicitly separates pointer comparison
for (in)equality from relational comparison on pointers.  Anybody who
tried to sell *me* a C implementation where the same object had two
addresses *except* when something was explicitly doubly mapped, or in
similarly unusual cases (e.g. an 808[68] with no memory mapping,
where a segment, treated as a full 64KB segment, overlapped another
segment, but where the segment *really* isn't long enough to overlap
it) - i.e., an implementation where pointer equality wasn't
equivalent to object equality, except in some *very* specialized and
*explicitly*-documented cases - would be shown the door rather
quickly.

pes@bath63.UUCP (02/20/87)

Well, Multics itself provides the most dangerous of both worlds in that you
can get:

Two pointers which (to all determinable checks) *look* different but which in
fact point to the same thing (because the program has, through some cutesy
trick, managed to get the same 'physical' segment initiated twice, with 2
different 'logical' segment numbers).

Two pointers which (as near as can be determined) are identical, but which
the program 'thinks' (and which are 'supposed to') point at different things
(because the program has gotten a pointer to some 'object', and then managed
to terminate the 'physical' segment, and initiate a new segment, with the
same 'logical' segment number).

Admittedly, both of these effects require some fairly shady programming, but
unfortunately that's not unheard of.  They can also be achieved fairly
easily on large projects involving several programmers and a communication
breakdown.

john@viper.UUCP (02/22/87)

Both Tom Stockfisch and Dale Worley make a good point.  The method I used
to construct the problem pointers is certainly illegal in standard C.
  My response was primarily to the people talking about how the function
in question to compare two pointers could fail on a 80x86 architecture
machine.  I guess I should have made the orientation of my answer a bit
more clear.  Since some pointers used by programs in an MS-DOS environment 
are created by the operating system and not from C it is possible to have
illegally constructed pointers.
  The original question from Neil Webber asked if the function would
return TRUE if the two pointers pointed to the same "physical" character...
The side-track of 80x86 pointers carried this a bit beyond questions limited
only to "standard" C and I guess the failure cases will only occur when a
programmer goes a bit beyond what is legal, (and I'm sure -none- of us would
-ever- do anything in C that wasn't by-the-book...  ;-)
--- 
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john

franka@mmintl.UUCP (02/23/87)

In article <814@cullvax.UUCP> drw@cullvax.UUCP writes:
>john@viper.UUCP (John Stanley) writes:
>>   In the example he gave:
>> 	  char p[16], q[16];
>> (Details of 8086 memory addressing omitted.)
>> The byte of memory addressed by q[0] and p[17] would
>> be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would
>> be different.
>
>Yes, but comparing &p[17] and &q[0] is comparing two addresses derived
>from different allocations, and as the standard says, that is
>implementation defined (i.e., may not work).  Note also that &p[17] is
>a correct *pointer*, but attempting to fetch or store anything through
>it is not.  &p[18] is not a correct pointer.

Actually, of course, it is p[16] which is equal to q[0], and &p[17] which is
not a correct pointer.  Isn't zero-based array indexing wonderful? (:-)

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

faustus@ucbcad.UUCP (02/25/87)

In article <4537@brl-adm.ARPA>, Schauble@mit-multics.arpa (Paul Schauble) writes:
> Unfortunately, the Intel 8086 series provides another counter example.
> On this machine, addresses are in the form of segment and offset.  The
> Actual Address is 16*segment + offset.  This is usually written as
> segment:offset.  Thus, the two pointers 0100:0010 and 0101:0000 point at
> the same byte.  If compared, they will *NOT* compare equal.

I think they will have to compare equal.  If the compiler generates
naive code to compare them it's wrong.  Just as "!p" MUST be true
after a "p = 0", where p is a char * and the machine uses a NULL pointer
that isn't an all-zero bit pattern, "p == q" MUST be true if they
point at the same address, no matter what they look like internally.
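
A minimal sketch of the guarantees being appealed to here, using only
standard C.  The function name check_null and the variable names are
made up for illustration; whatever bit pattern the machine uses for a
null pointer is invisible at this level.

```c
#include <assert.h>
#include <stddef.h>

void check_null(void)
{
    char c = 'x';
    char *p;
    char *q = &c;
    char *r = q;     /* a copy of q */

    p = 0;           /* the constant 0 converts to the null pointer */
    assert(!p);
    assert(p == NULL);

    assert(r == q);  /* a copied pointer compares equal to its source */
    assert(q != p);  /* a pointer to an object is never null */
}
```

The comparison against 0 is a comparison of pointer values as the
language defines them, not of raw bits, which is the same standard the
post asks "p == q" to meet.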

> Seems like when K & R says that pointer comparison is undefined except
> when the two are pointers to the same array, it should be taken to mean
> exactly that for all operators, including == and !=.

I think you want to re-phrase this.  If this were literally true then
"p = q; if (p == q) ..." would have undefined semantics.

This is sort of a moot point anyway, since presumably there is no way that
you could get a "de-normalized" pointer within C without using backdoor
casting of values...  But you're just asking for trouble if you ignore the
possibility.

	Wayne

john@viper.UUCP (03/02/87)

In article <2000@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
 >>john@viper.UUCP (John Stanley) writes:
 >>> The byte of memory addressed by q[0] and p[17] would
 >>> be the exact same byte, ...............
 >
 >Actually, of course, it is p[16] which is equal to q[0], and &p[17] which is
 >not a correct pointer.  Isn't zero-based array indexing wonderful? (:-)

But Frank, haven't you heard?  Intel is switching all future 80x86 chips
to separate their segments by 17 bytes to expand the addressing space....

  (just kidding... ;^)

Frank is, of course, correct...  (boy, is my face red...)

--- 
John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john