nw@vaxine.UUCP (02/11/87)
Consider the following C function:
    same_char (p, q)
    char *p;
    char *q;
    {
        return (p == q);
    }
Does this function only return a non-zero value when p and q point
to the same physical character? This may seem like a silly question,
but I haven't found an iron-clad answer in K & R yet. I quote:
K & R page 98 (emphasis mine):
If p and q point to members of the SAME ARRAY, then relations
like <, >=, etc., work properly.
p < q
is true, for example, if p points to an earlier member of the array
than does q. The relations == and != also work.
It goes on to say that all bets are off for comparisons of pointers between
different arrays. Thus, if p and q pointed inside different character arrays,
it would appear that comparison between those pointers, even for equality,
is undefined.
Obviously (?) the intent of the warning was that one should not assume
anything about the ordering of distinct objects in memory. Unfortunately,
I don't see any guarantee that every object has a unique address as
defined by the pointer comparison operation.
Imagine a hypothetical segmented machine where every C object were
allocated in its own separate segment. Essentially, my question
boils down to this: On such a machine, must pointer comparison
be implemented by comparing both segment base and offset within
segment, or is it allowable to simply compare offset within segment?
I certainly wouldn't advocate such an implementation, but I'm disturbed that
I can't seem to rule it out.
Neil Webber Automatix Inc (or current resident) Billerica MA
{decvax,allegra}!encore!vaxine!nw
pinkas@mipos3.UUCP (02/13/87)
In article <416@vaxine.UUCP> nw@vaxine.UUCP (Neil Webber) writes:
>Consider the following C function:
>
>    same_char (p, q)
>    char *p;
>    char *q;
>    {
>        return (p == q);
>    }
>
>Does this function only return a non-zero value when p and q point
>to the same physical character? This may seem like a silly question,
>but I haven't found an iron-clad answer in K & R yet. I quote:
...
>It goes on to say that all bets are off for comparisons of pointers between
>different arrays. Thus, if p and q pointed inside different character arrays,
>it would appear that comparison between those pointers, even for equality,
>is undefined.
...

True. Consider the 80x86 architecture and many of the existing C compilers
for it. Many of them try to optimize in memory models with large data
segments by using as small an offset as possible for the first element of
the array. (This is a very small optimization, but it does save some
comparisons if the array is known to be smaller than a certain size.) For
example:

    char p[16], q[16];

might let p = 0x1000:0000 and q = 0x1001:0000. If your C compiler does not
do the necessary conversion to common segments, or a similar calculation,
the comparison will fail.

Many early C compilers for the 8086 treated pointers as ints for comparison,
and long (segmented) pointers as longs. Thus with the above example, the
pointers would be considered to point to objects 64k apart. The proper
calculation is (seg * segsize + offset), where segsize is the spacing
between successive segment bases. If this calculation is used, the above
arrays are 16 bytes apart (on the 8086 at least), and p[16] is at the same
location as q[0].

Considering that K&R specifically said that only pointers to the same array
should be compared, and the fact that many C compilers take K&R to be the
final word on everything, I would say that unless your compiler manual
states otherwise (or it works), you should avoid comparing pointers to
different arrays.

-Israel
--
User (n.): A programmer who will believe anything you tell him.
----------------------------------------------------------------------
UUCP: {amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!pinkas
ARPA: pinkas%mipos3.intel.com@relay.cs.net
CSNET: pinkas%mipos3.intel.com
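The arithmetic in the post above can be checked with a small model. This is only a sketch: `struct far_ptr`, `linear_address` and `as_long` are names invented here for illustration and are not any compiler's real far-pointer type or API. It assumes 8086 real mode, where segment bases are 16 bytes apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an 8086 real-mode far pointer (illustration only). */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

/* Physical address: segment bases are 16 bytes apart, so seg * 16 + off. */
uint32_t linear_address(struct far_ptr fp)
{
    return (uint32_t)fp.seg * 16u + fp.off;
}

/* The naive view: read the 32-bit pattern seg:off as one long value. */
uint32_t as_long(struct far_ptr fp)
{
    return ((uint32_t)fp.seg << 16) | fp.off;
}

void check_examples(void)
{
    struct far_ptr p     = { 0x1000, 0x0000 };
    struct far_ptr q     = { 0x1001, 0x0000 };
    struct far_ptr p_end = { 0x1000, 0x0010 };   /* p + 16 */

    assert(as_long(q) - as_long(p) == 65536u);             /* "64k apart" */
    assert(linear_address(q) - linear_address(p) == 16u);  /* really 16 bytes */
    assert(linear_address(p_end) == linear_address(q));    /* same byte */
    assert(as_long(p_end) != as_long(q));                  /* patterns differ */
}
```

Comparing the raw seg:off patterns therefore gives a different answer from comparing the linear addresses, which is the failure mode described for the early compilers.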
tps@sdchem.UUCP (02/15/87)
In article <454@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>In article <416@vaxine.UUCP> nw@vaxine.UUCP (Neil Webber) writes:
>>Consider the following C function:
>>
>>    same_char (p, q)
>>    char *p;
>>    char *q;
>>    {
>>        return (p == q);
>>    }
>>
>>Does this function only return a non-zero value when p and q point
>>to the same physical character? This may seem like a silly question,
>>but I haven't found an iron-clad answer in K & R yet. I quote:
>...
>True. Consider the 80x86 architecture and many of the existing C compilers
>for it. Many of them try to optimize in memory models with large data
>segments by using as small an offset as possible for the first element of
>the array. (This is a very small optimization, but it does save some
>comparisons if the array is known to be smaller than a certain size.) For
>example:
>
>    char p[16], q[16];
>...
>Considering that K&R specifically said that only pointers to the same array
>should be compared, and the fact that many C compilers take K&R to be the
>final word on everything, I would say that unless your compiler manual
>states otherwise (or it works), you should avoid comparing pointers to
>different arrays.
>-Israel

I would say that a function call which compared its two pointer arguments
would *have* to work, no matter how K&R are interpreted on this point, or
else strcpy wouldn't work.

Proof: Using your declaration as a basis, declare

    char p[16];
    char q[] = "stRing q";

    strcpy( p, q );

Now (assuming strcpy() is a real function call) look at strcpy():

    char *
    strcpy( s1, s2 )
    char *s1, *s2;

If (s1 == s2) could be true even if s1 and s2 pointed to different areas,
then strcpy would copy s2 on top of itself.

That is, even on a segmented architecture, when pointers get passed to a
subroutine, they MUST have distinct addresses -- how else can the subroutine
know what area of memory to access?

"<" and ">" are still another matter.

|| Tom Stockfisch, UCSD Chemistry tps%chem@sdcsvax.UCSD
guy@gorodish.UUCP (02/16/87)
>In article <454@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>>Considering that K&R specifically said that only pointers to the same array
>>should be compared,

To be precise, they said

    Pointer comparison is portable only when the pointers point to
    objects in the same array.

when discussing "<", ">", "<=", and ">=". Under "Equality operators", they
say that they're "exactly analogous" to relational operators. It is unclear
whether this was intended to mean that *all* pointer comparisons are
portable only when the pointers point to objects in the same array, or just
comparisons other than for equality or inequality. I would vote for the
latter, since pointer *equality* can be defined as meaning "the two pointers
point to the same object".

In fact, ANSI X3J11 has already voted for the latter. Under "Relational
operators", they say that "If the objects pointed to are not members of the
same array, the result is undefined." (Note, however, that they also point
out that "If P points to the last member of an array object, the pointer
expression P+1 is greater than P, even though P+1 does not point to a member
of the same array object as P," which means that if P can be the last
address in a segment, you can't just naively compare offsets within a
segment.) Under "Equality operators", however, they say that "If two
pointers to objects or functions compare equal, they point to the same
object or function, respectively."

In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
>I would say that a function call which compared its two pointer arguments
>would *have* to work, no matter how K&R are interpreted on this point, or
>else strcpy wouldn't work.
>
>That is, even on a segmented architecture, when pointers get passed to a
>subroutine, they MUST have distinct addresses -- how else can the subroutine
>know what area of memory to access?

This isn't necessarily the case. If one takes the first of the two
interpretations of K&R, two pointers that pointed to different locations
could compare equal. This would be highly undesirable (since people
generally expect that two pointers will be equal iff they point to the same
object, which is presumably why ANSI ruled out that interpretation), but
would conform to the first interpretation of K&R.
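The "P+1 is greater than P" guarantee quoted above is what makes the usual end-pointer loop legal. The following is a minimal sketch of that idiom; `count_char` is a name made up for this illustration and does not come from the thread or from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in buf[0..n-1] equal to c. The pointer `end` is
   formed one past the last element and is compared against, but never
   dereferenced. */
size_t count_char(const char *buf, size_t n, char c)
{
    const char *end;
    const char *p;
    size_t count = 0;

    assert(buf != NULL);
    end = buf + n;                /* one past the last element */
    for (p = buf; p < end; p++)   /* relational compare within one array */
        if (*p == c)
            count++;
    return count;
}
```

An implementation that compared only segment offsets would break this loop whenever the array ended exactly at the top of a segment, because `buf + n` would wrap around to offset 0.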
john@viper.UUCP (02/17/87)
In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
>
>That is, even on a segmented architecture, when pointers get passed to a
>subroutine, they MUST have distinct addresses -- how else can the subroutine
>know what area of memory to access?
>

Tom, on segmented architecture machines you -could- have two distinctly
different pointer values which point to the same memory space. This is
what Israel Pinkas was (I think) trying to point out.

In the example he gave:

    char p[16], q[16];

some compilers would take this and generate array addresses that start in
two different memory segments (segments on 80x86 machines are 64k bytes
long, but the actual starting memory locations for two consecutive segments
are only 16 bytes away from each other...)

This means the address of &p[0] could be 0001:0000 (segment:offset) and the
address of &q[0] could be 0002:0000. Now, given that a compiler -could-
allocate and address the two arrays in this manner, the address for the
memory location &p[17] (incorrect, but technically a legal memory reference)
would be 0001:0010. The byte of memory addressed by q[0] and p[17] would
be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would
be different.

If the people writing the compiler take this into account, it will require
converting all pointers used in pointer comparisons to a normalized form
which maps the entire memory space in a linear (one value per memory byte)
fashion. Unfortunately, this also will slow down (rather nastily) all
operations using pointer comparisons... An undesirable side effect for a
"feature" not defined in K&R.

Getting back to the original starting question asked by Neil Webber:

>>Consider the following C function:
>>
>>    same_char (p, q)
>>    char *p;
>>    char *q;
>>    {
>>        return (p == q);
>>    }
>>
>>Does this function only return a non-zero value when p and q point
>>to the same physical character?

The answer to the exact wording of the question is YES... However....
saying that the function will ONLY return a non-zero value when the pointers
match is not the same as "Does the function ALWAYS return a non-zero value
when p and q point to the same physical character?" The answer to the latter
question is NO. You can have two different pointer values which point to
the same physical character. This is unusual, but it can happen with any
compiler which allocates memory in the manner I mentioned above.

The "solution" that I've used is, in any program I write where this might
have an effect, I define a macro PTREQUAL(x,y). On most machines I can
define this to be:

    #define PTREQUAL(x,y) ((x) == (y))

On any machine I port this code to which gives me problems, I can redefine
PTREQUAL (or PTRLESS, PTRGREATER, etc.) to reference a function which does
the math necessary for linear mapping of the memory space.

As long as you are referencing addresses within the same structure/array
you will not have to worry about this and the "same_char()" function you
wrote will work all the time.

-------------
john@viper.UUCP <John L. Stanley> Analyst/Consultant - DynaSoft Systems...
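One way the redefinition for a "problem" machine might look is sketched below. The post does not show its own version, so everything here is an assumption: `struct far_ptr` stands in for whatever segmented pointer type the compiler really provides, `far_linear` is a hypothetical helper, and the 16-byte segment spacing is the 8086 real-mode rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a segmented pointer (illustration only). */
struct far_ptr {
    uint16_t seg;
    uint16_t off;
};

/* Map seg:off onto one linear value per memory byte (seg * 16 + off). */
uint32_t far_linear(struct far_ptr fp)
{
    return (uint32_t)fp.seg * 16u + fp.off;
}

/* On a flat-address machine these would be ((x) == (y)), ((x) < (y)), ... */
#define PTREQUAL(x, y)   (far_linear(x) == far_linear(y))
#define PTRLESS(x, y)    (far_linear(x) <  far_linear(y))
#define PTRGREATER(x, y) (far_linear(x) >  far_linear(y))

void check_ptr_macros(void)
{
    struct far_ptr a = { 0x0002, 0x0000 };
    struct far_ptr b = { 0x0001, 0x0010 };   /* same byte as a */
    struct far_ptr c = { 0x0001, 0x0000 };

    assert(PTREQUAL(a, b));
    assert(PTRLESS(c, a));
    assert(PTRGREATER(b, c));
}
```

Keeping the comparison behind a macro means the extra multiply-and-add is paid only on the machines that need it.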
tps@sdchem.UUCP (02/19/87)
In article <541@viper.UUCP> john@viper.UUCP (John Stanley) writes:
>In article <636@sdchema.sdchem.UUCP> tps@sdchemf.UUCP (Tom Stockfisch) writes:
> >
> >That is, even on a segmented architecture, when pointers get passed to a
> >subroutine, they MUST have distinct addresses -- how else can the subroutine
> >know what area of memory to access?
> >
>
> Tom, on segmented architecture machines you -could- have two distinctly
>different pointer values which point to the same memory space...
>...
> In the example he gave:
>     char p[16], q[16];
>...
> This means the address of &p[0] could be 0001:0000 (segment:offset) and the
>address of &q[0] could be 0002:0000. Now, given that a compiler -could-
>allocate and address the two arrays in this manner, the address for the
>memory location &p[17] (incorrect, but technically a legal memory reference)
>would be 0001:0010. The byte of memory addressed by q[0] and p[17] would
>be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would be
>different.

&p[17] is illegal (or at least undefined) in C, given the above definition.
If I compared this to any other pointer value (even inside p[]) I would not
expect to get anything well defined.

> Getting back to the original starting question asked by Neil Webber:
>>>Consider the following C function:
>>>    same_char (p, q)
>>>    char *p;
>>>    char *q;
>>>    {
>>>        return (p == q);
>>>    }
>>>Does this function only return a non-zero value when p and q point
>>>to the same physical character?
> The answer to the exact wording of the question is YES... However....
>saying that the function will ONLY return a non-zero value when the pointers
>match is not the same as "Does the function ALWAYS return a non-zero value
>when p and q point to the same physical character?" The answer to the latter
>question is NO.

Judging from your above comments, I assume that what you mean by this is
that if you pass &p[16] and &q[0] to same_char(), they might refer to the
same physical memory but have a different value. Since there is no
guarantee in C that adjacent definitions wind up adjacent in memory, I would
say the result of

    same_char( &p[16], &q[0] )

would be undefined (by C). It would make no more sense to argue about what
same_char() should return in this case than if you had called

    char *r = p;
    same_char( r++, r++ )

So I would say that if you pass *defined* pointer values to same_char() it
will return 1 if and only if they refer to the same memory location, and
will return 0 otherwise.

|| Tom Stockfisch, UCSD Chemistry tps%chem@sdcsvax.UCSD
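The "defined pointer values" case can be sketched as follows, using an ANSI-style prototype instead of the K&R declaration. `check_same_char` is a name added here for illustration; the calls deliberately avoid &p[16] versus &q[0], which is the disputed case.

```c
#include <assert.h>

/* ANSI-style version of the function under discussion. */
int same_char(const char *p, const char *q)
{
    return p == q;
}

void check_same_char(void)
{
    char p[16], q[16];
    char *r = p;

    assert(same_char(&p[3], p + 3));    /* same element, two spellings */
    assert(same_char(r, &p[0]));        /* a copied pointer compares equal */
    assert(!same_char(&p[0], &q[0]));   /* distinct objects compare unequal */
    assert(!same_char(&p[0], &p[1]));   /* distinct elements of one array */
}
```

Each pointer here designates an actual element of p or q, so the results follow from the ANSI rule that pointers comparing equal point to the same object.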
Schauble@mit-multics.arpa (02/19/87)
Unfortunately, the Intel 8086 series provides another counterexample.
On this machine, addresses are in the form of segment and offset. The
Actual Address is 16*segment + offset. This is usually written as
segment:offset. Thus, the two pointers 0100:0010 and 0101:0000 point at
the same byte. If compared, they will *NOT* compare equal.

So, if two pointers compare equal, they definitely point at the same
object. However, the converse is not true. Two pointers that do not
compare equal do NOT (necessarily) point at different objects.

Seems like when K & R says that pointer comparison is undefined except
when the two are pointers to the same array, it should be taken to mean
exactly that for all operators, including == and !=.

Paul Schauble at MIT-Multics.arpa
drw@cullvax.UUCP (02/19/87)
john@viper.UUCP (John Stanley) writes:
> Tom, on segmented architecture machines you -could- have two distinctly
> different pointer values which point to the same memory space. This is
> what Israel Pinkas was (I think) trying to point out.
>
> In the example he gave:
>     char p[16], q[16];
> (Details of 8086 memory addressing omitted.)
> The byte of memory addressed by q[0] and p[17] would
> be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would be
> different.

Yes, but comparing &p[17] and &q[0] is comparing two addresses derived
from different allocations, and as the standard says, that is
implementation defined (i.e., may not work). Note also that &p[17] is
a correct *pointer*, but attempting to fetch or store anything through
it is not. &p[18] is not a correct pointer.

Dale
--
Dale Worley Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu
guy@gorodish.UUCP (02/19/87)
>Unfortunately, the Intel 8086 series provides another counterexample.
>On this machine, addresses are in the form of segment and offset. The
>Actual Address is 16*segment + offset.

Yes, but *if* you happen to construct two different long pointers that
point to the same address, that's just like double-mapping a location with
an MMU.

>So, if two pointers compare equal, they definitely point at the same
>object. However, the converse is not true. Two pointers that do not
>compare equal do NOT (necessarily) point at different objects.

Yes, but this is an escape hatch for the benefit of e.g. systems that have
to do double mapping. It is *not* intended to render comparison of pointers
for equality useless except when comparing pointers to elements in the same
array.

>Seems like when K & R says that pointer comparison is undefined except
>when the two are pointers to the same array, it should be taken to mean
>exactly that for all operators, including == and !=.

Yes, but the ANSI C standard explicitly separates pointer comparison for
(in)equality from relational comparison on pointers.

Anybody who tried to sell *me* a C implementation where the same object had
two addresses *except* when something was explicitly doubly mapped, or in
similarly unusual cases (e.g. an 808[68] with no memory mapping, where a
segment, treated as a full 64KB segment, overlapped another segment, but
where the segment *really* isn't long enough to overlap it) - i.e., an
implementation where pointer equality wasn't equivalent to object equality,
except in some *very* specialized and *explicitly*-documented cases - would
be shown the door rather quickly.
pes@bath63.UUCP (02/20/87)
Well, Multics itself provides the most dangerous of both worlds in that you
can get:

    Two pointers which (by all determinable checks) *look* different but
    which in fact point to the same thing (because the program has, through
    some cutesy trick, managed to get the same 'physical' segment initiated
    twice, with 2 different 'logical' segment numbers).

    Two pointers which (as near as can be determined) are identical, but
    which the program 'thinks' point (and which are 'supposed to' point) at
    different things (because the program has gotten a pointer to some
    'object', and then managed to terminate the 'physical' segment, and
    initiate a new segment, with the same 'logical' segment number).

Admittedly, both of these effects require some fairly shady programming,
but unfortunately that's not unheard of. They can also be achieved fairly
easily on large projects involving several programmers and a communication
breakdown.
john@viper.UUCP (02/22/87)
Both Tom Stockfisch and Dale Worley make a good point. The method I used
to construct the problem pointers is certainly illegal in standard C.

My response was primarily to the people talking about how the function in
question to compare two pointers could fail on an 80x86 architecture
machine. I guess I should have made the orientation of my answer a bit more
clear. Since some pointers used by programs in an MS-DOS environment are
created by the operating system and not from C, it is possible to have
illegally constructed pointers.

The original question from Neil Webber asked if the function would return
TRUE if the two pointers pointed to the same "physical" character... The
side-track of 80x86 pointers carried this a bit beyond questions limited
only to "standard" C and I guess the failure cases will only occur when a
programmer goes a bit beyond what is legal (and I'm sure -none- of us would
-ever- do anything in C that wasn't by-the-book... ;-)

--- John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john
franka@mmintl.UUCP (02/23/87)
In article <814@cullvax.UUCP> drw@cullvax.UUCP writes:
>john@viper.UUCP (John Stanley) writes:
>> In the example he gave:
>>     char p[16], q[16];
>> (Details of 8086 memory addressing omitted.)
>> The byte of memory addressed by q[0] and p[17] would
>> be the exact same byte, but the two pointers 0002:0000 and 0001:0010 would
>> be different.
>
>Yes, but comparing &p[17] and &q[0] is comparing two addresses derived
>from different allocations, and as the standard says, that is
>implementation defined (i.e., may not work). Note also that &p[17] is
>a correct *pointer*, but attempting to fetch or store anything through
>it is not. &p[18] is not a correct pointer.

Actually, of course, it is p[16] which is equal to q[0], and &p[17] which is
not a correct pointer. Isn't zero-based array indexing wonderful? (:-)

Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
faustus@ucbcad.UUCP (02/25/87)
In article <4537@brl-adm.ARPA>, Schauble@mit-multics.arpa (Paul Schauble) writes:
> Unfortunately, the Intel 8086 series provides another counterexample.
> On this machine, addresses are in the form of segment and offset. The
> Actual Address is 16*segment + offset. This is usually written as
> segment:offset. Thus, the two pointers 0100:0010 and 0101:0000 point at
> the same byte. If compared, they will *NOT* compare equal.

I think they will have to compare equal. If the compiler generates naive
code to compare them it's wrong. Just as "!p" MUST be true after a "p = 0",
where p is a char * and the machine uses a NULL pointer that isn't an
all-zero bit pattern, "p == q" MUST be true if they point at the same
address, no matter what they look like internally.

> Seems like when K & R says that pointer comparison is undefined except
> when the two are pointers to the same array, it should be taken to mean
> exactly that for all operators, including == and !=.

I think you want to re-phrase this. If this were literally true then
"p = q; if (p == q) ..." would have undefined semantics.

This is sort of a moot point anyway, since presumably there is no way that
you could get a "de-normalized" pointer within C without using backdoor
casting of values... But you're just asking for trouble if you ignore the
possibility.

Wayne
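The two identities cited in the post above ("!p" after "p = 0", and "p == q" after "p = q") can be written down directly. This is only a sketch of those two statements; `pointer_identities_hold` and `check_identities` are names invented here, and the code says nothing about how a particular compiler represents pointers internally.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if both identities hold for the given q, 0 otherwise. */
int pointer_identities_hold(char *q)
{
    char *p = 0;                     /* null pointer, whatever its bit pattern */
    int null_ok = !p && p == NULL;   /* "!p" must be true after "p = 0" */

    p = q;
    return null_ok && p == q;        /* "p == q" must be true after "p = q" */
}

void check_identities(void)
{
    char c = 'x';

    assert(pointer_identities_hold(&c));
    assert(pointer_identities_hold(NULL));
}
```

The comparison is defined on pointer values, not bit patterns, so the compiler has to generate whatever code makes these tests come out true on its machine.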
john@viper.UUCP (03/02/87)
In article <2000@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>john@viper.UUCP (John Stanley) writes:
>>> The byte of memory addressed by q[0] and p[17] would
>>> be the exact same byte, ...............
>
>Actually, of course, it is p[16] which is equal to q[0], and &p[17] which is
>not a correct pointer. Isn't zero-based array indexing wonderful? (:-)

But Frank, haven't you heard? Intel is switching all future 80x86 chips
to separate their segments by 17 bytes to expand the addressing space....
(just kidding... ;^)

Frank is, of course, correct... (boy, is my face red...)

--- John Stanley (john@viper.UUCP)
Software Consultant - DynaSoft Systems
UUCP: ...{amdahl,ihnp4,rutgers}!{meccts,dayton}!viper!john