meissner@xyzzy.UUCP (Michael Meissner) (01/01/70)
In article <13676@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
> More correctly stated: if the difference between two pointers is
> ever more than can be represented by a long, you are in trouble.
>
> -Ron
Actually, to be more precise, if the difference between two pointers which
point to members of the SAME array is ever more than that which can be
represented by a long, you are in trouble. All of the standards say that
pointer subtraction is only defined within an aggregate. This allows putting
each top level item into a separate segment on, say, an 80*86, and only doing
the subtraction between the two offsets. Many MSDOS compilers do this
already.
--
Michael Meissner, Data General. Uucp: ...!mcnc!rti!xyzzy!meissner
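As a minimal sketch of the rule described above (not code from the thread), the following assumes both pointers refer to elements of one array, or to one past its end. The helper names element_distance and byte_distance are hypothetical; ptrdiff_t is the type the draft standard gives to a pointer difference.

```c
#include <stddef.h>

/* Hypothetical helper: distance between two pointers into the SAME
 * array (or one past its end). The result counts elements, not bytes. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}

/* Hypothetical helper: the same distance measured in bytes, obtained by
 * viewing the same array as an array of char. */
ptrdiff_t byte_distance(const int *first, const int *last)
{
    return (const char *)last - (const char *)first;
}
```

For an array int a[10], element_distance(&a[2], &a[7]) is 5, and byte_distance gives 5 * sizeof(int). Nothing comparable can be said about two pointers into unrelated objects.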
simon@its63b.ed.ac.uk (Simon Brown) (06/22/87)
If I were to want to implement malloc (or some such) on a machine where
sizeof(int) != sizeof(char *), how do I ensure that the pointer-values I
return are maximally aligned (e.g., quad-aligned)? If sizeof(int)==sizeof(char *),
then I can cast the pointer to an int, do whatever arithmetic stuff is
required to it to get it to be aligned, then cast it back again - but of
course this won't work if information is lost by either of the casts.

Any hints?

(BTW, It's not really malloc I'm dealing with, I just lied about that one)

%{
    Simon!
%}
--
----------------------------------
| Simon Brown                    | UUCP: seismo!mcvax!ukc!its63b!simon
| Department of Computer Science | JANET: simon@uk.ac.ed.its63b
| University of Edinburgh,       | ARPA: simon%its63b.ed.ac.uk@cs.ucl.ac.uk
| Scotland, UK.                  |
----------------------------------
"Life's like that, you know"
blarson@castor.usc.edu (Bob Larson) (07/05/87)
In article <493@its63b.ed.ac.uk> simon@its63b.ed.ac.uk (Simon Brown) writes:
>If I were to want to implement malloc (or some such) on a machine where
>sizeof(int) != sizeof(char *), how do I ensure that the pointer-values I
>return are maximally aligned (eg, quad-aligned)?
The same way as you would on any other machine: non-portably. (What is
your definition of quad-aligned? 4 * sizeof(char)? There are quite a
few machines where this is not maximally aligned.)

For example, prime 64v mode:

char *alignpointer(p)
char *p;
{
    union {
        char *up;
        struct {
            unsigned fault:1;
            unsigned ring:2;     /* I may have the ring and extend bits exchanged */
            unsigned extend:1;   /* check before you try this on a real prime */
            unsigned segment:12;
            unsigned offset:16;
            unsigned bit:4;
            unsigned unused:12;
        } point;
    } un;
    int zerooffset;

    un.up = p;
    if(un.point.fault) return p; /* faulted pointer, not a valid address */
    zerooffset = un.point.offset == 0;
    /* round offset up to next 4 byte boundary */
    un.point.offset = ((un.point.offset |
                        (un.point.extend & (un.point.bit!=0))) + 1) & ~1;
    un.point.extend = 0;         /* and say it is at a 2-byte boundary */
    un.point.bit = 0;            /* unneeded, but leave it clean */
    if(un.point.offset == 0 && !zerooffset) un.point.segment++;
    return un.up;
}

Obviously, it would be easier to make sure to generate aligned pointers in
the first place. Also I did not make all the assumptions that the C compiler
does, assuming you could have gotten the pointer via another language.

Bob Larson
Arpa: Blarson@Ecla.Usc.Edu
Uucp: {sdcrdcf,seismo!cit-vax}!oberon!castor!blarson
"How well do we use our freedom to choose the illusions we create?" -- Timbuk3
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/05/87)
In article <493@its63b.ed.ac.uk> simon@its63b.ed.ac.uk (Simon Brown) writes:
>If I were to want to implement malloc (or some such) on a machine where
>sizeof(int) != sizeof(char *), how do I ensure that the pointer-values I
>return are maximally aligned (eg, quad-aligned)? If sizeof(int)==sizeof(char *),
>then I can cast the pointer to an int, do whatever arithmetic stuff is
>required to it to get it to be aligned, then cast it back again - but of
>course this won't work if information is lost by either of the casts.
First, do most of your arithmetic on (char *) data types, not on (int)s.

Second, forcing alignment may require converting your pointers to
integral types to do the rounding operations. (long) is appropriate
for portable code. (If a (char *) won't fit into a (long), you have
real problems!)

Third, it is difficult to portably determine alignment requirements.
Consider using something like the following:

struct align {
    char c0;
    union {
        long l1[2];
        double d1[2];
        char *cp1[2];
        union {
            long l2[2];
            double d2[2];
            char *cp2[2];
        } u1[2];
    } u0;
} a;
#define ALIGN ((char *)&a.u0 - (char *)&a.c0)

(This example can probably be improved.)
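One way the struct trick above might be combined with the "do your arithmetic on (char *)" advice is sketched below. It is an illustration under stated assumptions, not Doug Gwyn's code: the names worst_case, align_probe, ALIGNMENT, align_up and aligned_at are hypothetical, the set of "strictest" types in the union is a guess that may be incomplete on some machines, and the sketch assumes the compiler inserts no more padding after the leading char than alignment requires. Rounding is done on byte offsets from an already aligned base, so no pointer is ever converted to an integer.

```c
#include <stddef.h>

/* Assumed set of types with the strictest alignment on the target. */
union worst_case {
    long l;
    double d;
    char *cp;
    void (*fp)(void);
};

/* The padding the compiler places between c and u reveals the alignment
 * it requires for the union. */
struct align_probe {
    char c;
    union worst_case u;
};

#define ALIGNMENT (offsetof(struct align_probe, u))

/* Round a byte offset up to the next multiple of ALIGNMENT. */
size_t align_up(size_t offset)
{
    return (offset + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}

/* Given a base that is already suitably aligned (for example, storage
 * declared as an array of union worst_case), return the first aligned
 * position at or after base + offset. */
char *aligned_at(char *base, size_t offset)
{
    return base + align_up(offset);
}
```

Because only offsets are rounded, the same code works whether or not a (char *) fits in any integral type.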
lm@cottage.WISC.EDU (Larry McVoy) (07/06/87)
In article <6061@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>integral types to do the rounding operations. (long) is appropriate
>for portable code. (If a (char *) won't fit into a (long), you have
>real problems!)
I'm not sure this is true anymore. Don't some supercomputers make
longs 32 bits, long longs 64 bits, and have addresses > 32 bits and < 64 bits?
I seem to remember that someone said something like that recently.

Larry McVoy lm@cottage.wisc.edu or uwvax!mcvoy
fu@hc.DSPO.GOV (Castor L. Fu) (07/06/87)
In article <3812@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>In article <6061@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>integral types to do the rounding operations. (long) is appropriate
>>for portable code. (If a (char *) won't fit into a (long), you have
>>real problems!)
>
>I'm not sure this is true anymore. Don't some supercomputers make
>longs 32 bits, long longs 64 bits, and have addresses > 32 bits and < 64 bits?
>I seem to remember that someone said something like that recently.
>
>Larry McVoy lm@cottage.wisc.edu or uwvax!mcvoy
Well, I am not positive about how the C compiler is organized (who wants
to use a compiler which can barely vectorize on a cray?). However, the FORTRAN
compiler's primary data type for integers is 64 bits wide. Internally, the
addressing registers are only 24 bits wide. (The machine has no virtual
memory, and 24 bits addresses 16 megawords, which is still 128Mbytes, so the
need for 32 bit or 64 bit addressing is questionable.)

Anyway, this has led to much grief for me when I found library routines
which never expected to see things bigger than 8 Megawords (since the
integers are signed).

So I guess the moral of the story is that sizeof(char *) < sizeof(int)
is also quite possible in some weird implementations.

-Castor Fu
fu@hc.dspo.gov
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/06/87)
In article <3812@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
-In article <6061@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
->integral types to do the rounding operations. (long) is appropriate
->for portable code. (If a (char *) won't fit into a (long), you have
->real problems!)
-
-I'm not sure this is true anymore. Don't some supercomputers make
-longs 32 bits, long longs 64 bits, and have addresses > 32 bits and < 64 bits?
-I seem to remember that someone said something like that recently.
What's a (long long)? We were talking about portable code!
lm@cottage.WISC.EDU (Larry McVoy) (07/06/87)
I sez:
    I'm not sure this is true anymore. Don't some supercomputers make
    longs 32 bits, long longs 64 bits, and have addresses > 32 bits and
    < 64 bits? I seem to remember that someone said something like that
    recently.
Doug sez:
    What's a (long long)? We were talking about portable code!

A long long is a kludge. However, I seem to remember that it went something
like this: a company was doing unix on an Amdahl (???) and the unix people
were really used to (xxx *) == 32 bits and (long) == 32 bits, and having it
otherwise broke all sorts of code. So they gave people short, int, long, and
long long.

Yeah, it's gross. But so was defining C in such an ambiguous way. It's
really time for int8 int16 int32 int64 or some such attempt at defining
sizes with the type.

Larry McVoy lm@cottage.wisc.edu or uwvax!mcvoy
karl@haddock.UUCP (Karl Heuer) (07/07/87)
In article <3812@spool.WISC.EDU> lm@cottage.WISC.EDU (Larry McVoy) writes:
>In article <6061@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn) writes:
>>(long) is appropriate for portable code. (If a (char *) won't fit into a
>>(long), you have real problems!)
Hasn't ANSI removed all pretense of pointers being integerizable?
>I'm not sure this is true anymore. Don't some supercomputers make
>longs 32 bits, long longs 64 bits, and have addresses > 32 bits and < 64 bits?
>I seem to remember that someone said something like that recently.
Probably my article, which was hypothetical. I was less concerned with the
cast of pointer to int, which is nonportable anyway, than with the kosherness
of having size_t and ptrdiff_t be larger than unsigned long.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
ron@topaz.rutgers.edu.UUCP (07/08/87)
That is hideous. I don't know what supercomputer you are referring
to, but Crays have ints and longs both at 64 bits. There are no super-longs.

When we did the compilers for the HEP Supercomputer (64 bit words),
we opted for 16 bit shorts, 64 bit ints, and 64 bit longs. There is
one more hardware supported type (half words-32 bits). Avoiding things
that would really warp the language, such as short long ints or long short
ints, and realizing that we really wanted int to be 64 bits (the
convenient size as stated in K&R and the standards), we settled for a
separate "hidden" type that we try to avoid using except when necessary.
It was called _int32, though the term "medium int" did come up in discussion.

By the way, it was a real pain hacking pcc to do the extra int type.

-Ron
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (07/10/87)
In article <13218@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
: That is hideous. I don't know what supercomputer you are referring
: to but Crays have ints and longs both at 64 bits. There are no super-longs.
: When we did the compilers for the HEP Supercomputer (64 bit words),
: we opted for 16 bit shorts, 64 bit ints, and 64 bit longs. There is
: one more hardware supported type (half words-32 bits). Avoiding things...
Why not have int be 32 bits? That fits the requirement that
length char<=short<=int<=long. Not a comment, just a question...
--
bill davidsen (wedu@ge-crd.arpa)
{chinet | philabs | sesimo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/12/87)
In article <6655@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>Why not have int be 32 bits? That fits the requirement that
>length char<=short<=int<=long. Not a comment, just a question...
There are two main considerations for the correct size to be used for (int)
when implementing C on a new system:

1. (int) objects should be accessible quickly. On a word-addressed
architecture, this argues for making them full words.

2. (int)s must be usable for indexing arrays. Depending on the address
space, one may have to either impose an artificial limit on array sizes
or else make (int)s longer than they might have been. For example, on
a hypothetical PDP-11AX (which doesn't exist because it turned into a VAX),
one could have had 16 bits continue to be the natural integer data size
but 24 or 32 bits could have been the preferred pointer size due to an
extended addressing scheme using base registers a la Gould. The C
implementor would almost certainly have wanted to make the larger address
space available on such a machine, which would force some sort of
accommodation to be made for indexing char arrays -- probably by making
(int)s as wide as char pointers.

I understand from hearsay that the IBM PC world (actually the Intel 8086
world) ran against this very problem, and instead of making a single sane
choice they ended up proliferating a variety of incompatible sets of
choices (hilariously called "models"). One hopes that a lesson was learned,
but I doubt it.
ron@topaz.rutgers.edu (Ron Natalie) (07/13/87)
> : When we did the compilers for the HEP Supercomputer (64 bit words),
> : we opted for 16 bit shorts, 64 bit ints, and 64 bit longs. There is
> : one more hardware supported type (half words-32 bits). Avoiding things...
> Why not have int be 32 bits? That fits the requirement that
> length char<=short<=int<=long. Not a comment, just a question...
Because "int" is supposed to be a convenient size. The convenient size
for us is 64 bits. Since the largest number of variables are type "int",
you want to use something pretty efficient (like the word size).

By the way, your assumption that type "char" has some guaranteed
relationship to any of the integer types is wrong, although anyone who
has "char"s that aren't exactly eight bits is likely to cause many
applications to die.

-Ron
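Code that must not assume eight-bit chars can ask the implementation instead. The sketch below is an illustration, not something from the thread: it relies on CHAR_BIT from <limits.h>, which the ANSI draft provides, and the helper name bits_in is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: storage width in bits of an object whose size,
 * as reported by sizeof, is size_in_chars. CHAR_BIT is the number of
 * bits in a char on this implementation. */
size_t bits_in(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}
```

On a machine like the HEP described above, bits_in(sizeof(short)) would presumably report 16 and bits_in(sizeof(int)) 64; on other machines the numbers differ, which is the point.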
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (07/15/87)
In article <6110@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <6655@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>>Why not have int be 32 bits? That fits the requirement that
>>length char<=short<=int<=long. Not a comment, just a question...
>
>There are two main considerations for the correct size to be used for (int)
>when implementing C on a new system:
>
>1. (int) objects should be accessible quickly. On a word-addressed
>architecture, this argues for making them full words.
[ I thought you mentioned that the 32 bit size was hardware supported.
On many machines the short math is faster than long (i.e., vax, 68000). ]
>
>2. (int)s must be usable for indexing arrays. Depending on the address
>space, one may have to either impose an artificial limit on array sizes
>or else make (int)s longer than they might have been. For example, on
[ The 32 bit size allows an acceptable range as a subscript. Although
at some point 4GB won't be enough, most of the problems using big
memory are also using multiple arrays less than 2GB. ]
>I understand from hearsay that the IBM PC world (actually the Intel 8086
>world) ran against this very problem, and instead of making a single sane
>choice they ended up proliferating a variety of incompatible sets of
>choices (hilariously called "models"). One hopes that a lesson was learned,
>but I doubt it.
That's the point I was making in my posting... the problems occur when
the int won't hold an address, and then mainly because some <deleted> is
playing fast & loose with bit fiddling in pointers or some such. The
major problems with "models" would go away if someone made the large
model int the same length as the large model pointer. I've been fighting
with this in pathalias, trying to get it to run on an 80*86 machine, and
finding that (a) it does all its own memory allocation, and (b) it uses
ints to hold addresses while doing it.

This kind of non-portable code will fail on machines which are not byte
addressed, and which use a pointer which looks like a word address and
character offset. X3J11 covered this very well: pointers are not forced
to be the size of int, and they are not even the size of each other! Code
written for large model 80*86 will almost always run on any other
machine, assuming that it doesn't use calls to the hardware, etc.
--
bill davidsen (wedu@ge-crd.arpa)
{chinet | philabs | sesimo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
throopw@xyzzy.UUCP (Wayne A. Throop) (07/28/87)
> karl@haddock.UUCP (Karl Heuer)
>> lm@cottage.WISC.EDU (Larry McVoy)
>>> gwyn@brl.arpa (Doug Gwyn)
>>>(long) is appropriate for portable code. (If a (char *) won't fit into a
>>>(long), you have real problems!)
I am aware of a seriously developed architecture where "long" was 64
bits, and pointers were 128 bits. That is, arithmetic could be
performed on binary integers up to 64 bits long by the CPU, but pointers
had considerable extra information beyond offset information. In
particular, there was a universal, shared, access-protected, segmented
address space. It would have been natural to make shorts either 16 bits
or 32 bits, ints 32 bits, and longs 64 bits, which is quite vanilla.
The odd thing would have been that pointers wouldn't fit into any of
those. But all in all, a very lovely machine.

And yes, much C code would have been hard to port to this machine, or
the compiler would have had to stand on its head and spin about 48
hula-hoops on its toes to make the usual assumptions that many C
programmers make about the underlying hardware seem to be true.

Sadly, it is unlikely that this architecture will haunt C implementors
or programmers. The current fashion in computer architecture has moved
away from many of the concepts it embodied. Sigh.
>> Don't some supercomputers make longs 32 bits, long longs 64 bits, and
>> have addresses > 32 bits and < 64 bits?
>>I seem to remember that someone said something like that recently.
> Probably my article, which was hypothetical. I was less concerned with the
> cast of pointer to int, which is nonportable anyway, than with the kosherness
> of having size_t and ptrdiff_t be larger than unsigned long.
Ah. The architecture I had in mind does not have these problems. Of
course, many C programmers assume that any two non-null pointers of the
same type can be subtracted, which isn't the case for this architecture.
--
What!!?? What is it!!?? Did they find Jimmy Hoffa under Tammy
Bakker's makeup?
        --- from Bloom County
--
Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
mark@applix.UUCP (Mark Fox) (07/29/87)
In article <161@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
%
%I am aware of a seriously developed architecture where "long" was 64
%bits, and pointers were 128 bits... In
%particular, there was a universal, shared, access-protected, segmented
%address space... But all in all, a very lovely machine... Sadly,
%it is unlikely that this architecture will haunt C implementors or
%programmers. The current fashion in computer architecture has moved
%away from many of the concepts it embodied. Sigh.
>--
>Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
Ahh, DG's unforgettable FHP machine. What a dream that was. :-)
--
Mark Fox
Applix Inc., 112 Turnpike Road, Westboro, MA 01581, (617) 870-0300
uucp: seismo!harvard!m2c!applix!mark
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/30/87)
In article <161@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
->>> gwyn@brl.arpa (Doug Gwyn)
->>>(long) is appropriate for portable code. (If a (char *) won't fit into a
->>>(long), you have real problems!)
-I am aware of a seriously developed architecture where "long" was 64
-bits, and pointers were 128 bits.
Yup, you notice the dpANS for C doesn't guarantee that there will be an
integral type able to hold a pointer without loss of information. It
does give rules for such a feature if it happens to be implemented, however.
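For readers with a later compiler: the C standard of 1999 (well after this thread) added an optional integer type, uintptr_t in <stdint.h>, for exactly this purpose, and an implementation that cannot provide it simply leaves it out. The sketch below assumes such a compiler; the function name pointer_survives_round_trip is hypothetical, and the test on UINTPTR_MAX is how a program can detect whether the type exists.

```c
#include <stdint.h>

#ifdef UINTPTR_MAX
/* An integer type wide enough for a pointer exists on this
 * implementation: convert to it and back, then compare. */
int pointer_survives_round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits == p;
}
#else
/* No such type here (think of the 128-bit pointers mentioned in this
 * thread), so the conversion is not attempted at all. */
int pointer_survives_round_trip(void *p)
{
    (void)p;
    return 0;
}
#endif
```

Portable code treats the #else branch as a real possibility rather than assuming that (long) is always wide enough.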
ron@topaz.rutgers.edu (Ron Natalie) (08/04/87)
More correctly stated: if the difference between two pointers is
ever more than can be represented by a long, you are in trouble.

-Ron
dhesi@bsu-cs.UUCP (Rahul Dhesi) (08/07/87)
In article <179@xyzzy.UUCP> meissner@nightmare.UUCP (Michael Meissner) writes:
>All of the standards say that
>pointer subtraction is only defined within an aggregate. This allows putting
>each top level item into a separate segment on say an 80*86, and only doing
>the subtraction between the two offsets. Many MSDOS compilers do this
>already.
To subtract two independent large-model pointers of the type
segment:offset, I tried this:

    (unsigned long) p2 - (unsigned long) p1

I was hoping that the cast to unsigned long would convert each pointer
to a sort of absolute memory address in bytes, and the subtraction
would yield the difference in bytes. Under Borland's Turbo C at least,
such a cast is a no-op, so the resulting unsigned long does not
necessarily increase monotonically with increasing memory address to
which the original pointer points.

I understand that the requirement on such casts is that they be
unsurprising and reversible, to the extent that these are possible. It
would be nice if "unsurprising" were interpreted to mean that the
subtraction I was attempting would work. The only catch is that
reversibility would be weakened because in the 8086 architecture many
different long pointers can point to the same address, but I could live
with that.
--
Rahul Dhesi UUCP: {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi
meissner@xyzzy.UUCP (Michael Meissner) (08/26/87)
In article <934@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
#
# To subtract two independent large-model pointers of the type
# segment:offset, I tried this:
#
# (unsigned long) p2 - (unsigned long) p1
#
# I was hoping that the cast to unsigned long would convert each pointer
# to a sort of absolute memory address in bytes, and the subtraction
# would yield the difference in bytes. Under Borland's Turbo C at least,
# such a cast is a no-op, so the resulting unsigned long does not
# necessarily increase monotonically with increasing memory address to
# which the original pointer points.
This is bad practice. I know of machines that have different formats for
pointers to words and pointers to bytes, and other machines that use things
like bit pointers. In none of these cases, nor on a segmented machine like
the 80*86, will subtraction give you what you want. This is yet another
symptom of the "all the world's a VAX" syndrome.
--
Michael Meissner, Data General. Uucp: ...!mcnc!rti!xyzzy!meissner
Arpa/Csnet: meissner@dg-rtp.DG.COM
ed@mtxinu.UUCP (Ed Gould) (08/28/87)
># To subtract two independent large-model pointers of the type
># segment:offset, I tried this:
>#
># (unsigned long) p2 - (unsigned long) p1
>#
>
>This is bad practice.
It's also not legal in the proposed ANSI C standard. Pointers
may be subtracted *only* if they point to members of the same
array of elements. Casting them has no real effect on a
byte-addressed machine; it's not at all obvious what it should do on
other machines.
--
Ed Gould mt Xinu, 2560 Ninth St., Berkeley, CA 94710 USA
{ucbvax,decvax}!mtxinu!ed +1 415 644 0146
"A man of quality is not threatened by a woman of equality."
randy@umn-cs.UUCP (Randy Orrison) (08/28/87)
In article <483@mtxinu.UUCP> ed@mtxinu.UUCP (Ed Gould) writes:
>It's also not legal in the proposed ANSI C standard. Pointers
>may be subtracted *only* if they point to members of the same
>array of elements.
How is this determined? example:

    int strlen(s)
    char *s;
    {
        register char *c;

        c = s;
        while(*c++)
            ;
        return (c-s);
    }

How does anything know if s & c are pointing to members of the same array?
If s isn't 0 terminated, c could end up anywhere...

(No flames on off-by-one errors, or any design issues. this is just an example)

    -randy
--
Randy Orrison, University of Minnesota School of Mathematics
UUCP: {ihnp4, seismo!rutgers!umnd-cs, sun}!umn-cs!randy
ARPA: randy@ux.acss.umn.edu     (Yes, these are three
BITNET: randy@umnacvx            different machines)
jc@minya.UUCP (John Chambers) (08/29/87)
In article <483@mtxinu.UUCP>, ed@mtxinu.UUCP (Ed Gould) writes:
> ># To subtract two independent large-model pointers of the type
> ># segment:offset, I tried this:
> >#
> ># (unsigned long) p2 - (unsigned long) p1
> >#
> >
> >This is bad practice.
>
> It's also not legal in the proposed ANSI C standard. Pointers
> may be subtracted *only* if they point to members of the same
> array of elements.
Huh? This example isn't subtracting pointers to anything. It is
subtracting two unsigned longs. I sure hope that's defined.

I also hope that the ANSI standards haven't done THAT much damage
to C semantics! (:-)
--
John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
guy%gorodish@Sun.COM (Guy Harris) (08/29/87)
> How does anything know if s & c are pointing to members of the same array?
> If s isn't 0 terminated, c could end up anywhere...
If "s" isn't 0 terminated, the result returned from "strlen" isn't meaningful
anyway! As such, the fact that "c" might not be in the same array is hardly
relevant.

The rules don't say that the implementation MUST detect whether the two
pointers belong to the same array, and slap your wrists if they aren't; they
say that the behavior is *undefined* if the pointers aren't members of the
same array! As such, nobody *has* to know if "s" and "c" are pointing to
members of the same array.

In any *valid* call to "strlen", the pointers will be members of the same
array:

    1) It could be a string constant, which is an array;

    2) It could be an object declared as an array;

    3) It could be an array allocated by "malloc", or an array that is a
       component of an object allocated by "malloc".

In all *these* cases, if the array contains a valid string, the call to
"strlen" must return a meaningful result, and your sample code for "strlen"
will subtract two pointers that point to members of the same array, or a
pointer that points to a member of an array from a pointer one past the end of
that array, both of which are valid.

    Guy Harris
    {ihnp4, decvax, seismo, decwrl, ...}!sun!guy
    guy@sun.com
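To make the valid case concrete, here is a minimal sketch of a string-length routine in which both pointers stay inside the array holding the string. It is written in ANSI prototype style rather than the K&R style of the example under discussion, and the name my_strlen is hypothetical, chosen only to avoid clashing with the library function.

```c
#include <stddef.h>

/* Walk c forward from s until it reaches the terminating '\0'. For any
 * valid string, s and c then point into the same array, so c - s is
 * defined and equals the length. */
size_t my_strlen(const char *s)
{
    const char *c = s;

    while (*c != '\0')
        c++;
    return (size_t)(c - s);
}
```

Here c stops at the terminator itself, so the off-by-one question raised about the earlier example does not arise. If s does not point at a terminated string, the behavior is undefined, and so is whatever the subtraction would have produced.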
gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/29/87)
In article <2130@umn-cs.UUCP>, randy@umn-cs.UUCP (Randy Orrison) writes:
> In article <483@mtxinu.UUCP> ed@mtxinu.UUCP (Ed Gould) writes:
> >Pointers may be subtracted *only* if they point to members of the same
> >array of elements.
> How is this determined? example:
> strlen(s)
> return (c-s);
Obviously all characters in a string are in the same object (be it
(char []) or a chunk of malloc()-allocated storage). I don't recall if the
latter is covered by the draft proposed standard, but it should be.

If some code violates the same-aggregate pointer constraint, the
behavior is unspecified. It might work or it might not. No
portable program should violate the constraint.
cik@l.cc.purdue.edu (Herman Rubin) (08/29/87)
In article <483@mtxinu.UUCP>, ed@mtxinu.UUCP (Ed Gould) writes:
> It's also not legal in the proposed ANSI C standard. Pointers
> may be subtracted *only* if they point to members of the same
> array of elements.
The fact that some `gurus' cannot see the uses of this construct, as well
as others such as goto's, forcing inline, etc., is no more appropriate
than prohibiting the use of any tools developed since 1800 to sculptors.
You have no way of knowing how I can use the power of the machine; I may
very well find a new way of doing some things tomorrow that I do not see
today. Let us remove unnecessary restrictions from the languages.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
barmar@think.COM (Barry Margolin) (08/30/87)
In article <572@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <483@mtxinu.UUCP>, ed@mtxinu.UUCP (Ed Gould) writes:
>
>> It's also not legal in the proposed ANSI C standard. Pointers
>> may be subtracted *only* if they point to members of the same
>> array of elements.
>
>The fact that some `gurus' cannot see the uses of this construct, as well
>as others such as goto's, forcing inline, etc., is no more appropriate
>than prohibiting the use of any tools developed since 1800 to sculptors.
>You have no way of knowing how I can use the power of the machine; I may
>very well find a new way of doing some things tomorrow that I do not see
>today. Let us remove unnecessary restrictions from the languages.
This is not an unnecessary restriction. It is there because the
construct is non-portable, and the purpose of the C standard (indeed,
ANY language standard) is to define a language in which portable
programs may be written. No one is prohibiting you from subtracting
pointers to your heart's delight on machines where it makes sense;
just be aware that the standard doesn't specify what the result will
be, so your program may behave differently on different architectures.

In fact, I know of an architecture where it may return different
results for pointers to the same two objects at different times: the
Symbolics Lisp Machine. It has a garbage collector that moves objects
around in memory, so the addresses may change, and therefore the
difference may change. (Note: I've never used their C compiler, so I
don't know it will do this; however, I also believe that their
architecture allows them to detect comparisons of pointers to
different arrays).
---
Barry Margolin
Thinking Machines Corp.
barmar@think.com
seismo!think!barmar
allbery@ncoast.UUCP (08/30/87)
As quoted from <2130@umn-cs.UUCP> by randy@umn-cs.UUCP (Randy Orrison):
+---------------
| In article <483@mtxinu.UUCP> ed@mtxinu.UUCP (Ed Gould) writes:
| >It's also not legal in the proposed ANSI C standard. Pointers
| >may be subtracted *only* if they point to members of the same
| >array of elements.
|
| How is this determined? example: [deleted. ++bsa]
| How does anything know if s & c are pointing to members of the same array?
| If s isn't 0 terminated, c could end up anywhere...
+---------------
I think that they mean that the result is only defined if the pointers are
pointing to members of the same structure; in any other situation, you may
get a number result but it may not have any meaning.
--
Brandon S. Allbery, moderator of comp.sources.misc
{{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY
<<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
** Site "cwruecmp" has changed its name to "mandrill". Please re-address **
*** all mail to ncoast to pass through "mandrill" instead of "cwruecmp". ***
allbery@ncoast.UUCP (08/30/87)
As quoted from <572@l.cc.purdue.edu> by cik@l.cc.purdue.edu (Herman Rubin):
+---------------
| In article <483@mtxinu.UUCP>, ed@mtxinu.UUCP (Ed Gould) writes:
|
| > It's also not legal in the proposed ANSI C standard. Pointers
| > may be subtracted *only* if they point to members of the same
| > array of elements.
|
| The fact that some `gurus' cannot see the uses of this construct, as well
| as others such as goto's, forcing inline, etc., is no more appropriate
| than prohibiting the use of any tools developed since 1800 to sculptors.
+---------------
Sure -- but, while your program may work fine on a Vax or a Sun, will it work
on a Cray-1? LLNL's S-1? The ANSI C standard defines *portable* code; you
can code something that works on your machine but doesn't conform, but don't
expect it to work on every machine. (Example: the difference between two
pointers not both associated with the same array may be meaningless on a
tagged architecture, and may result in either a garbage result or a memory
fault.)

Subtracting pointers is a different kind of restriction from the use of
"goto"; the latter is a *stylistic* restriction, the former is a *portability*
restriction.
--
Brandon S. Allbery, moderator of comp.sources.misc
{{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY
<<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
** Site "cwruecmp" has changed its name to "mandrill". Please re-address **
*** all mail to ncoast to pass through "mandrill" instead of "cwruecmp". ***
root@hobbes.UUCP (08/31/87)
+---- Herman Rubin writes in <572@l.cc.purdue.edu> ---- | +---- Ed Gould writes ---- | | It's also not legal in the proposed ANSI C standard. Pointers | | may be subtracted *only* if they point to members of the same | | array of elements. | +---- | You have no way of knowing how I can use the power of the machine; I may | very well find a new way of doing some things tomorrow that I do not see | today. Let us remove unnecessary restrictions from the languages. +---- *** The following is only valid on intel 808x architecture machines *** Followups are directed to comp.sys.intel On the intel chips (and I'm sure on many others) some compiler's malloc() routines align memory requests on 16 byte boundries. So, if you did: You might get: _________ char *p1, *p2, *p3; /________/| p1 = malloc(20); p1 -->|20 bytes|| p2 = malloc(20); +--------+/ p3 = p2 - p1; _________ /________/| filler |? bytes || +--------+/ _________ /________/| p2 -->|20 bytes|| +--------+/ and p2 - p1 would NOT give you a useful number! THAT is why ANSI said that the result was undefined. Not illegal, just undefined. This means that compiler writers can do stuff like this without having to worry about breaking code. Iff you know what your compiler does AND iff you don't care about portability then you can use the info like this: printf("On this machine there are %ld bytes of filler between p1 and p2\n", (unsigned long) ( (unsigned long)p2 - (unsigned long)p1 ) - 20); or somesuch. ( This code WILL NOT WORK on intel chips. See below) -- New Subject: pointer manipulation on intel chips -- Note: This DOES NOT pertain to the usual "*(a+3)" or "if (p1 == p2)" stuff which is called "pointer arithmetic" or "pointer manipulation" in languages like C. It instead refers to "dissecting" the value of "&foobar". This comes in when you wish to do things like the p3 = p2 - p1; above where p1 and p2 point to different aggregates. The C compiler already takes care of the first cases for you. 
If you wish to do pointer manipulation on the intel 808x chips you need to
recognize how a pointer is constructed:

A pointer has 2 parts, a SEGMENT and an OFFSET, each 16 bits in length.
e.g.:
        1040:3333
        SEGMENT:OFFSET

In the "small" model, the SEGMENT is an unchanging value stored in a register
and the OFFSET is what is used as a "pointer" in C.

In the "large" model, a pointer consists of a 32 bit structure which contains
two 16 bit values, the SEGMENT and the OFFSET.

The SEGMENT and the OFFSET are combined to make a 20 bit address like this:

    SEGMENT  [0001|0000|0100|0000]         0x1040
    OFFSET        [0011|0011|0011|0011]    0x3333
             --------------------------
    ADDRESS  [0001|0011|0111|0011|0011]

    1040:3333 or
    1000:3733 or
    1001:3723 or     Note: a pointer may have many
    1002:3713 or     values and still point to the same thing!
    ...       or
    1373:0003

To convert the pointer 1040:3333 to an unsigned long address we use the
formula (SEGMENT * 16) + OFFSET to get:

    (0x1040 * 16) + 0x3333 = 0x00013733

Note: even though a pointer may have many values, it has only ONE address!

On the 808x chips this is a physical ADDRESS, but NOT a valid POINTER.
Note that in this discussion, pointers are not addresses and addresses are
not pointers!

Two addresses may be subtracted to obtain a valid number which is the
absolute difference (in bytes) of their physical locations.

An address may be converted into a normalized pointer by constructing a
SEGMENT:OFFSET pair where the lower 12 bits of the SEGMENT are ZERO.

    segment = (unsigned short)((address & 0x000F0000) / 16);
    offset  = (unsigned short)(address & 0x0000FFFF);

Only pointers which A) are normalized, or B) have the same SEGMENT value
can be validly compared for equality.  All addresses can be validly compared
for equality.

Intel bashing flames should go to /dev/null, glaring errors should be
emailed.  Minor errors should be ignored.
--
John Plocher uwvax!geowhiz!uwspan!plocher plocher%uwspan.UUCP@uwvax.CS.WISC.EDU
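
[Editor's sketch] The SEGMENT:OFFSET arithmetic in the post above can be modelled in portable C with fixed-width integers. This is only an illustration under assumptions: `struct far_ptr` and the three function names are hypothetical, not types or routines from any real 808x compiler, and "normalized" is taken in the sense the post uses (low 12 bits of the SEGMENT zero).

```c
#include <stdint.h>

/* Hypothetical model of a large-model pointer; not a real compiler type. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address: (SEGMENT * 16) + OFFSET, kept to 20 bits. */
uint32_t far_to_address(struct far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFUL;
}

/* Normalize as described in the post: keep only the top 4 bits of the
   segment and put the low 16 bits of the address into the offset. */
struct far_ptr address_to_far(uint32_t address)
{
    struct far_ptr p;
    p.segment = (uint16_t)((address & 0x000F0000UL) >> 4);
    p.offset  = (uint16_t)(address & 0x0000FFFFUL);
    return p;
}

/* Compare by physical address, so differing bit patterns that name the
   same byte still compare equal. */
int far_same_address(struct far_ptr a, struct far_ptr b)
{
    return far_to_address(a) == far_to_address(b);
}
```

With these definitions, 1040:3333 and 1373:0003 both map to address 0x13733, and converting that address back yields 1000:3733.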
dave@murphy.UUCP (Dave Cornutt) (08/31/87)
In article <7939@think.UUCP>, barmar@think.COM (Barry Margolin) writes: > In article <572@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes: > >In article <483@mtxinu.UUCP>, ed@mtxinu.UUCP (Ed Gould) writes: > > > >> It's also not legal in the proposed ANSI C standard. Pointers > >> may be subtracted *only* if they point to members of the same > >> array of elements. > > > > This is not an unnecessary restriction. It is there because the > construct is non-portable, and the purpose of the C standard (indeed, > ANY language standard) is to define a language in which portable > programs may be written. I'll agree that the construct posted, (long) p1 - (long) p2, is nonportable and should be flagged as such. However, I will say this, because I don't think the standard has really addressed it: there should be a way to take any pointer and generate a byte offset from byte 0 in whatever address space the code is running in. The reason is that you need such a beast to feed to lseek if you want to access something through one of the /dev/mem devices (or maybe /proc). > In fact, I know of an architecture where it may return different > results for pointers to the same two objects at different times: the > Symbolics Lisp Machine. It has a garbage collector that moves objects > around in memory, so the addresses may change, and therefore the > difference may change. I must be missing something here. Admittedly, I don't know anything about this machine, but it looks like this garbage collection would make pointers useless, since there is no guarantee that, when you dereference a pointer, the object that you're referring to will be in the same place that it was when you obtained the address. I can see how it could be done using some sort of highly segmented memory, but it seems like the overhead would be enormous (i.e., the iAPX 432). How does this work? --- "I dare you to play this record" -- Ebn-Ozn Dave Cornutt, Gould Computer Systems, Ft. 
Lauderdale, FL [Ignore header, mail to these addresses] UUCP: ...!{sun,pur-ee,brl-bmd,seismo,bcopen,rb-dc1}!gould!dcornutt or ...!{ucf-cs,allegra,codas,hcx1}!novavax!gould!dcornutt ARPA: dcornutt@gswd-vms.arpa "The opinions expressed herein are not necessarily those of my employer, not necessarily mine, and probably not necessary."
guy@gorodish.UUCP (08/31/87)
> However, I will say this, because I don't think the standard has really > addressed it: there should be a way to take any pointer and generate a byte > offset from byte 0 in whatever address space the code is running in. The > reason is that you need such a beast to feed to lseek if you want to access > something through one of the /dev/mem devices (or maybe /proc). The standard should NOT address this. The standard mentions neither "lseek" nor "/dev/mem" nor "/proc". This sort of thing is rather non-portable, and is as such completely outside the scope of the standard. Since getting at some other address space must be done in a different fashion on different implementations, it is perfectly OK to require that getting the location in that other address space also be done in a different fashion on different implementations. Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/01/87)
In article <588@murphy.UUCP> dave@murphy.UUCP (Dave Cornutt) writes: >... However, I will say this, because I don't >think the standard has really addressed it: there should be a way to take >any pointer and generate a byte offset from byte 0 in whatever address >space the code is running in. The reason is that you need such a beast >to feed to lseek if you want to access something through one of the >/dev/mem devices (or maybe /proc). The ANSI C standard doesn't address this (pun intended?) because the process may have incommensurable multiple data address spaces. It cannot dictate the mapping to be used by UNIX /dev/*mem and similar facilities; that's not within the scope of the C standard, which also has to apply to non-UNIX-like environments. It is up to the operating system implementation to make things like that work; it has nothing to do with the C language.
throopw@xyzzy.UUCP (09/01/87)
> dave@murphy.UUCP (Dave Cornutt)
> However, I will say this, because I don't
> think the standard has really addressed it: there should be a way to take
> any pointer and generate a byte offset from byte 0 in whatever address
> space the code is running in.  The reason is that you need such a beast
> to feed to lseek if you want to access something through one of the
> /dev/mem devices (or maybe /proc).

By "the standard", I presume draft X3J11 is meant.

First, the C language standard had better say nothing that requires there
to even *BE* a single, linear address space in which "code is running".
There are many machines where this isn't a well-founded presumption.  Thus
the whole idea of a process-unique "byte 0", or "a byte offset" from there
may not be present in the hardware for which the C source is being
compiled.  To say nothing of whether a C language standard should be
talking about "lseek" and "/dev/mem" in its rationale for a general feature.

In fact, "the standard" says just about what it ought to say.  It gives
license to developers for whom it is natural to supply "byte offsets from
byte 0" to supply them, but does not require it from those developers for
whom it is an impossibility.
--
1+1=3, for sufficiently large values of 1.
--
Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
barmar@think.UUCP (09/02/87)
In article <588@murphy.UUCP> dave@murphy.UUCP (Dave Cornutt) writes: >> In fact, I know of an architecture where it may return different >> results for pointers to the same two objects at different times: the >> Symbolics Lisp Machine. It has a garbage collector that moves objects >> around in memory, so the addresses may change, and therefore the >> difference may change. > >I must be missing something here. Admittedly, I don't know anything about >this machine, but it looks like this garbage collection would make pointers >useless, since there is no guarantee that, when you dereference a pointer, >the object that you're referring to will be in the same place that it was >when you obtained the address. I can see how it could be done using some >sort of highly segmented memory, but it seems like the overhead would be >enormous (i.e., the iAPX 432). How does this work? Whenever the garbage collector moves something, it effectively updates all pointers to the object. Most Lisp garbage collectors are of this relocating variety these days, as it also tends to shrink the working set and increase locality. A particular pointer variable will always point to the same object (until it is reassigned, of course), although its internal numerical value may change. I'm not sure how they deal with the fact that a pointer cast into an integer and back into a pointer (or is it vice versa?) must maintain its value. My guess is that they maintain a hash table of pointers that have been converted into integers. As for the overhead, it's just part of the garbage collection that Lisp programmers have been living with for decades. It's worth it not to have to keep track of when memory needs to be deallocated. And Lisp Machines have special hardware that optimizes GC. --- Barry Margolin Thinking Machines Corp. barmar@think.com seismo!think!barmar
bc@halley.UUCP (Bill Crews) (09/02/87)
In article <6357@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>In article <2130@umn-cs.UUCP>, randy@umn-cs.UUCP (Randy Orrison) writes:
>> In article <483@mtxinu.UUCP> ed@mtxinu.UUCP (Ed Gould) writes:
>> >Pointers may be subtracted *only* if they point to members of the same
>> >array of elements.
>> How is this determined?  example:
>>     strlen(s)
>>         return (c-s);
>
>Obviously all characters in a string are in the same object (be it
>(char []) or chunk of malloc()-allocated storage).

It seems to me that everyone is ignoring Ed's point.  Let's say a function
is to take two pointer arguments, a pointer to a string and a pointer
into the string.  What you say seems to indicate that arithmetic expressions
involving both pointers, such as their difference, will produce unpredictable
results at execution time, because the called function has no way of knowing
whether the pointers are actually to the same "string" or not.

-bc
--
Bill Crews Tandem Computers Austin, Texas
..!seismo!ut-sally!im4u!esc-bb!halley!bc (512) 244-8350
daveb@geac.UUCP (Brown) (09/02/87)
In article <26910@sun.uucp> guy@gorodish.UUCP writes: >> ... there should be a way to take any pointer and generate a byte >> offset from byte 0 in whatever address space the code is running in. The >> reason is that you need such a beast to feed to lseek if you want to access >> something through one of the /dev/mem devices (or maybe /proc). > >The standard should NOT address this. The standard mentions neither "lseek" >nor "/dev/mem" nor "/proc". This sort of thing is rather non-portable, and is >as such completely outside the scope of the standard. I agree that the standard should not address machine-specific issues (and especially /dev/mem), but the implementors of particular compilers for the language need to address the question. (this is more of an arch. than a c discussion, however). The Adavolutians have chosen to relegate the discussion of what optional features a particular compiler has implemented to a specific appendix: the standard writers might well define such an appendix for the C language. It can then address such issues where the poor client might be able to find it. --dave (I once did QA on a compiler: never again) c-b -- David Collier-Brown. {mnetor|yetti|utgpu}!geac!daveb Geac Computers International Inc., | Computer Science loses its 350 Steelcase Road,Markham, Ontario, | memory (if not its mind) CANADA, L3R 1B3 (416) 475-0525 x3279 | every 6 months.
peter@sugar.UUCP (Peter da Silva) (09/02/87)
> The standard should NOT address this. The standard mentions neither "lseek"
Are you saying that the ANSI 'C' library includes all the UNIX date/time
functions, but doesn't include lseek?
Ack, oop.
--
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
-- U <--- not a copyrighted cartoon :->
dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/03/87)
In article <625@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: >Are you saying that the ANSI 'C' library includes all the UNIX date/time >functions, but doesn't include lseek? One distinguishing difference between operating systems designed with interactive use in mind (e.g. AmigaDOS, MS-DOS, UNIX) and operating systems that trace their ancestry to the days of punched cards (e.g. VAX/VMS, most IBM mainframe operating systems, and perhaps Primos) is the inability of the latter to do an arbitrary lseek. I speculate that the punched-card paradigm was most effectively implemented on disk by storing the card image as [<length> <data>] thus allowing cards of any length (not just 80 characters) to be stored, and easily skipped in a sequential read without having to read each character. Counterexamples probably exist. -- Rahul Dhesi UUCP: {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi
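
[Editor's sketch] The [<length> <data>] card-image layout speculated about above can be illustrated with stdio on a binary stream. Everything here is an assumption made for illustration: the one-byte length field, the 255-byte record limit and the function names are hypothetical and do not describe the on-disk format of any of the systems named in the post.

```c
#include <stdio.h>

/* Write one record as a one-byte length followed by the data
   (so a record holds at most 255 bytes in this sketch). */
int put_record(FILE *f, const char *data, unsigned char len)
{
    if (fputc(len, f) == EOF)
        return -1;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}

/* Skip n records: read each length byte, then seek past the data
   without reading the individual characters. */
int skip_records(FILE *f, int n)
{
    while (n-- > 0) {
        int len = fgetc(f);
        if (len == EOF)
            return -1;
        if (fseek(f, (long)len, SEEK_CUR) != 0)
            return -1;
    }
    return 0;
}

/* Read the next record into buf (at least 256 bytes) and NUL-terminate it.
   Returns the record length, or -1 on end of file or error. */
int get_record(FILE *f, char *buf)
{
    int len = fgetc(f);
    if (len == EOF)
        return -1;
    if (fread(buf, 1, (size_t)len, f) != (size_t)len)
        return -1;
    buf[len] = '\0';
    return len;
}
```

Note that reaching record N still means walking N length fields from the start, which is the sequential flavour the post describes; there is no way to compute a record's position directly.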
guy%gorodish@Sun.COM (Guy Harris) (09/03/87)
> Are you saying that the ANSI 'C' library includes all the UNIX date/time
> functions, but doesn't include lseek?

That is precisely what I am saying, because it is true.  I find the presence
of the date/time functions in the C standard somewhat questionable, as that
sort of date/time conversion is usually an OS function - both the internal
format used to represent dates and/or times, and the printable format
generally used, are OS-dependent.

"lseek" isn't in the standard, but then neither are "open", "close", "read",
nor "write".  This is as it should be; a portable program can't expect any
more from those routines than from their standard I/O equivalents.  Consider
a system that supports records in files that begin with a byte count, and
uses FORTRAN carriage control at the beginning of the record in text files -
in such a system, "read" and "write" would have to perform the same sort of
translation on data in order to make UNIX programs work without change,
"lseek" would have to work with cookies rather than byte offsets, and if you
wanted to be able to use "read" or "write" to get at the "raw" binary data
in the file, you'd have to have a text/binary flag on "open", or something
such as that.

If you want a standard that ensures UNIX-flavored behavior, use POSIX, not
ANSI C.  Both types of standard have their roles, but they are different
roles.

Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/04/87)
In article <625@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: >Are you saying that the ANSI 'C' library includes all the UNIX date/time >functions, but doesn't include lseek? It doesn't include open(), read(), write(), fork(), etc. either. The reason is that it is probably impossible to specify these adequately in a common specification for all systems. Since the stdio routines ARE specified, there is little need for the lower-level I/O routines in portable application programming. The date/time functions are specified in a system-independent way and are useful in portable applications. The fact that they originated in the UNIX C library is largely irrelevant; most of the library routines in the proposed ANSI C standard did. lseek(), read(), etc. are specified in IEEE 1003.1 (POSIX), however, since it specifically addresses just UNIX-like systems.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/04/87)
In article <27183@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
-I find the presence of
-the date/time functions in the C standard somewhat questionable, as that sort
-of date/time conversion is usually an OS function - both the internal format
-used to represent dates and/or times, and the printable format generally used,
-are OS-dependent.
But the proposed ANSI C standard guarantees enough about these functions
to make them useful for portable programs. That seems like a win.
jpn@teddy.UUCP (John P. Nelson) (09/04/87)
In article <625@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: >> The standard should NOT address this. The standard mentions neither "lseek" > >Are you saying that the ANSI 'C' library includes all the UNIX date/time >functions, but doesn't include lseek? The Draft Standard does not include any of the UNIX low-level io functions (read/write/open/close) including lseek. Fseek IS supported. The rationale says something to the effect that the low level functions are 1. redundant, 2. not necessarily any more efficient than the FILE based functions. They do mention the POSIX standard, and that those functions will be defined there.
meissner@xyzzy.UUCP (Michael Meissner) (09/04/87)
> > The standard should NOT address this. The standard mentions neither "lseek"
>
> Are you saying that the ANSI 'C' library includes all the UNIX date/time
> functions, but doesn't include lseek?

Yes.  The functions open/read/write/lseek/close/ioctl/dup/dup2, etc. are
all in the province of POSIX.  ANSI C only deals with the standard I/O
functions for I/
--
Michael Meissner, Data General.  Uucp: ...!mcnc!rti!xyzzy!meissner
                                 Arpa/Csnet: meissner@dg-rtp.DG.COM
guy%gorodish@Sun.COM (Guy Harris) (09/04/87)
> Let's say a function is to take two pointer arguments, a pointer to a
> string and a pointer into the string.  What you say seems to indicate
> that arithmetic expressions involving both pointers, such as their
> difference, will produce unpredictable results at execution time, because
> the called function has no way of knowing whether the pointers are actually
> to the same "string" or not.

It indicates no such thing.  The called function doesn't *have to* know
whether the pointers point to elements of the same array; it is free to
subtract them *as if* they were, since if they are a correct result will be
produced if the program is compiled by a conforming compiler.  As such, if
the called function is called correctly, so that the two pointers *do* point
to members of the same array, there will be no problem.

If they do not point to members of the same array, the generated code that
subtracts them can produce the "expected" result, produce a garbage result,
or trigger global thermonuclear war; it is not obliged to worry about this.
The Standard "imposes no requirements" on the behavior of an implementation
in a particular situation if the Standard indicates that behavior in that
situation is "undefined".  "Permissible behavior ranges from ignoring the
situation completely with unpredictable results, to behaving during
translation or program execution in a documented manner characteristic of
the environment (with or without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of a diagnostic
message)."

People seem to be having trouble with this point, so I'll give some concrete
examples.
In a system with a flat address space, and where pointer subtraction is done by treating the bit patterns in the pointers as integral quantities, subtracting them, and dividing the result by the size of the object type to which both pointers point, subtraction of two "char *"s that do not point to members of the same array will produce the "expected" result, namely the distance between the two addresses in "char"-sized units. In a system with a segmented address space, where pointer subtraction is done by subtracting the offsets of the pointers, an entire array must fit into a segment in order to produce a conforming implementation. (You may even have to ensure that the last element doesn't end on the last address of the segment, in order that you can also subtract a pointer from "pointer_to_last_element"+1.) If both pointers point to objects in the same array, the segment number is irrelevant; subtracting the offsets gives the correct result. If both pointers point to objects in different segments (which are obviously not members of the same array), you will get a meaningless result. This is not a problem for e.g. "strlen"; "strlen" *will* give the correct length if handed a real string (such that all characters, including the null character, are members of the same array, and thus in the same segment). What it does when handed something that isn't a real string is irrelevant. Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
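
[Editor's sketch] The segmented case described above can be simulated in portable C. This is an illustration only: `struct seg_ptr`, `seg_diff` and `linear_diff` are hypothetical names, and the 16-byte paragraph scaling in `linear_diff` is borrowed from the 808x discussion earlier in the thread rather than from this post.

```c
#include <stdint.h>

/* Hypothetical segmented pointer to char, for illustration only. */
struct seg_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Subtraction as a segmented-model compiler might emit it: the segment
   numbers are ignored and only the offsets are subtracted. */
long seg_diff(struct seg_ptr a, struct seg_ptr b)
{
    return (long)a.offset - (long)b.offset;
}

/* Distance between the two physical locations, assuming 808x-style
   (segment * 16) + offset addressing. */
long linear_diff(struct seg_ptr a, struct seg_ptr b)
{
    long la = ((long)a.segment << 4) + a.offset;
    long lb = ((long)b.segment << 4) + b.offset;
    return la - lb;
}
```

For two pointers into one array held in segment 0x2000, `seg_diff` and `linear_diff` agree. For 0x3000:0x0010 and 0x2000:0x0010, which cannot belong to the same array, `seg_diff` yields 0 although the locations are 0x10000 bytes apart: the "meaningless result" the post mentions.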
jpn@teddy.UUCP (John P. Nelson) (09/05/87)
[Lots of stuff about subtracting pointers deleted...]
>
>It seems to me that everyone is ignoring Ed's point.  Let's say a function
>is to take two pointer arguments, a pointer to a string and a pointer
>into the string.  What you say seems to indicate that arithmetic expressions
>involving both pointers, such as their difference, will produce unpredictable
>results at execution time, because the called function has no way of knowing
>whether the pointers are actually to the same "string" or not.

This is the wrong way of looking at it.  The compiler is free to ASSUME that
the two pointers point to members of a single array: otherwise the program
would not be a "strictly conforming program".  The compiler does not HAVE TO
decide if the two pointers can be subtracted: The standard says that it is
the programmer's problem to assure this.  The compiler is free to do just
about anything if the program is incorrect.

In other words, in a segmented architecture (like the 8086), the compiler
can ASSUME that the segments are identical, and perform the computation on
the two pointer offsets, because if the segments are different, the program
is not correct.

In a tagged architecture (or an interpreted environment), the fact that the
two pointers do not point to members of an array might be detected at RUN
TIME.  The standard says that this is a perfectly valid approach.

If you have a linear address space, where all addresses fit into an integer
of some kind, the standard does not forbid returning the distance between
two arbitrary pointers.  You simply cannot assume that this will work for
all implementations.
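
[Editor's sketch] The case under discussion, a function handed a pointer to a string and a pointer into it, looks like this in standard C. The names `my_strlen` and `index_in` are hypothetical; the point is only that the function subtracts without checking anything, and the obligation to pass pointers into the same array rests with the caller.

```c
#include <stddef.h>

/* Length by pointer subtraction.  c never leaves the array that s points
   into, so the subtraction is well defined. */
size_t my_strlen(const char *s)
{
    const char *c = s;
    while (*c != '\0')
        c++;
    return (size_t)(c - s);
}

/* Index of p within the string starting at s.  The function does not and
   cannot verify that p points into the same array as s; if the caller
   breaks that rule, the behavior is undefined. */
ptrdiff_t index_in(const char *s, const char *p)
{
    return p - s;
}
```

Called correctly, for example `index_in(text, text + 3)`, the result is 3 on any conforming implementation, flat or segmented.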
mc68020@gilsys.UUCP (Thomas J Keller) (09/05/87)
In article <6397@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <625@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes: > >Are you saying that the ANSI 'C' library includes all the UNIX date/time > >functions, but doesn't include lseek? > It doesn't include open(), read(), write(), fork(), etc. either. > The reason is that it is probably impossible to specify these > adequately in a common specification for all systems. Since the > stdio routines ARE specified, there is little need for the > lower-level I/O routines in portable application programming. So in other words, Mr. Gwyn, what you are saying is that the ANSI C workgroup has taken it upon themselves to decide that "portable applications" programs have NO NEED to do other than straight sequential I/O on files, is this correct? How very paternalistic of them! Sounds to me as if some (most?) of the people on that group are making some pretty heavy assumptions, some of which may well BREAK the usefulness of the ANSI C standard sufficiently as to render it totally USELESS. -- Tom Keller VOICE : + 1 707 575 9493 UUCP : {ihnp4,ames,sun,amdahl,lll-crg,pyramid}!ptsfa!gilsys!mc68020
allbery@ncoast.UUCP (09/05/87)
As quoted from <625@sugar.UUCP> by peter@sugar.UUCP (Peter da Silva): +--------------- | > The standard should NOT address this. The standard mentions neither "lseek" | | Are you saying that the ANSI 'C' library includes all the UNIX date/time | functions, but doesn't include lseek? +--------------- We're not talking about UNIX standards, we're talking about C standards. Date and time are easily convertible under any OS; but how do you implement a byte-oriented lseek() under VMS? VM/CMS on an IBM? (Both use fixed 80-byte records for text files -- NOT byte streams! -- and other record formats for non-text files.) -- Brandon S. Allbery, moderator of comp.sources.misc {{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>> All opinions in this message are random characters produced when my cat jumped (-: up onto the keyboard of my PC. :-)
peter@sugar.UUCP (Peter da Silva) (09/06/87)
> a system, "read" and "write" would have to perform the same sort of translation > on data in order to make UNIX programs work without change, "lseek" would have > to work with cookies rather than byte offsets, and if you wanted to be able to > use "read" or "write" to get at the "raw" binary data in the file, you'd have > to have a text/binary flag on "open", or something such as that. I have no problem with any of that. On systems where it is not appropriate to use read() and write() as the primitives, implement them using fread() and fwrite(). There is certainly a precedent for having flags on open(), too. -- -- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter -- 'U` <-- Public domain wolf.
rsalz@bbn.com (Richard Salz) (09/09/87)
In article <6397@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) explains
that ANSI doesn't specify open, read, write.  To which, in comp.unix.wizards
(<1122@gilsys.UUCP>), mc68020@gilsys.UUCP (Thomas J Keller) writes:
> So in other words, Mr. Gwyn, what you are saying is that the ANSI C
>workgroup has taken it upon themselves to decide that "portable applications"
>programs have NO NEED to do other than straight sequential I/O on files,
>is this correct?  How very paternalistic of them!

In general, Doug's postings have to be read the same way you read K&R or
the vintage Unix manuals (i.e., when the programmers wrote them, not a
separate techdoc department): pay attention to every word, and give as
much note to what is not said, as to what is said.

It is not appropriate for ANSI to specify the "Unix system-call" level, it
is appropriate for them to document the "standard I/O level."  Hence, X3J11
does specify fseek.

Please read, and ponder, more carefully before you make snide, insulting
comments -- especially to or about people as useful to the net as Doug.
	/r$
--
For comp.sources.unix stuff, mail to sources@uunet.uu.net.
guy%gorodish@Sun.COM (Guy Harris) (09/09/87)
> I have no problem with any of that. On systems where it is not appropriate > to use read() and write() as the primitives, implement them using fread() > and fwrite(). What would this buy you, other than a false sense of security when moving UNIX programs to non-UNIX systems? Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
allbery@ncoast.UUCP (Brandon Allbery) (09/09/87)
As quoted from <286@halley.UUCP> by bc@halley.UUCP (Bill Crews):
+---------------
| It seems to me that everyone is ignoring Ed's point.  Let's say a function
| is to take two pointer arguments, a pointer to a string and a pointer
| into the string.  What you say seems to indicate that arithmetic expressions
| involving both pointers, such as their difference, will produce unpredictable
| results at execution time, because the called function has no way of knowing
| whether the pointers are actually to the same "string" or not.
+---------------

The point is that, while the subtraction might be doable, on a given
architecture subtracting two pointers not into the same array might not have
a meaning.  On such hardware, the MMU knows what's what and the addresses
reflect this.  This is basically a declaration that the software isn't
required to spin its wheels trying to deal with "unusual" hardware (can you
truly call a PC "unusual"?  But large-model pointers are susceptible, since
multiple <segment>:<offset> pairs may point to the same address.  Pointer
normalization is expensive).

Note that in all of these cases, falling off the end of a string will still
yield a valid pointer (for subtraction, at least) -- although some tagged
architectures might hand you a segmentation violation when you try to go
beyond the defined end of the string/data area.
--
Brandon S. Allbery, moderator of comp.sources.misc
{{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY
<<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
All opinions in this message are random characters produced when my cat jumped
(-: up onto the keyboard of my PC. :-)
guy%gorodish@Sun.COM (Guy Harris) (09/09/87)
> So in other words, Mr. Gwyn, what you are saying is that the ANSI C > workgroup has taken it upon themselves to decide that "portable applications" > programs have NO NEED to do other than straight sequential I/O on files, > is this correct? "fseek" is in the standard. What is this nonsense about "straight sequential I/O?" Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
henry@utzoo.UUCP (Henry Spencer) (09/09/87)
> So in other words, Mr. Gwyn, what you are saying is that the ANSI C > workgroup has taken it upon themselves to decide that "portable applications" > programs have NO NEED to do other than straight sequential I/O on files, > is this correct? ... Nonsense. Stdio includes fread, fwrite, and fseek. The X3J11 drafts do put some restrictions on portable uses of them, which are inevitable given that the full generality of something like Unix seeks is unimplementable on some systems. The question is not whether portable applications have real needs to do strange things, but whether these strange things can be done in a *portable* way that will work on *most* machines. Often they can, *if* one is willing to work at the level of stdio and observe some extra restrictions. It is neither appropriate nor practical for X3J11 to wave a magic wand and declare that any system which can't implement full Unix semantics is broken. There really are things which simply CANNOT BE DONE in a portable way, and people writing portable programs or designing tools for writing portable programs must acknowledge this. -- "There's a lot more to do in space | Henry Spencer @ U of Toronto Zoology than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry
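
[Editor's sketch] Non-sequential I/O at the stdio level, as mentioned above, might look like the following. The helper names `read_record` and `write_record` are hypothetical. The sketch assumes a binary stream of fixed-size records; on a text stream, the standard only lets a portable program fseek() to offset zero or to a position previously returned by ftell(), which is the kind of restriction the post refers to.

```c
#include <stdio.h>

/* Read fixed-size record number n (counting from 0) from a binary stream.
   Returns 0 on success, -1 on failure. */
int read_record(FILE *f, long n, void *buf, size_t size)
{
    if (fseek(f, n * (long)size, SEEK_SET) != 0)
        return -1;
    return fread(buf, size, 1, f) == 1 ? 0 : -1;
}

/* Overwrite (or append) fixed-size record number n.
   Returns 0 on success, -1 on failure. */
int write_record(FILE *f, long n, const void *buf, size_t size)
{
    if (fseek(f, n * (long)size, SEEK_SET) != 0)
        return -1;
    if (fwrite(buf, size, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

Each helper seeks before it transfers data, which also satisfies the stdio rule that a positioning call must separate a write from a following read on an update stream.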
jc@minya.UUCP (09/12/87)
Hey, what for are all youse guys talkin' about fseek, lseek, and even read and write, when da Subject line says: Re: pointer alignment when int != char * Ennyhow, dis here group is supposta be about C. C doesn't have I/O. It just has functions calls; it don't from nowhere about I/O. [Oops, pardon the Chicagoese; please move the discussion to some place where it is relevant, like maybe comp.os.unix or something similar. As for any claim that some C standards discuss irrelevancies like I/O, well, there *is* a newsgroup for discussing C standards.] -- John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
mouse@mcgill-vision.UUCP (09/22/87)
In article <8024@think.UUCP>, barmar@think.COM (Barry Margolin) writes: > In article <588@murphy.UUCP> dave@murphy.UUCP (Dave Cornutt) writes: [>>> is someone else; Dave is >>] >>> [Lisp Machines have a garbage collector which moves objects, >>> resulting in pointers that change behind your back] >> [but this means a pointer can be invalidated on you] > [when GC moves something, it updates all pointers.] > A particular pointer variable will always point to the same object > (until it is reassigned, of course), although its internal numerical > value may change. > I'm not sure how they deal with the fact that a pointer cast into an > integer and back into a pointer (or is it vice versa?) must maintain > its value. My guess is that they maintain a hash table of pointers > that have been converted into integers. Lisp doesn't have that sort of cast (well, it usually does, but only as a documented-to-be-dangerous subprimitive). Does Symbolics or LMI or anyone provide a C compiler for a Lisp Machine? If they do, I would guess that either they do as you suggest, maintaining some table of objects which must not be moved, or they have hooks into the garbage collector permitting the C run-time to lock objects down, or they simply make the whole C environment one lisp object, which must be relocated as a whole if it is relocated at all. (Then of course all pointers will be relative to the beginning of this area.) der Mouse (mouse@mcgill-vision.uucp)
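
[Editor's sketch] The last option above, treating the whole C environment as one relocatable object with pointers kept relative to its start, can be illustrated in ordinary C. This is not how any Lisp Machine C compiler is known to work; `struct arena`, `arena_off`, `arena_at` and `arena_relocate` are hypothetical names, and malloc() plus memcpy() merely stand in for a collector moving the block.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical relocatable area: every "pointer" into it is stored as an
   offset from base, never as an absolute address. */
struct arena {
    char  *base;
    size_t size;
};

typedef size_t arena_off;

/* Turn an offset into a real pointer.  The result is only valid until
   the next relocation. */
void *arena_at(const struct arena *a, arena_off off)
{
    return a->base + off;
}

/* Move the whole area to a fresh block, as a relocating collector might.
   Stored offsets stay valid because they are relative to base. */
int arena_relocate(struct arena *a)
{
    char *fresh = malloc(a->size);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, a->base, a->size);
    free(a->base);
    a->base = fresh;
    return 0;
}
```

An offset taken before `arena_relocate` still names the same object afterwards, whereas a `char *` obtained from `arena_at` before the move would dangle.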