gwyn@smoke.BRL.MIL (Doug Gwyn) (01/10/90)
In article <1250.25ab3338@csc.anu.oz> bdm659@csc.anu.oz writes:
>The following is a list of Doubtful Assumptions (DAs). ...
>I'd welcome proofs in either case.

Well, I'll try to respond, but with explanations, not rationalistic "proofs". I really have to take issue with people who insist on dissociating the purpose of the C Standard from reality, instead arguing excessively over formalism. We expressed the Standard in technical English rather than a formal notation primarily in order to aid programmers (and to a lesser degree, implementors) to relate it to their daily activity. It is not intended to form a system suitable for treating with formal symbolic logic and therefore should not be taken as such. Thus, a truly perverse implementation might actually comply with the letter of the Standard while exploiting unintended loopholes to produce a travesty quite at variance with the spirit of C. (We tried to document in the Rationale most of the intentional loopholes.)

Another meta-comment here is that the DA examples indicate too much concern with representational aspects of entities within a C program and too little concern with dealing with data at the appropriate level of abstraction. In the vast majority of applications, these questions should not even arise.

The answers I give will assume that implementations do not go out of their way to introduce unnecessary complications. (Necessary ones, caused by architectural or environmental considerations, are okay; we deliberately allowed slack in the specifications to cover those.)

>DA[0]: int *pi; char *pc;
>   Suppose pi is valid, and do pc = (char*) pi. Then *pc overlaps *pi
>   in the sense that changing the value of *pc changes the value of *pi.

TA (True Assumption). The addresses of the bytes within a single object constitute a nice linear address space. (However, there need not be one global linear address space within which all objects are located.)

It is not specified which PART of *pi is accessed by *pc, but some part must be. Big-endian and little-endian architectures will differ here.

>DA[1]: int *pi, *pj; char *pc, *pd;
>   Suppose pi and pj are valid, and that pi == pj .
>   Now do pc = (char*) pi; pd = (char*) pj .
>   Then pc == pd .
>   [I bet this one generates some heat. Don't forget to justify
>   your disproof with references to pANS.]

TA. Pointers to distinct objects (including bytes within other objects) compare unequal and vice-versa. The only loophole an implementation could exploit here would be to randomly select a byte address within the int object when the conversion to char* occurs, knowing that alignment constraints applied during the inverse conversion would recover the same int*. Even if such a loophole is logically permitted by the specification, I don't think it poses a serious practical threat, because I see no legitimate reason for introducing such run-time indeterminacy and therefore don't expect to see it in practical implementations. (The GNU project might do it just to show how "clever" they are; that seems to be their style, judging by their original treatment of #pragma. Frankly, such childish antics merely reinforce the negative opinion many already have of "pointy-headed Ivy League intellectuals", who play obstructive semantic games while the rest of us are trying to do productive work.)

>DA[2]: Just like DA[1], but using type void* instead of char*.

TA. A void* is really just a byte* (i.e., a char*) subject to additional programmer-safety compile-time constraints. The run-time representation of void* and char* MUST be identical (3.1.2.5), and this implies that success for one equality comparison implies success for the other.

>DA[3]: long *pi, *pj;
>   Suppose that pi is valid, and do pj = (long*)(int*) pi;
>   Then pi == pj .
>   [comment: there's no rule that says an int can't have a more
>   strict alignment requirement than a long.]

TA. If the conversion to int* does not violate the alignment constraint, then the test for equality must succeed. I don't know of any architectures where it would be reasonable for the C implementation to impose stricter alignment constraints on int than on long, so this is in practice a TA. Artificial implementations could be devised that make this a DA; see my meta-comments at the beginning of this article.

>DA[4]: int i, *pi;
>   Suppose pi is a null pointer, and do i = (int) pi .
>   Then i == 0 .

FA (False Assumption), even assuming that int is the appropriate implementation-defined integral type to satisfy 3.3.4. The most obvious implementation is to simply copy the pointer-representation bit pattern unchanged into the integral datum, as indicated by the footnote. Definitely, a null pointer need not be represented as all zero bits.

>DA[5]: Just like DA[4], but with i of type unsigned long .

Same comments here, assuming that unsigned-long is the appropriate type. For integral types that are TOO LONG, it is a semantic violation and MUST BE DIAGNOSED. I suppose that it's within both the letter and the spirit of the Standard for an implementation definition of the "size of integer required" to be "any size greater than or equal to <n>" and a suitable statement about the integral representation for all qualifying sizes; that would avoid the need for such a diagnostic.

(NOTE: I would regard any conformance test that looked for the too-long diagnostic as being silly, for the same general reasons that I gave for considering excessive linguistic analysis of the Standard as being silly. It was certainly not intended that such specifications get in the way of either programmers or implementors, and so long as there is a sensible way around having to take the specs so literally when that has undesired effects, we should all agree to do so -- in this case, as indicated via a benign interpretation of what the implementation definition can be.)
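
To make the DA[4] and DA[5] answers concrete, here is a minimal sketch. It tests for a null pointer by pointer comparison, which works whatever the representation, and separately inspects the bytes of a null pointer. The helper names is_null and null_is_all_zero_bits are hypothetical, chosen only for illustration. The second helper may return either value on a conforming implementation, which is exactly why i == 0 cannot be assumed.

```c
#include <stddef.h>
#include <string.h>

/* Portable null test: compare against a null pointer, never against
   an integer obtained by casting the pointer. */
int is_null(const int *p)
{
    return p == NULL;
}

/* Reports whether this implementation happens to represent a null
   int* as all-zero bytes.  Either answer is conforming. */
int null_is_all_zero_bits(void)
{
    int *p = NULL;
    unsigned char bytes[sizeof p];
    size_t i;

    memcpy(bytes, &p, sizeof p);      /* copy the object representation */
    for (i = 0; i < sizeof p; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

Code that needs to know whether a pointer is null should call something like is_null; the byte inspection is only a diagnostic for the curious.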
>DA[6]: int *pi, *pj;
>   Suppose pi is valid, and do pj = (int*)(unsigned long) pi.
>   Then pi == pj .

FA, assuming again that unsigned-long is the appropriate type. (By the preceding discussion, unsigned-long should certainly in practice always be acceptable for 3.3.4 purposes.) There can be reasonable implementations such that no integral type can hold all the information needed to represent a pointer. That is why the Standard does not require that the mapping between pointers and integers be invertible.

>DA[7]: int i, *pi;
>   Suppose i != 0, and do pi = (int*) i .
>   Then pi != (int*)0 .

FA. (int*)0 is a null pointer of type (int*), whereas pi is the implementation-defined result of converting the integer value 0 to an int*. 0 in this source code context may be treated as a special case by the compiler.

>DA[8]: int *pi, *pj;
>   Suppose pi is a valid pointer of kind P3, and do
>   pj = (int*)(char*) pi . Then pi == pj .
>   [comment: the rule in section 3.3.4 only applies to pointers
>   to objects, which pi might not be.]

[P3 means "one past the end".] I think 3.3.4 meant for "type" to distribute over "object or incomplete", as it does explicitly later in the same sentence. The intent is to distinguish these from function pointers. Even if that interpretation is not upheld by X3J11, it would be most unlikely that an implementation would cause this example not to succeed, because it would take more work not to. Thus, this is also TA.

>DA[9]: int *pi;
>   Suppose that an external function f() is declared without prototype.
>   It expects a single argument of type void*. Assume that pi is valid.
>   Then the call f(pi) works.
>   [comment: See my remarks on section 3.1.2.5 below.]

FA. int* and void* need not have the same representation, and generally would not for a word-addressed architecture. f((char*)pi) would work.

>DA[10]: void *pv; external void *f();
>   In fact, f returns a value of type int*.
>   Then pv = f() works.
>   [comment: See my remarks on section 3.1.2.5 below.]

FA, for the same reason as preceding. External interfaces must have matching input and output data representations, for obvious reasons.

>References and some nit-picking.

[references omitted, since legitimately the whole Standard must be taken as an integrated specification (which I once imprecisely labeled a "gestalt"), not as a set of unrelated axioms from which formal deductions are to be made]

>3.1.2.5. types and type terminology
>   definitions of "object type" and "incomplete type"
>   nit-pick: This section contains several rules of the form "types X and Y have the
>   same representation and alignment requirements". Footnote 15
>   tells us that this is intended to imply interchangeability as
>   function arguments, function return values, and members of
>   unions. However, this does not follow from the rule.
>   Interchangeability of two types as function arguments requires,
>   in addition, equality of argument-passing mechanisms. This is
>   nowhere prescribed.

I don't know what you mean by this; the footnote is EXPLAINING what we intended by these terms. Don't you think that function arguments have to be somehow represented and aligned? (Note, by the way, that there are often different alignment requirements for function arguments than for other uses of the same data type; e.g., char arguments on the PDP-11.)

>3.3.4. more on conversion amongst pointer types
>   conversions between integral types and pointers
>   nit-pick: The case of (obj*)0 should be excluded from these rules
>   as it is specified differently in 3.2.2.3.

3.2.2.3 says that a null pointer constant, which may be expressed as 0 (among other alternatives), converted to a pointer constitutes a null pointer. 3.3.4 says that an arbitrary integer may be converted to a pointer. Thus i=0,pi=(int*)i; does not necessarily result in pi containing a null pointer representation, and that is intentional.

3.3.4 explains that conversions involving pointers (except ..., not relevant here) shall be specified by an explicit cast, and it spells out their implementation-defined and undefined aspects. Note that the construct (int*)0 is not covered under 3.3.4 since it does not involve the conversion of an integer to a pointer -- 3.2.2.3 has already given that construct a different interpretation. That leaves lots of constructs for which the Standard assigns no other meaning to be encompassed by 3.3.4, for example (int*)1.

I really don't see that there is any practical problem in understanding null pointers expressed like (int*)0 in C source code, once one realizes that in this context 0 is a null pointer constant, not an integer. There has been continual confusion about this in comp.lang.c (INFO-C), but it has nothing to do with the Standard; rather it inheres in the overloading of the token 0 in source code to have multiple meanings. This was not an issue for the architectures for which C was initially implemented, but the necessity of treating such expressions specially became more evident as C spread to unusual architectures.

(The theoretically proper way to have dealt with this would of course have been for the language to have provided a reserved symbol such as "nil". Keep that in mind when you design the D programming language.)

>3.3.8. relational operators
>   nit-pick: The phrase "or both are null pointers" is missing from the
>   sentence in lines 8-10. See the otherwise identical sentence
>   in section 3.3.9.

No, this omission was deliberate, since it is improper to provide a null pointer as an operand of a relational operator, which is what 3.3.8 is all about. 3.3.9 covers the equality operators, for which null pointers are permissible operands.

In summary:

As I've said in the past and elaborated somewhat upon at the beginning of this article, one cannot understand what C is by applying formalistic arguments to the phraseology in the Standard. I doubt that the Standard in itself suffices to completely specify what is essential about C to someone who has never encountered it (or, even more extreme, who knows nothing about computer programming); THAT IS NOT ITS PURPOSE. It is merely intended to serve as a reference "treaty" by which both C programmers and C implementors agree to be bound, in order to facilitate the use of C as a practical tool in solving real-world problems, with particular emphasis on source-level application portability.

Therefore, you should refer to the Standard to see what the terms of the treaty are, not to determine what is sane or insane. An unduly warped implementation does not facilitate the use of C; there is much more involved in determining the utility of an implementation than merely literal conformance to the letter of the Standard. (X3J11 termed these "quality of implementation" issues.) An implementor who provides a perverse implementation would undoubtedly incur the wrath of his customers, and deservedly so.
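
For readers who want to try DA[0] and DA[1] from the article above on their own compiler, the following is a small sketch. The function names are hypothetical. The first function compares object representations with memcmp, so it never reads the modified int as an int. The second reports what DA[1] expects; as the article says, that result is what practical implementations do rather than something proven from the text of the Standard.

```c
#include <string.h>

/* DA[0]: storing through a char pointer derived from an int pointer
   alters the bytes of that int.  Which part of the int is touched
   depends on the architecture. */
int byte_store_changes_object(int *pi)
{
    int saved = *pi;
    unsigned char *pc = (unsigned char *)pi;
    int changed;

    *pc = (unsigned char)~*pc;        /* flip the byte that pc addresses */
    changed = memcmp(pi, &saved, sizeof saved) != 0;
    *pi = saved;                      /* put the original value back */
    return changed;
}

/* DA[1]: equal int pointers are expected to remain equal after
   conversion to char*, and to convert back to the original. */
int char_roundtrip_agrees(int *pi, int *pj)
{
    char *pc = (char *)pi;
    char *pd = (char *)pj;

    return pi == pj && pc == pd && (int *)pc == pi;
}
```

The first check concerns the bytes of the object only; whether the VALUE of the int changes is a separate question on implementations with unused bits.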
bdm659@csc.anu.oz (01/10/90)
This is an article about the semantics of pointers as defined by the ANSI standard for C (henceforth pANS). It is not necessarily about any existing C compiler, nor about C as defined by any source other than pANS. You probably can't contribute to it significantly unless you have a copy of pANS. If you think that any of my claims are wrong (perfectly possible) then the only way to demonstrate that is via precise reference to the text of pANS.

Any quotes from pANS refer to the version of Dec. 7, 1988. Note that some relevant wording changed a lot in the last few revisions.

I will call a pointer "valid" if it can be made by a strictly conforming program. There appear to be three kinds of valid pointers:

P1: pointers to objects
P2: null pointers
P3: pointers to "just past" an object (especially array objects, but any
    pointer to an object can be regarded as a pointer to an array of size
    one, then incremented once).

The following is a list of Doubtful Assumptions (DAs). The definition of a DA is "an assumption that a C programmer might be tempted to make, but which I cannot prove to be justified according to pANS". Determining whether something follows from pANS is not always a simple matter, so some of my DAs might turn out to be not doubtful at all. In fact, I hope I'm wrong about some of them. I'd welcome proofs in either case.

DA[0]: int *pi; char *pc;
    Suppose pi is valid, and do pc = (char*) pi. Then *pc overlaps *pi
    in the sense that changing the value of *pc changes the value of *pi.

DA[1]: int *pi, *pj; char *pc, *pd;
    Suppose pi and pj are valid, and that pi == pj .
    Now do pc = (char*) pi; pd = (char*) pj .
    Then pc == pd .
    [I bet this one generates some heat. Don't forget to justify
    your disproof with references to pANS.]

DA[2]: Just like DA[1], but using type void* instead of char*.

DA[3]: long *pi, *pj;
    Suppose that pi is valid, and do pj = (long*)(int*) pi;
    Then pi == pj .
    [comment: there's no rule that says an int can't have a more
    strict alignment requirement than a long.]

DA[4]: int i, *pi;
    Suppose pi is a null pointer, and do i = (int) pi .
    Then i == 0 .

DA[5]: Just like DA[4], but with i of type unsigned long .

DA[6]: int *pi, *pj;
    Suppose pi is valid, and do pj = (int*)(unsigned long) pi.
    Then pi == pj .

DA[7]: int i, *pi;
    Suppose i != 0, and do pi = (int*) i .
    Then pi != (int*)0 .

DA[8]: int *pi, *pj;
    Suppose pi is a valid pointer of kind P3, and do
    pj = (int*)(char*) pi . Then pi == pj .
    [comment: the rule in section 3.3.4 only applies to pointers
    to objects, which pi might not be.]

DA[9]: int *pi;
    Suppose that an external function f() is declared without prototype.
    It expects a single argument of type void*. Assume that pi is valid.
    Then the call f(pi) works.
    [comment: See my remarks on section 3.1.2.5 below.]

DA[10]: void *pv; external void *f();
    In fact, f returns a value of type int*.
    Then pv = f() works.
    [comment: See my remarks on section 3.1.2.5 below.]

References and some nit-picking.

1.6.      definitions of "object" and "alignment"
3.1.2.5.  types and type terminology
          definitions of "object type" and "incomplete type"
          nit-pick: This section contains several rules of the form "types
          X and Y have the same representation and alignment requirements".
          Footnote 15 tells us that this is intended to imply
          interchangeability as function arguments, function return values,
          and members of unions. However, this does not follow from the
          rule. Interchangeability of two types as function arguments
          requires, in addition, equality of argument-passing mechanisms.
          This is nowhere prescribed.
3.2.2.3.  conversions amongst pointer types, null pointers
3.3.2.1.  array subscripting
3.3.3.2.  * and & operators
3.3.4.    more on conversion amongst pointer types
          conversions between integral types and pointers
          nit-pick: The case of (obj*)0 should be excluded from these rules
          as it is specified differently in 3.2.2.3.
3.3.6.    pointer + integer, pointers just past an object
3.3.8.    relational operators
          nit-pick: The phrase "or both are null pointers" is missing from
          the sentence in lines 8-10. See the otherwise identical sentence
          in section 3.3.9.
3.3.9.    equality operators
3.3.16.1. simple assignment
3.5.2.1.  conversions between pointers to union members

===========================================================
Brendan McKay.  bdm@anucsd.oz.au  or  bdm659@csc1.anu.oz.au

terrorist: n. an individual who behaves like a government (original)
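
As a concrete form of DA[8], the sketch below builds a pointer of kind P3 (one past the end of an array), converts it to char* and back, and compares it with the original. The function names are hypothetical. On common flat-address implementations both functions are expected to return 1; whether pANS actually guarantees that is the question DA[8] raises, so the code shows what to test, not what is proven.

```c
#include <stddef.h>

/* Converts a one-past-the-end int pointer to char* and back, and
   reports whether the result compares equal to the original. */
int past_end_roundtrip_agrees(void)
{
    int a[4] = {0, 0, 0, 0};
    int *end = a + 4;                 /* kind P3: valid, never dereferenced */
    char *pc = (char *)end;
    int *back = (int *)pc;

    return back == end;
}

/* The same position reached by byte arithmetic from the array start. */
int past_end_matches_byte_offset(void)
{
    int a[4] = {0, 0, 0, 0};
    char *base = (char *)a;

    return (int *)(base + sizeof a) == a + 4;
}
```

Neither function dereferences the P3 pointer, which would be undefined under any reading.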
bill@twwells.com (T. William Wells) (01/11/90)
Don't ask me why, but 3.2.2.3 does say integral constant *expression*. That permits 1 - 1 to be a null pointer constant. And (char *)(1 - 1) to be a null pointer. Weird.

Also, I imagine, it won't help dispel the confusion that seems to surround null pointers.

---
Bill { uunet | novavax | ankh } !twwells!bill
bill@twwells.com
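
The point about constant expressions can be checked directly. This is a minimal sketch with a hypothetical function name: because 1 - 1 is an integral constant expression with value 0, the cast yields a null pointer, the same as writing 0 or NULL. A run-time int variable that merely happens to hold 0 is a different case and is deliberately left out, since its conversion is implementation-defined.

```c
#include <stddef.h>

/* Returns 1 when a pointer built from the constant expression 1 - 1
   compares equal to null pointers written the usual ways. */
int constant_expression_is_null(void)
{
    char *p = (char *)(1 - 1);   /* null pointer constant, then cast */
    char *q = 0;                 /* the familiar spelling */

    return p == q && p == NULL;
}
```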
bdm659@csc.anu.oz (01/13/90)
In article <11922@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> In article <1250.25ab3338@csc.anu.oz> bdm659@csc.anu.oz writes:
> >The following is a list of Doubtful Assumptions (DAs). ...
> >I'd welcome proofs in either case.
>
> Well, I'll try to respond, but with explanations, not rationalistic
> "proofs". I really have to take issue with people who insist on
> dissociating the purpose of the C Standard from reality, instead
> arguing excessively over formalism.

The question of whether or not a particular coding practice is strictly conforming is of great importance to anyone seriously interested in portable programming.

FA = false assumption
DA = doubtful assumption
RA = reasonable assumption
TA = true assumption

> We expressed the Standard in
> technical English rather than a formal notation primarily in order
> to aid programmers (and to a lesser degree, implementors) to relate
> it to their daily activity. It is not intended to form a system
> suitable for treating with formal symbolic logic and therefore
> should not be taken as such.

Don't put words into my mouth.

> Thus, a truly perverse implementation
> might actually comply with the letter of the Standard while
> exploiting unintended loopholes to produce a travesty quite at
> variance with the spirit of C.

See my comment about "perverse" at the end.

> (We tried to document in the
> Rationale most of the intentional loopholes.)

I checked the Rationale concerning all my DAs. What's a "loophole", anyway? You seem to have taken the line that any DA which *you* feel should be a TA represents such an unintentional loophole. Justification?

> Another meta-comment here is that the DA examples indicate too
> much concern with representational aspects of entities within a
> C program and too little concern with dealing with data at the
> appropriate level of abstraction. In the vast majority of
> applications, these questions should not even arise.

So asking questions about strict conformity is sinful?

> The answers I give will assume that implementations do not go out of
> their way to introduce unnecessary complications. (Necessary ones,
> caused by architectural or environmental considerations, are okay;
> we deliberately allowed slack in the specifications to cover those.)

One motive for my posting was to explore the boundary between "unintentional loopholes", as you call them, and deliberate non-specification. This is a perfectly reasonable object of study, and one which is appropriate to this newsgroup. Your criticisms of everyone who attempts it are not helpful.

> >DA[0]: int *pi; char *pc;
> >   Suppose pi is valid, and do pc = (char*) pi. Then *pc overlaps *pi
> >   in the sense that changing the value of *pc changes the value of *pi.
>
> TA (True Assumption). The addresses of the bytes within a single object
> constitute a nice linear address space. (However, there need not be one
> global linear address space within which all objects are located.)
>
> It is not specified which PART of *pi is accessed by *pc, but some part
> must be. Big-endian and little-endian architectures will differ here.

Forgive me if my memory is wrong, but I seem to remember a posting of yours in which you agreed that the members of unions might not physically overlap in some implementations. If you consider that pi might point to such a member, there are difficulties in reconciling that posting with this one.

> >DA[1]: int *pi, *pj; char *pc, *pd;
> >   Suppose pi and pj are valid, and that pi == pj .
> >   Now do pc = (char*) pi; pd = (char*) pj .
> >   Then pc == pd .
>
> TA. Pointers to distinct objects (including bytes within other objects)
> compare unequal and vice-versa. The only loophole an implementation
> could exploit here would be to randomly select a byte address within the
> int object when the conversion to char* occurs, knowing that alignment
> constraints applied during the inverse conversion would recover the same
> int*. Even if such a loophole is logically permitted by the specification,
> I don't think it poses a serious practical threat, because I see no
> legitimate reason for introducing such run-time indeterminacy and
> therefore don't expect to see it in practical implementations. ...

Yes, an argument on the basis of determinacy might be reasonable here. It's a pity such meta-arguments are needed, though. A fundamental difficulty in analysing these problems is that pANS doesn't ever define the semantics of pointer conversion. We are only given some functional axioms, from which some desirable properties, like this one, don't obviously follow.

> >DA[2]: Just like DA[1], but using type void* instead of char*.
>
> TA. A void* is really just a byte* (i.e., a char*) subject to additional
> programmer-safety compile-time constraints. The run-time representation
> of void* and char* MUST be identical (3.1.2.5), and this implies that
> success for one equality comparison implies success for the other.

Why does equal representation imply equal semantics? An argument based on the existence of functions like memcpy() might show DA[2] is an RA, though.

> >DA[3]: long *pi, *pj;
> >   Suppose that pi is valid, and do pj = (long*)(int*) pi;
> >   Then pi == pj .
> >   [comment: there's no rule that says an int can't have a more
> >   strict alignment requirement than a long.]
>
> TA. If the conversion to int* does not violate the alignment constraint,
> then the test for equality must succeed. I don't know of any architectures
> where it would be reasonable for the C implementation to impose stricter
> alignment constraints on int than on long, so this is in practice a TA.

An implementation in which short arithmetic is in hardware and long arithmetic is in software might reasonably have this property. I think this is a FA.

> >DA[4]-DA[6]:

Your analyses agree with mine on these. All are FAs.

> >DA[7]: int i, *pi;
> >   Suppose i != 0, and do pi = (int*) i .
> >   Then pi != (int*)0 .
>
> FA. (int*)0 is a null pointer of type (int*), whereas pi is the
> implementation-defined result of converting the integer value 0 to an
> int*. 0 in this source code context may be treated as a special case
> by the compiler.

Actually i is nonzero, though your argument still holds. However, the case with i==0 is a more instructive DA (indeed FA as you say).

> >DA[8]: int *pi, *pj;
> >   Suppose pi is a valid pointer of kind P3, and do
> >   pj = (int*)(char*) pi . Then pi == pj .
> >   [comment: the rule in section 3.3.4 only applies to pointers
> >   to objects, which pi might not be.]
>
> [P3 means "one past the end".] I think 3.3.4 meant for "type" to
> distribute over "object or incomplete", as it does explicitly later in
> the same sentence. The intent is to distinguish these from function
> pointers. Even if that interpretation is not upheld by X3J11, it would
> be most unlikely that an implementation would cause this example not to
> succeed, because it would take more work not to. Thus, this is also TA.

We seem to be looking at different sentences. I meant the sentence "It is guaranteed ... original pointer." starting on the last line of page 46. The word "type" doesn't appear in it at all.

Anyway, alignment is defined as a requirement on objects; applying it to the values of pointers which may not be pointers to objects seems doubtful. As far as implementations are concerned, consider one which treats pointers of kind P3 as a special case in representation (perhaps to permit objects reaching to the end of memory segments). It may be that making (int*)(char*) a no-op in this case could take more work, not less.

I think this is a DA, probably a FA, though this could well be unintentional.

> >DA[9]-DA[10]:

I agree these are FAs.

> >References and some nit-picking.
>
> >3.1.2.5. types and type terminology
> >   definitions of "object type" and "incomplete type"
> >   nit-pick: This section contains several rules of the form "types X and Y have the
> >   same representation and alignment requirements". Footnote 15
> >   tells us that this is intended to imply interchangeability as
> >   function arguments, function return values, and members of
> >   unions. However, this does not follow from the rule.
> >   Interchangeability of two types as function arguments requires,
> >   in addition, equality of argument-passing mechanisms. This is
> >   nowhere prescribed.
>
> I don't know what you mean by this; the footnote is EXPLAINING what we
> intended by these terms. Don't you think that function arguments have
> to be somehow represented and aligned?

You missed my point. Suppose the body of the text said "Gismos are pink." and the footnote said "This is meant to imply that gismos are pink and crinkly.". What can we infer about gismos? Well, the only reasonable inference is that gismos are both pink and crinkly, even though the footnote is not supposed to be part of the standard. However, we could also reasonably grumble about the extra information not appearing in the proper place.

If you doubt the analogy, consider the mythical XYZ compiler: values of type void* and char* are both represented as 32-bit unsigned addresses, and have no alignment requirements. However, since void* was added later by a different programmer, arguments of type void* are passed in registers whereas those of type char* are passed on the stack. Implementations quite often use different argument passing mechanisms for different types, so I don't think this is particularly perverse. It shows that representation+alignment equality does not imply argument interchangeability, i.e., the footnote adds an entirely new restriction. That is all I was nit-picking about.

> >3.3.4. more on conversion amongst pointer types
> >   conversions between integral types and pointers
> >   nit-pick: The case of (obj*)0 should be excluded from these rules
> >   as it is specified differently in 3.2.2.3.

I agree with your response to this.

> >3.3.8. relational operators
> >   nit-pick: The phrase "or both are null pointers" is missing from the
> >   sentence in lines 8-10. See the otherwise identical sentence
> >   in section 3.3.9.
>
> No, this omission was deliberate, since it is improper to provide a null
> pointer as an operand of a relational operator, which is what 3.3.8 is
> all about. 3.3.9 covers the equality operators, for which null pointers
> are permissible operands.

I will concede defeat on this too, but with some reluctance. The sentence in question is not describing the behaviour of a relational operator. Unless, perhaps, you interpret "equality" as the conjunction of "<=" and ">=".

> In summary:
>
> As I've said in the past and elaborated somewhat upon at the beginning
> of this article, one cannot understand what C is by applying formalistic
> arguments to the phraseology in the Standard. I doubt that the Standard
> in itself suffices to completely specify what is essential about C to
> someone who has never encountered it (or, even more extreme, who knows
> nothing about computer programming); THAT IS NOT ITS PURPOSE. It is

from the Foreword: "[pANS] addresses the problems of both the program developer and the translator implementor by specifying the C language precisely." Are you saying it fails to meet this claim?

> merely intended to serve as a reference "treaty" by which both C
> programmers and C implementors agree to be bound, in order to facilitate
> the use of C as a practical tool in solving real-world problems, with
> particular emphasis on source-level application portability.
>
> Therefore, you should refer to the Standard to see what the terms of the
> treaty are, not to determine what is sane or insane.

I wish to test things for strict conformity by applying the definition that the standard gives. If it isn't "specified in this Standard", it isn't strictly conforming.

> An unduly warped
> implementation does not facilitate the use of C; there is much more
> involved in determining the utility of an implementation than merely
> literal conformance to the letter of the Standard. (X3J11 termed these
> "quality of implementation" issues.) An implementor who provides a
> perverse implementation would undoubtedly incur the wrath of his
> customers, and deservedly so.

(1) I was addressing conformance, not utility.
(2) One person's "perverse" is another person's "reasonable". (Your example of GNU's #pragma will do.)

A study of what things are strictly conforming *because the standard actually says that they are* is a worthwhile pursuit because it establishes a solid foundation on which to base further discussion. Your exhortations against that study are unjustified.

Brendan McKay
bdm@anucsd.oz or bdm659@csc1.anu.oz
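
The agreed verdict on DA[9] and DA[10] comes down to whether the compiler knows the parameter type at the call. A minimal sketch, with hypothetical function names: when a prototype is in scope the int* argument is converted to void* as if by assignment, and without one the caller has to supply the conversion. Both functions below are written with a prototype visible, so the second only illustrates the cast an unprototyped call would need.

```c
#include <stddef.h>

/* Hypothetical callee taking void*.  Its prototype is visible below,
   so int* arguments are converted to void* at each call. */
static void *identity(void *p)
{
    return p;
}

/* Relies on the prototype for the int* -> void* conversion. */
int call_with_prototype(int *pi)
{
    void *pv = identity(pi);
    return (int *)pv == pi;
}

/* Spells the conversion out, as a caller without a prototype must. */
int call_with_explicit_cast(int *pi)
{
    void *pv = identity((void *)pi);
    return (int *)pv == pi;
}
```

The comparison in each function uses the round trip int* to void* and back to int*, which is one of the conversions the Standard does guarantee.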
gwyn@smoke.BRL.MIL (Doug Gwyn) (01/16/90)
In article <1259.25ae2019@csc.anu.oz> bdm659@csc.anu.oz writes:
>Don't put words into my mouth.

I didn't necessarily mean to imply that this was YOUR motivation, but many similar discussions in the past have been based on such a formalist/rationalist point of view (to be charitable to them).

>Forgive me if my memory is wrong, but I seem to remember a posting of yours
>in which you agreed that the members of unions might not physically overlap
>in some implementations. If you consider that pi might point to such a member,
>there are difficulties in reconciling that posting with this one.

Union members "overlap". In order to implement some forms of union on some unusual architectures, it may be necessary to use storage areas for different members that don't actually overlap. Nevertheless, it still conceptually is an overlap, and any program that assumes that one member value is unaffected upon storing into another member is not strictly conforming.

This consideration doesn't affect the specific example of an int being accessed via a converted char pointer. An implementation may have some bits in its representation of an int that are "unused", and therefore modifying a part of the int via a char pointer might happen to not affect the represented value, depending on just where into the int representation the char* points. Assuming either that the value will be changed or that it won't be changed by such an operation also makes a program not strictly conforming.

>... We are only given some functional axioms, from which
>some desirable properties, like this one, don't obviously follow.

Well, you know what I said about excessive rationalism. This was one example where an implementation could technically strictly conform without having the desirable property. As I said, I don't expect this to be a practical problem.

The category "RA" would fit this and some others that I labeled "TA" when I meant "expected to be TA in practice, although not strictly logically deducible from the Standard's specifications".

>Why does equal representation imply equal semantics?

It doesn't, but if you think about how the implementation must use the representations in testing for pointer equality, it should be apparent that when two char*s compare equal, the corresponding void* conversions should compare equal. Again, a truly perverse implementation could possibly go out of its way to cause this to fail, but since it is harder to do that than to do it the obvious way, I also don't expect this to be a problem in practice. "RA"

>An argument based on the existence of functions like memcpy() might show
>DA[2] is an RA, though.

Certainly memcpy() indicates what X3J11 thought void*s were supposed to be useful for.

>Actually i is nonzero, though your argument still holds. However, the case
>with i==0 is a more instructive DA (indeed FA as you say).

Oops.

>Anyway, alignment is defined as a requirement on objects; applying it
>to the values of pointers which may not be pointers to objects seems
>doubtful. As far as implementations are concerned, consider one which
>treats pointers of kind P3 as a special case in representation (perhaps
>to permit objects reaching to the end of memory segments). It may be
>that making (int*)(char*) a no-op in this case could take more work,
>not less.

It would require a general slowdown in pointer operations to represent the P3 case in a manner not uniform with other object pointers. In any case, I would expect such an implementation to apply the same sort of local flat address space linearization that it must do for pointer arithmetic to the conversion of pointers. Since the size of an array includes padding, the "last+1" element of an array meets the same alignment constraints as the other array elements. "RA"

>You missed my point. Suppose the body of the text said "Gismos are pink."
>and the footnote said "This is meant to imply that gismos are pink and
>crinkly.". What can we infer about gismos? Well, the only reasonable
>inference is that gismos are both pink and crinkly, even though the footnote
>is not supposed to be part of the standard.

To me, this supports what I said about the problems introduced when one resorts to formalistic reasoning. You CANNOT understand a concept by merely substituting its definition for all linguistic uses of the concept. A concept in general subsumes more than the definition would imply; there are other relevant properties.

When the Standard says that two types have the same representation and alignment requirements, it means that IN ALL CONTEXTS they have the same r&a requirements. That does imply what footnote 15 says; the footnote was added as a result of public review comments because we kept getting asked "Does this also mean when used as function arguments?". Note that there may be different contextual r&a requirements for the same type used as:
    static
    automatic
    register
    volatile
    string literal
    function argument
    function parameter
    function return value
and in still other contexts. A context-free constraint on r&a requirements must be applied appropriately in all contexts.

>from the Foreword: "[pANS] addresses the problems of both the
>program developer and the translator implementor by specifying the
>C language precisely." Are you saying it fails to meet this claim?

Precision is a matter of degree. The ANSI Standard is considerably more precise than K&R Appendix A. It need only be sufficiently precise to fulfil its role as reference treaty between programmer and implementor.

>I wish to test things for strict conformity by applying the definition that
>the standard gives. If it isn't "specified in this Standard", it isn't
>strictly conforming.
Conformance tests should not flag behavior as a violation if it is not possible to relate it to explicit wording in the Standard, even if it is insane behavior, if the result of such testing might end up in litigation. These "RA" cases deserve to be uncovered, however, since many applications are and will be assuming reasonable implementations and the programmer should hear about it if an implementation goes out of its way to introduce unwarranted obstacles. >(2) One person's "perverse" is another person's "reasonable". My notion of a perverse implementation is one that deliberately exploits an ambiguity in the specification, with considerably more effort than would be required to implement "reasonable" behavior such as would be expected by experienced C programmers, to force portable programs to use complex code to avoid triggering the perverse behavior. If someone else thinks such an implementation should be deemed reasonable, he's disconnected from the real world of computer programming. This does not include reasonable implementation choices that are forced by the architecture or environment, just those for which there is no good reason. As an example of how an implementation can be perverse, suppose that a POSIX implementation really literally obeyed the last word of the specification in IEEE Std 1003.1-1988 section 8.2.3.4. While that would be literal conformance to the specification, and while a reasonable implementation would have to violate that part of the specification, it would nevertheless be perverse to obey that spec, which is obviously an error due to incomplete editing of text copied from section 8.2.3.2.
ps@fps.com (Patricia Shanahan) (01/17/90)
In article <11954@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >In article <1259.25ae2019@csc.anu.oz> bdm659@csc.anu.oz writes: >My notion of a perverse implementation is one that deliberately >exploits an ambiguity in the specification, with considerably more >effort than would be required to implement "reasonable" behavior >such as would be expected by experienced C programmers, to force >portable programs to use complex code to avoid triggering the >perverse behavior. If someone else thinks such an implementation >should be deemed reasonable, he's disconnected from the real world >of computer programming. > >This does not include reasonable implementation choices that are >forced by the architecture or environment, just those for which >there is no good reason. > The first significant C programming job I ever did was a re-targeting of the portable C compiler to an unusual architecture. Although I was an experienced programmer, had read K&R (which was the only standard at the time) and had written a couple of small C programs, I was not an "experienced C programmer". I really needed a clear standard that did not depend on the concept of "reasonable" behavior. If you are going to depend on unwritten background data to interpret the standard, you don't really need the standard. A good standard should permit a skilled programmer to write a correct implementation without having access to an unwritten code of "reasonable" behavior. I solved the problem by treating the actual behavior of pcc as a de facto standard. Unfortunately, this gave me some problems in the area of mixed type assignment-ops where the actual behavior of pcc and the statements in K&R conflicted. Although I had no intention of being perverse, there were situations in which K&R appeared to permit behavior that in practice was dangerous. For example, the natural way of representing a pointer to function was with a half-word (rather than byte) address. 
All the jump commands that would be used in calls required half-word addresses. I set up a perfectly consistent system, in which conversion between a pointer to function and (for example) an integer involved one-bit shifts. It suited the architecture well and conformed to every rule about pointer types and conversions that I could find in K&R. Then I found out some of the details of the signal interface. This is really a plea for making sure that the standard actually says EVERYTHING that "experienced C programmers" are going to be justified in expecting of a C implementation, and does not depend on the implementor already knowing what is expected. I do not think you can assume someone is disconnected from the real world of computer programming just because they cannot read the collective minds of all experienced C programmers to find out what is really required in a C implementation, especially since I have seen it amply demonstrated that experienced C programmers often disagree about what is reasonable. -- Patricia Shanahan ps@fps.com uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps phone: (619) 271-9940
bdm659@csc.anu.oz (01/17/90)
In article <6203@celit.fps.com>, ps@fps.com (Patricia Shanahan) writes: > [...] > I really needed a clear standard that did not depend on the concept of > "reasonable" behavior. If you are going to depend on unwritten background data > to interpret the standard, you don't really need the standard. A good > standard should permit a skilled programmer to write a correct implementation > without having access to an unwritten code of "reasonable" behavior. > > [...] > This is really a plea for making sure that the standard actually says > EVERYTHING that "experienced C programmers" are going to be justified in > expecting of a C implementation, and does not depend on the implementor already > knowing what is expected. I do not think you can assume someone is disconnected > from the real world of computer programming just because they cannot read > the collective minds of all experienced C programmers to find out what is > really required in a C implementation, especially since I have seen it amply > demonstrated that experienced C programmers often disagree about what is > reasonable. I would also note that I wish to be able to write C programs which will be portable well into the future (at least until the next C standard, or until we blow up, irradiate, poison, asphyxiate, freeze, or cook ourselves --- whichever comes first). Being able to read the minds of all existing C programmers, and being an expert on all existing hardware, is insufficient to be sure of what the boundary between "reasonable assumption" and "doubtful assumption" will be in, say, 1997. That is one reason why it is worth knowing what the True Assumptions are, and programming within them. Fortunately, this isn't all that difficult in the great majority of cases. Brendan McKay. bdm@anucsd.oz.au or bdm659@csc1.anu.oz.au