[comp.std.c] doubtful assumptions about pointers

gwyn@smoke.BRL.MIL (Doug Gwyn) (01/10/90)

In article <1250.25ab3338@csc.anu.oz> bdm659@csc.anu.oz writes:
>The following is a list of Doubtful Assumptions (DAs).  ...
>I'd welcome proofs in either case.

Well, I'll try to respond, but with explanations, not rationalistic
"proofs".  I really have to take issue with people who insist on
dissociating the purpose of the C Standard from reality, instead
arguing excessively over formalism.  We expressed the Standard in
technical English rather than a formal notation primarily in order
to aid programmers (and to a lesser degree, implementors) to relate
it to their daily activity.  It is not intended to form a system
suitable for treating with formal symbolic logic and therefore
should not be taken as such.  Thus, a truly perverse implementation
might actually comply with the letter of the Standard while
exploiting unintended loopholes to produce a travesty quite at
variance with the spirit of C.  (We tried to document in the
Rationale most of the intentional loopholes.)

Another meta-comment here is that the DA examples indicate too
much concern with representational aspects of entities within a
C program and too little concern with dealing with data at the
appropriate level of abstraction.  In the vast majority of
applications, these questions should not even arise.

The answers I give will assume that implementations do not go out of
their way to introduce unnecessary complications.  (Necessary ones,
caused by architectural or environmental considerations, are okay;
we deliberately allowed slack in the specifications to cover those.)

>DA[0]:  int *pi; char *pc;
>        Suppose pi is valid, and do  pc = (char*) pi.  Then *pc overlaps *pi
>        in the sense that changing the value of *pc changes the value of *pi.

TA (True Assumption).  The addresses of the bytes within a single object
constitute a nice linear address space.  (However, there need not be one
global linear address space within which all objects are located.)

It is not specified which PART of *pi is accessed by *pc, but some part
must be.  Big-endian and little-endian architectures will differ here.
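
A minimal sketch of the overlap described above, assuming nothing beyond
the standard library.  The helper name copy_int_bytewise is hypothetical,
and unsigned char is used in place of plain char so that the sign of the
byte values plays no part.  It copies an int one byte at a time through
character pointers converted from int pointers; that the copy ends up
with the same value shows each byte store really does land inside the
int object.

```c
#include <stddef.h>

/* Hypothetical helper: copies one int into another a byte at a time,
   going through character pointers converted from the int pointers. */
int copy_int_bytewise(int src)
{
    int dst = 0;
    const unsigned char *from = (const unsigned char *)&src;
    unsigned char *to = (unsigned char *)&dst;
    size_t k;

    for (k = 0; k < sizeof(int); k++)
        to[k] = from[k];        /* each store changes part of dst */

    return dst;                 /* same representation, hence same value */
}
```

Which byte to[0] refers to (most or least significant) is exactly the
part that is not specified, so the sketch never depends on it.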

>DA[1]:  int *pi, *pj;  char *pc, *pd;
>        Suppose pi and pj are valid,  and that  pi == pj .
>        Now do  pc = (char*) pi; pd = (char*) pj .
>        Then  pc == pd .
>        [I bet this one generates some heat.  Don't forget to justify
>         your disproof with references to pANS.]

TA.  Pointers to distinct objects (including bytes within other objects)
compare unequal and vice-versa.  The only loophole an implementation
could exploit here would be to randomly select a byte address within the
int object when the conversion to char* occurs, knowing that alignment
constraints applied during the inverse conversion would recover the same
int*.  Even if such a loophole is logically permitted by the specification,
I don't think it poses a serious practical threat, because I see no
legitimate reason for introducing such run-time indeterminacy and
therefore don't expect to see it in practical implementations.  (The GNU
project might do it just to show how "clever" they are; that seems to be
their style, judging by their original treatment of #pragma.  Frankly,
such childish antics merely reinforce the negative opinion many already
have of "pointy-headed Ivy League intellectuals", who play obstructive
semantic games while the rest of us are trying to do productive work.)

>DA[2]:  Just like DA[1], but using type void* instead of char*.

TA.  A void* is really just a byte* (i.e., a char*) subject to additional
programmer-safety compile-time constraints.  The run-time representation
of void* and char* MUST be identical (3.1.2.5), and this implies that
success for one equality comparison implies success for the other.
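
As an illustration of DA[1] and DA[2] together, here is a small sketch
under the assumption of an ordinary implementation that does not exploit
the loophole described above.  The function name is hypothetical.  It
converts two equal int pointers to char* and to void* and reports whether
equality survived both conversions.

```c
/* Returns 1 if equal int pointers are still equal after conversion to
   char* and to void*, 0 otherwise. */
int conversions_preserve_equality(void)
{
    int object = 0;
    int *pi = &object;
    int *pj = &object;          /* pi == pj by construction */

    char *pc = (char *)pi;
    char *pd = (char *)pj;
    void *pv = (void *)pi;
    void *pw = (void *)pj;

    return pi == pj && pc == pd && pv == pw;
}
```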

>DA[3]:  long *pi, *pj;
>        Suppose that pi is valid, and do  pj = (long*)(int*) pi;
>        Then  pi == pj .
>        [comment: there's no rule that says an int can't have a more
>         strict alignment requirement than a long.]

TA.  If the conversion to int* does not violate the alignment constraint,
then the test for equality must succeed.  I don't know of any architectures
where it would be reasonable for the C implementation to impose stricter
alignment constraints on int than on long, so this is in practice a TA.
Artificial implementations could be devised that make this a DA; see my
meta-comments at the beginning of this article.
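
The alignment condition can be made explicit in a sketch.  This one
assumes a C11 compiler, since _Alignof is a later addition and not part
of the draft under discussion; the function name is hypothetical.  It
attempts the round trip only when int is no more strictly aligned than
long, which is the case in which the equality test must succeed.

```c
/* Requires C11 for _Alignof.  Returns 1 if the round trip preserved the
   pointer, 0 if it did not, and -1 if int is more strictly aligned than
   long, in which case the conversion is not attempted. */
int long_via_int_round_trip(void)
{
    long object = 0;
    long *pi = &object;
    long *pj;

    if (_Alignof(int) > _Alignof(long))
        return -1;              /* the case the DA[3] comment worries about */

    pj = (long *)(int *)pi;
    return pi == pj;
}
```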

>DA[4]:  int i, *pi;
>        Suppose pi is a null pointer, and do  i = (int) pi .
>        Then  i == 0 .

FA (False Assumption), even assuming that int is the appropriate
implementation-defined integral type to satisfy 3.3.4.  The most obvious
implementation is to simply copy the pointer-representation bit pattern
unchanged into the integral datum, as indicated by the footnote.
Definitely, a null pointer need not be represented as all zero bits.
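
A program that wants to know how its own implementation represents a
null pointer can inspect the bytes directly.  The following is a sketch
only, with a hypothetical function name; either result is conforming,
which is the point of calling DA[4] a false assumption.

```c
#include <string.h>

/* Returns 1 if this implementation represents a null int* as all-zero
   bits, 0 otherwise.  Either answer is conforming. */
int null_is_all_zero_bits(void)
{
    int *pi = 0;                        /* 0 here is a null pointer constant */
    unsigned char zeros[sizeof pi];

    memset(zeros, 0, sizeof zeros);
    return memcmp(&pi, zeros, sizeof pi) == 0;
}
```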

>DA[5]:  Just like DA[4], but with i of type  unsigned long .

Same comments here, assuming that unsigned-long is the appropriate type.
For integral types that are TOO LONG, it is a semantic violation and MUST
BE DIAGNOSED.  I suppose that it's within both the letter and the spirit
of the Standard for an implementation definition of the "size of integer
required" to be "any size greater than or equal to <n>" and a suitable
statement about the integral representation for all qualifying sizes;
that would avoid the need for such a diagnostic.

(NOTE:  I would regard any conformance test that looked for the too-long
diagnostic as being silly, for the same general reasons that I gave for
considering excessive linguistic analysis of the Standard as being silly.
It was certainly not intended that such specifications get in the way of
either programmers or implementors, and so long as there is a sensible
way around having to take the specs so literally when that has undesired
effects, we should all agree to do so -- in this case, as indicated via
a benign interpretation of what the implementation definition can be.)

>DA[6]:  int *pi, *pj;
>        Suppose pi is valid, and do  pj = (int*)(unsigned long) pi.
>        Then  pi == pj .

FA, assuming again that unsigned-long is the appropriate type.  (By the
preceding discussion, unsigned-long should certainly in practice always
be acceptable for 3.3.4 purposes.)  There can be reasonable
implementations such that no integral type can hold all the information
needed to represent a pointer.  That is why the Standard does not
require that the mapping between pointers and integers be invertible.
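
For contrast, later revisions of the language (C99 onward) added an
optional type, uintptr_t in <stdint.h>, for which a round trip through
void* is required to compare equal.  It is not part of the draft
discussed here, and nothing comparable is promised for unsigned long.
The sketch below assumes the implementation provides uintptr_t; the
function name is hypothetical.

```c
#include <stdint.h>

/* Assumes the optional C99 type uintptr_t exists on this implementation.
   Returns 1 if the pointer survived the trip through an integer. */
int round_trip_through_integer(void)
{
    int object = 0;
    int *pi = &object;
    uintptr_t bits = (uintptr_t)(void *)pi;   /* pointer -> integer */
    int *pj = (int *)(void *)bits;            /* integer -> pointer */

    return pi == pj;
}
```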

>DA[7]:  int i, *pi;
>        Suppose i != 0, and do  pi = (int*) i .
>        Then  pi != (int*)0 .

FA.  (int*)0 is a null pointer of type (int*), whereas pi is the
implementation-defined result of converting the integer value 0 to an
int*.  0 in this source code context may be treated as a special case
by the compiler.

>DA[8]:  int *pi, *pj;
>        Suppose pi is a valid pointer of kind P3, and do
>        pj = (int*)(char*) pi .   Then  pi == pj .
>        [comment: the rule in section 3.3.4 only applies to pointers
>         to objects, which pi might not be.]

[P3 means "one past the end".]  I think 3.3.4 meant for "type" to
distribute over "object or incomplete", as it does explicitly later in
the same sentence.  The intent is to distinguish these from function
pointers.  Even if that interpretation is not upheld by X3J11, it would
be most unlikely that an implementation would cause this example not to
succeed, because it would take more work not to.  Thus, this is also TA.

>DA[9]:  int *pi;
>        Suppose that an external function f() is declared without prototype.
>        It expects a single argument of type void*.   Assume that pi is valid.
>        Then the call  f(pi)  works.
>        [comment:  See my remarks on section 3.1.2.5 below.]

FA.  int* and void* need not have the same representation, and generally
would not for a word-addressed architecture.  f((char*)pi) would work.
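
The same hazard applies wherever no prototype governs an argument, and
the variable part of a variadic argument list is such a place.  The
sketch below substitutes a variadic function for the unprototyped f()
of the example, because old-style declarations are marked obsolescent;
that substitution and both function names are assumptions made for
illustration.  The callee reads each argument as void*, so the caller
converts explicitly instead of passing an int* as is.

```c
#include <stdarg.h>

/* Hypothetical callee: expects `count` arguments, each of type void*. */
int count_non_null(int count, ...)
{
    va_list ap;
    int k, found = 0;

    va_start(ap, count);
    for (k = 0; k < count; k++) {
        void *p = va_arg(ap, void *);   /* reads each argument as void* */
        if (p != 0)
            found++;
    }
    va_end(ap);
    return found;
}

/* Caller: converts explicitly, since no prototype does the conversion
   for the variadic arguments. */
int call_with_explicit_conversion(void)
{
    int object = 0;
    int *pi = &object;
    int *pnull = 0;

    return count_non_null(2, (void *)pi, (void *)pnull);
}
```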

>DA[10]: void *pv;  external void *f();
>        In fact, f returns a value of type int*.
>        Then   pv = f()  works.
>        [comment:  See my remarks on section 3.1.2.5 below.]

FA, for the same reason as preceding.  External interfaces must have
matching input and output data representations, for obvious reasons.

>References and some nit-picking.

[references omitted, since legitimately the whole Standard must be taken
as an integrated specification (which I once imprecisely labeled a
"gestalt"), not as a set of unrelated axioms from which formal deductions
are to be made]

>3.1.2.5.  types and type terminology
>          definitions of "object type" and "incomplete type"
>   nit-pick:  This section has several rules of the form "types X and Y have the
>              same representation and alignment requirements".  Footnote 15
>              tells us that this is intended to imply interchangeability as
>              function arguments, function return values, and members of
>              unions.  However, this does not follow from the rule.
>              Interchangeability of two types as function arguments requires,
>              in addition, equality of argument-passing mechanisms.  This is
>              nowhere prescribed.

I don't know what you mean by this; the footnote is EXPLAINING what we
intended by these terms.  Don't you think that function arguments have
to be somehow represented and aligned?  (Note, by the way, that there are
often different alignment requirements for function arguments than for
other uses of the same data type; e.g., char arguments on the PDP-11.)

>3.3.4.    more on conversion amongst pointer types
>          conversions between integral types and pointers
>   nit-pick:  The case of (obj*)0 should be excluded from these rules
>              as it is specified differently in 3.2.2.3.

3.2.2.3 says that a null pointer constant, which may be expressed as 0
(among other alternatives), converted to a pointer constitutes a null
pointer.  3.3.4 says that an arbitrary integer may be converted to a
pointer.  Thus i=0,pi=(int*)i; does not necessarily result in pi
containing a null pointer representation, and that is intentional.
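
The distinction can be put side by side in a short sketch.  The function
names are hypothetical, and long stands in for whatever integral type
the implementation deems appropriate.  The first function converts a
zero that is only known at run time, so its result is implementation-
defined and deliberately not assumed; the second uses the null pointer
constant and must yield a null pointer.

```c
#include <stddef.h>

/* Returns 1 if converting a run-time zero happens to yield a null
   pointer on this implementation, 0 otherwise.  Both are permitted. */
int runtime_zero_is_null(void)
{
    long i = 0;                 /* an integer variable, not a constant */
    int *pi = (int *)i;         /* implementation-defined conversion */

    return pi == NULL;
}

/* Always 1: in this context 0 is a null pointer constant. */
int constant_zero_is_null(void)
{
    int *pi = (int *)0;

    return pi == NULL;
}
```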

3.3.4 explains that conversions involving pointers (except ..., not
relevant here) shall be specified by an explicit cast, and it spells
out their implementation-defined and undefined aspects.  Note that
the construct (int*)0 is not covered under 3.3.4 since it does not
involve the conversion of an integer to a pointer -- 3.2.2.3 has
already given that construct a different interpretation.  That leaves
lots of constructs for which the Standard assigns no other meaning to
be encompassed by 3.3.4, for example (int*)1.

I really don't see that there is any practical problem in understanding
null pointers expressed like (int*)0 in C source code, once one realizes
that in this context 0 is a null pointer constant, not an integer.  There
has been continual confusion about this in comp.lang.c (INFO-C), but it
has nothing to do with the Standard; rather it inheres in the overloading
of the token 0 in source code to have multiple meanings.  This was not an
issue for the architectures for which C was initially implemented, but
the necessity of treating such expressions specially became more evident
as C spread to unusual architectures.  (The theoretically proper way to
have dealt with this would of course have been for the language to have
provided a reserved symbol such as "nil".  Keep that in mind when you
design the D programming language.)

>3.3.8.    relational operators
>   nit-pick:  The phrase "or both are null pointers" is missing from the
>              sentence in lines 8-10.  See the otherwise identical sentence
>              in section 3.3.9.

No, this omission was deliberate, since it is improper to provide a null
pointer as an operand of a relational operator, which is what 3.3.8 is
all about.  3.3.9 covers the equality operators, for which null pointers
are permissible operands.
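
A brief sketch of the division of labor between the two sections, with
a hypothetical function name: relational operators are applied only to
pointers into (or one past) the same array, while null pointers appear
only as operands of the equality operators.

```c
#include <stddef.h>

/* Returns 1 if every comparison below gives the expected answer. */
int compare_examples(void)
{
    int a[4] = {0, 0, 0, 0};
    int *first = &a[0];
    int *last = &a[3];
    int *end = a + 4;           /* one past the end: may be formed and compared */
    int *none = NULL;
    int ok = 1;

    ok = ok && (first < last);  /* relational: same array */
    ok = ok && (last < end);    /* relational: element vs. one past the end */
    ok = ok && (none == NULL);  /* equality: null operands are fine */
    ok = ok && (first != none);
    /* Deliberately not written: none < first. */
    return ok;
}
```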

In summary:

As I've said in the past and elaborated somewhat upon at the beginning
of this article, one cannot understand what C is by applying formalistic
arguments to the phraseology in the Standard.  I doubt that the Standard
in itself suffices to completely specify what is essential about C to
someone who has never encountered it (or, even more extreme, who knows
nothing about computer programming); THAT IS NOT ITS PURPOSE.  It is
merely intended to serve as a reference "treaty" by which both C
programmers and C implementors agree to be bound, in order to facilitate
the use of C as a practical tool in solving real-world problems, with
particular emphasis on source-level application portability.

Therefore, you should refer to the Standard to see what the terms of the
treaty are, not to determine what is sane or insane.  An unduly warped
implementation does not facilitate the use of C; there is much more
involved in determining the utility of an implementation than merely
literal conformance to the letter of the Standard.  (X3J11 termed these
"quality of implementation" issues.)  An implementor who provides a
perverse implementation would undoubtedly incur the wrath of his
customers, and deservedly so.

bdm659@csc.anu.oz (01/10/90)

This is an article about the semantics of pointers as defined by the
ANSI standard for C (henceforth pANS).  It is not necessarily about any
existing C compiler, nor about C as defined by any source other than pANS.
You probably can't contribute to it significantly unless you have a copy of
pANS. If you think that any of my claims are wrong (perfectly possible) then
the only way to demonstrate that is via precise reference to the text of pANS.

Any quotes from pANS refer to the version of Dec. 7, 1988.  Note that some
relevant wording changed a lot in the last few revisions.

I will call a pointer "valid" if it can be made by a strictly conforming
program.  There appear to be three kinds of valid pointers:
P1: pointers to objects
P2: null pointers
P3: pointers to "just past" an object (especially array objects, but any
    pointer to an object can be regarded as a pointer to an array of size
    one, then incremented once).

The following is a list of Doubtful Assumptions (DAs).  The definition of
a DA is "an assumption that a C programmer might be tempted to make, but
which I cannot prove to be justified according to pANS".  Determining
whether something follows from pANS is not always a simple matter, so some
of my DAs might turn out to be not doubtful at all.  In fact, I hope I'm
wrong about some of them.  I'd welcome proofs in either case.

DA[0]:  int *pi; char *pc;
        Suppose pi is valid, and do  pc = (char*) pi.  Then *pc overlaps *pi
        in the sense that changing the value of *pc changes the value of *pi.

DA[1]:  int *pi, *pj;  char *pc, *pd;
        Suppose pi and pj are valid,  and that  pi == pj .
        Now do  pc = (char*) pi; pd = (char*) pj .
        Then  pc == pd .
        [I bet this one generates some heat.  Don't forget to justify
         your disproof with references to pANS.]

DA[2]:  Just like DA[1], but using type void* instead of char*.

DA[3]:  long *pi, *pj;
        Suppose that pi is valid, and do  pj = (long*)(int*) pi;
        Then  pi == pj .
        [comment: there's no rule that says an int can't have a more
         strict alignment requirement than a long.]

DA[4]:  int i, *pi;
        Suppose pi is a null pointer, and do  i = (int) pi .
        Then  i == 0 .

DA[5]:  Just like DA[4], but with i of type  unsigned long .

DA[6]:  int *pi, *pj;
        Suppose pi is valid, and do  pj = (int*)(unsigned long) pi.
        Then  pi == pj .

DA[7]:  int i, *pi;
        Suppose i != 0, and do  pi = (int*) i .
        Then  pi != (int*)0 .

DA[8]:  int *pi, *pj;
        Suppose pi is a valid pointer of kind P3, and do
        pj = (int*)(char*) pi .   Then  pi == pj .
        [comment: the rule in section 3.3.4 only applies to pointers
         to objects, which pi might not be.]

DA[9]:  int *pi;
        Suppose that an external function f() is declared without prototype.
        It expects a single argument of type void*.   Assume that pi is valid.
        Then the call  f(pi)  works.
        [comment:  See my remarks on section 3.1.2.5 below.]

DA[10]: void *pv;  external void *f();
        In fact, f returns a value of type int*.
        Then   pv = f()  works.
        [comment:  See my remarks on section 3.1.2.5 below.]

References and some nit-picking.

1.6.      definitions of "object" and "alignment"
3.1.2.5.  types and type terminology
          definitions of "object type" and "incomplete type"
   nit-pick:  This section has several rules of the form "types X and Y have the
              same representation and alignment requirements".  Footnote 15
              tells us that this is intended to imply interchangeability as
              function arguments, function return values, and members of
              unions.  However, this does not follow from the rule.
              Interchangeability of two types as function arguments requires,
              in addition, equality of argument-passing mechanisms.  This is
              nowhere prescribed.
3.2.2.3.  conversions amongst pointer types, null pointers
3.3.2.1.  array subscripting
3.3.3.2.  * and & operators
3.3.4.    more on conversion amongst pointer types
          conversions between integral types and pointers
   nit-pick:  The case of (obj*)0 should be excluded from these rules
              as it is specified differently in 3.2.2.3.
3.3.6.    pointer + integer, pointers just past an object
3.3.8.    relational operators
   nit-pick:  The phrase "or both are null pointers" is missing from the
              sentence in lines 8-10.  See the otherwise identical sentence
              in section 3.3.9.
3.3.9.    equality operators
3.3.16.1. simple assignment
3.5.2.1.  conversions between pointers to union members

===========================================================
Brendan McKay.  bdm@anucsd.oz.au  or  bdm659@csc1.anu.oz.au
terrorist: n. an individual who behaves like a government  (original)

bill@twwells.com (T. William Wells) (01/11/90)

Don't ask me why, but 3.2.2.3 does say integral constant
*expression*.

That permits 1 - 1 to be a null pointer constant. And
(char *)(1 - 1) to be a null pointer.

Weird.

Also, I imagine, it won't help dispel the confusion that seems to
surround null pointers.
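
A minimal sketch of the point, with a hypothetical function name: since
1 - 1 is an integral constant expression with value 0, it serves as a
null pointer constant, and the cast yields a null pointer.

```c
#include <stddef.h>

/* Returns 1: (1 - 1) is an integral constant expression with value 0,
   hence a null pointer constant. */
int one_minus_one_is_null(void)
{
    char *p = (char *)(1 - 1);

    return p == NULL;
}
```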

---
Bill                    { uunet | novavax | ankh } !twwells!bill
bill@twwells.com

bdm659@csc.anu.oz (01/13/90)

In article <11922@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> In article <1250.25ab3338@csc.anu.oz> bdm659@csc.anu.oz writes:
> >The following is a list of Doubtful Assumptions (DAs).  ...
> >I'd welcome proofs in either case.
>
> Well, I'll try to respond, but with explanations, not rationalistic
> "proofs".  I really have to take issue with people who insist on
> dissociating the purpose of the C Standard from reality, instead
> arguing excessively over formalism.

The question of whether or not a particular coding practice is strictly
conforming is of great importance to anyone seriously interested in
portable programming.

FA = false assumption
DA = doubtful assumption
RA = reasonable assumption
TA = true assumption

>                                      We expressed the Standard in
> technical English rather than a formal notation primarily in order
> to aid programmers (and to a lesser degree, implementors) to relate
> it to their daily activity.  It is not intended to form a system
> suitable for treating with formal symbolic logic and therefore
> should not be taken as such.

Don't put words into my mouth.

>                               Thus, a truly perverse implementation
> might actually comply with the letter of the Standard while
> exploiting unintended loopholes to produce a travesty quite at
> variance with the spirit of C.

See my comment about "perverse" at the end.

>                                 (We tried to document in the
> Rationale most of the intentional loopholes.)

I checked the Rationale concerning all my DAs.  What's a "loophole", anyway?
You seem to have taken the line that any DA which *you* feel should be a
TA represents such an unintentional loophole.  Justification?

> Another meta-comment here is that the DA examples indicate too
> much concern with representational aspects of entities within a
> C program and too little concern with dealing with data at the
> appropriate level of abstraction.  In the vast majority of
> applications, these questions should not even arise.

So asking questions about strict conformity is sinful?

> The answers I give will assume that implementations do not go out of
> their way to introduce unnecessary complications.  (Necessary ones,
> caused by architectural or environmental considerations, are okay;
> we deliberately allowed slack in the specifications to cover those.)

One motive for my posting was to explore the boundary between "unintentional
loopholes", as you call them, and deliberate non-specification. This is a
perfectly reasonable object of study, and one which is appropriate to this
newsgroup.  Your criticisms of everyone who attempts it are not helpful.

> >DA[0]:  int *pi; char *pc;
> >        Suppose pi is valid, and do  pc = (char*) pi.  Then *pc overlaps *pi
> >        in the sense that changing the value of *pc changes the value of *pi.
>
> TA (True Assumption).  The addresses of the bytes within a single object
> constitute a nice linear address space.  (However, there need not be one
> global linear address space within which all objects are located.)
>
> It is not specified which PART of *pi is accessed by *pc, but some part
> must be.  Big-endian and little-endian architectures will differ here.

Forgive me if my memory is wrong, but I seem to remember a posting of yours
in which you agreed that the members of unions might not physically overlap
in some implementations.  If you consider that pi might point to such a member,
there are difficulties in reconciling that posting with this one.

> >DA[1]:  int *pi, *pj;  char *pc, *pd;
> >        Suppose pi and pj are valid,  and that  pi == pj .
> >        Now do  pc = (char*) pi; pd = (char*) pj .
> >        Then  pc == pd .
>
> TA.  Pointers to distinct objects (including bytes within other objects)
> compare unequal and vice-versa.  The only loophole an implementation
> could exploit here would be to randomly select a byte address within the
> int object when the conversion to char* occurs, knowing that alignment
> constraints applied during the inverse conversion would recover the same
> int*.  Even if such a loophole is logically permitted by the specification,
> I don't think it poses a serious practical threat, because I see no
> legitimate reason for introducing such run-time indeterminacy and
> therefore don't expect to see it in practical implementations.  ...

Yes, an argument on the basis of determinacy might be reasonable here.
It's a pity such meta-arguments are needed, though.  A fundamental difficulty
in analysing these problems is that pANS doesn't ever define the semantics
of pointer conversion.  We are only given some functional axioms, from which
some desirable properties, like this one, don't obviously follow.

> >DA[2]:  Just like DA[1], but using type void* instead of char*.
>
> TA.  A void* is really just a byte* (i.e., a char*) subject to additional
> programmer-safety compile-time constraints.  The run-time representation
> of void* and char* MUST be identical (3.1.2.5), and this implies that
> success for one equality comparison implies success for the other.

Why does equal representation imply equal semantics?
An argument based on the existence of functions like memcpy() might show
DA[2] is an RA, though.

> >DA[3]:  long *pi, *pj;
> >        Suppose that pi is valid, and do  pj = (long*)(int*) pi;
> >        Then  pi == pj .
> >        [comment: there's no rule that says an int can't have a more
> >         strict alignment requirement than a long.]
>
> TA.  If the conversion to int* does not violate the alignment constraint,
> then the test for equality must succeed.  I don't know of any architectures
> where it would be reasonable for the C implementation to impose stricter
> alignment constraints on int than on long, so this is in practice a TA.

An implementation in which short arithmetic is in hardware and long arithmetic
is in software might reasonably have this property.  I think this is a FA.

> >DA[4]-DA[6]:

Your analyses agree with mine on these.  All are FAs.

> >DA[7]:  int i, *pi;
> >        Suppose i != 0, and do  pi = (int*) i .
> >        Then  pi != (int*)0 .
>
> FA.  (int*)0 is a null pointer of type (int*), whereas pi is the
> implementation-defined result of converting the integer value 0 to an
> int*.  0 in this source code context may be treated as a special case
> by the compiler.

Actually i is nonzero, though your argument still holds.  However, the case
with i==0 is a more instructive DA (indeed FA as you say).

> >DA[8]:  int *pi, *pj;
> >        Suppose pi is a valid pointer of kind P3, and do
> >        pj = (int*)(char*) pi .   Then  pi == pj .
> >        [comment: the rule in section 3.3.4 only applies to pointers
> >         to objects, which pi might not be.]
>
> [P3 means "one past the end".]  I think 3.3.4 meant for "type" to
> distribute over "object or incomplete", as it does explicitly later in
> the same sentence.  The intent is to distinguish these from function
> pointers.  Even if that interpretation is not upheld by X3J11, it would
> be most unlikely that an implementation would cause this example not to
> succeed, because it would take more work not to.  Thus, this is also TA.

We seem to be looking at different sentences.  I meant the sentence "It is
guaranteed ... original pointer." starting on the last line of page 46.
The word "type" doesn't appear in it at all.  Anyway, alignment is defined
as a requirement on objects; applying it to the values of pointers which may
not be pointers to objects seems doubtful.  As far as implementations are
concerned, consider one which treats pointers of kind P3 as a special case
in representation (perhaps to permit objects reaching to the end of memory
segments).  It may be that making (int*)(char*) a no-op in this case could
take more work, not less.  I think this is a DA, probably a FA, though this
could well be unintentional.

> >DA[9]-DA[10]:

I agree these are FAs.

> >References and some nit-picking.
>
> >3.1.2.5.  types and type terminology
> >          definitions of "object type" and "incomplete type"
> >   nit-pick:  This section has several rules of the form "types X and Y have the
> >              same representation and alignment requirements".  Footnote 15
> >              tells us that this is intended to imply interchangeability as
> >              function arguments, function return values, and members of
> >              unions.  However, this does not follow from the rule.
> >              Interchangeability of two types as function arguments requires,
> >              in addition, equality of argument-passing mechanisms.  This is
> >              nowhere prescribed.
>
> I don't know what you mean by this; the footnote is EXPLAINING what we
> intended by these terms.  Don't you think that function arguments have
> to be somehow represented and aligned?

You missed my point.  Suppose the body of the text said "Gismos are pink."
and the footnote said "This is meant to imply that gismos are pink and
crinkly.".  What can we infer about gismos?  Well, the only reasonable
inference is that gismos are both pink and crinkly, even though the footnote
is not supposed to be part of the standard.  However, we could also reasonably
grumble about the extra information not appearing in the proper place.
If you doubt the analogy, consider the mythical XYZ compiler: values of type
void* and char* are both represented as 32-bit unsigned addresses, and have no
alignment requirements.  However, since void* was added later by a different
programmer, arguments of type void* are passed in registers whereas those of
type char* are passed on the stack.  Implementations quite often use
different argument passing mechanisms for different types, so I don't think
this is particularly perverse.  It shows that representation+alignment
equality does not imply argument interchangeability, i.e., the footnote adds
an entirely new restriction.  That is all I was nit-picking about.

> >3.3.4.    more on conversion amongst pointer types
> >          conversions between integral types and pointers
> >   nit-pick:  The case of (obj*)0 should be excluded from these rules
> >              as it is specified differently in 3.2.2.3.

I agree with your response to this.

> >3.3.8.    relational operators
> >   nit-pick:  The phrase "or both are null pointers" is missing from the
> >              sentence in lines 8-10.  See the otherwise identical sentence
> >              in section 3.3.9.
>
> No, this omission was deliberate, since it is improper to provide a null
> pointer as an operand of a relational operator, which is what 3.3.8 is
> all about.  3.3.9 covers the equality operators, for which null pointers
> are permissible operands.

I will concede defeat on this too, but with some reluctance.  The sentence in
question is not describing the behaviour of a relational operator.  Unless,
perhaps, you interpret "equality" as the conjunction of "<=" and ">=".

> In summary:
>
> As I've said in the past and elaborated somewhat upon at the beginning
> of this article, one cannot understand what C is by applying formalistic
> arguments to the phraseology in the Standard.  I doubt that the Standard
> in itself suffices to completely specify what is essential about C to
> someone who has never encountered it (or, even more extreme, who knows
> nothing about computer programming); THAT IS NOT ITS PURPOSE.  It is

From the Foreword: "[pANS] addresses the problems of both the
program developer and the translator implementor by specifying the
C language precisely."   Are you saying it fails to meet this claim?

> merely intended to serve as a reference "treaty" by which both C
> programmers and C implementors agree to be bound, in order to facilitate
> the use of C as a practical tool in solving real-world problems, with
> particular emphasis on source-level application portability.
>
> Therefore, you should refer to the Standard to see what the terms of the
> treaty are, not to determine what is sane or insane.

I wish to test things for strict conformity by applying the definition that
the standard gives.  If it isn't "specified in this Standard", it isn't
strictly conforming.

>                                                       An unduly warped
> implementation does not facilitate the use of C; there is much more
> involved in determining the utility of an implementation than merely
> literal conformance to the letter of the Standard.  (X3J11 termed these
> "quality of implementation" issues.)  An implementor who provides a
> perverse implementation would undoubtedly incur the wrath of his
> customers, and deservedly so.

(1) I was addressing conformance, not utility.
(2) One person's "perverse" is another person's "reasonable".
(Your example of GNU's #pragma will do.)  A study of what things are strictly
conforming *because the standard actually says that they are* is a worthwhile
pursuit because it establishes a solid foundation on which to base further
discussion.  Your exhortations against that study are unjustified.

Brendan McKay
bdm@anucsd.oz  or  bdm659@csc1.anu.oz

gwyn@smoke.BRL.MIL (Doug Gwyn) (01/16/90)

In article <1259.25ae2019@csc.anu.oz> bdm659@csc.anu.oz writes:
>Don't put words into my mouth.

I didn't necessarily mean to imply that this was YOUR motivation,
but many similar discussions in the past have been based on such
a formalist/rationalist point of view (to be charitable to them).

>Forgive me if my memory is wrong, but I seem to remember a posting of yours
>in which you agreed that the members of unions might not physically overlap
>in some implementations.  If you consider that pi might point to such a member,
>there are difficulties in reconciling that posting with this one.

Union members "overlap".  In order to implement some forms of union
on some unusual architectures, it may be necessary to use storage areas
for different members that don't actually overlap.  Nevertheless, it
still conceptually is an overlap, and any program that assumes that
one member value is unaffected upon storing into another member is not
strictly conforming.

This consideration doesn't affect the specific example of an int being
accessed via a converted char pointer.

An implementation may have some bits in its representation of an int
that are "unused", and therefore modifying a part of the int via a
char pointer might happen to not affect the represented value, depending
on just where into the int representation the char* points.  Assuming
either that the value will be changed or that it won't be changed by such
an operation also makes a program not strictly conforming.
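
As a rough illustration of the last point, here is a minimal sketch; the
helper name poke_byte and its return convention are hypothetical. It
overwrites one byte of an int through a character pointer and reports
whether the int's value appears to have changed. On common machines with
no unused bits the answer is predictable, but a strictly conforming
program should not rely on either outcome, and on an unusual
implementation the altered representation might not even be a valid int.

```c
#include <stddef.h>

/* Hypothetical helper: overwrite byte `index` of the int at `pi` through
 * a character pointer.
 * Returns  1 if the int compares different afterwards,
 *          0 if it compares the same,
 *         -1 if `index` lies outside the object (nothing is written).
 * Assumption: every byte pattern is a valid int on the target; the
 * Standard does not promise that. */
int poke_byte(int *pi, size_t index, unsigned char byte)
{
    int before = *pi;
    unsigned char *pc = (unsigned char *)pi;  /* bytes of one object are linear */

    if (index >= sizeof *pi)
        return -1;

    pc[index] = byte;        /* store through the converted pointer */
    return *pi != before;    /* did the represented value change? */
}
```

A caller that wants to stay strictly conforming can use such a probe for
diagnostics but must not build program logic on the result.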

>...  We are only given some functional axioms, from which
>some desirable properties, like this one, don't obviously follow.

Well, you know what I said about excessive rationalism.
This was one example where an implementation could technically strictly
conform without having the desirable property.  As I said, I don't
expect this to be a practical problem.

The category "RA" would fit this and some others that I labeled "TA"
when I meant "expected to be TA in practice, although not strictly
logically deducible from the Standard's specifications".

>Why does equal representation imply equal semantics?

It doesn't, but if you think about how the implementation must use
the representations in testing for pointer equality, it should be
apparent that when two char*s compare equal, the corresponding
void* conversions should compare equal.  Again, a truly perverse
implementation could possibly go out of its way to cause this to
fail, but since it is harder to do that than to do it the obvious
way, I also don't expect this to be a problem in practice.  "RA"
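
A small sketch of the property under discussion; the function name
void_equality_agrees is hypothetical. It checks that two char pointers
and their void* conversions agree about equality, which is what one
would expect of any implementation that does not go out of its way to
be perverse.

```c
/* Hypothetical check: returns 1 when comparing the char pointers and
 * comparing their void* conversions give the same answer, 0 otherwise. */
int void_equality_agrees(char *a, char *b)
{
    void *va = a;   /* implicit conversion char* -> void* */
    void *vb = b;

    return (a == b) == (va == vb);
}
```

Calling it with two pointers into the same array, equal or not, should
return 1 on any reasonable implementation.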

>An argument based on the existence of functions like memcpy() might show
>DA[2] is an RA, though.

Certainly memcpy() indicates what X3J11 thought void*s were supposed
to be useful for.
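
For instance, a generic routine can take void* arguments and move bytes
with memcpy() without knowing the object type. The sketch below is only
an illustration; swap_objects and its 64-byte scratch buffer are
hypothetical choices, and the two objects are assumed not to overlap.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic swap: exchange `size` bytes between two
 * non-overlapping objects of any type, working in chunks so that no
 * dynamic allocation is needed. */
void swap_objects(void *a, void *b, size_t size)
{
    unsigned char tmp[64];          /* scratch buffer; size is arbitrary */
    unsigned char *pa = a;          /* void* converts to any object pointer */
    unsigned char *pb = b;

    while (size > 0) {
        size_t n = size < sizeof tmp ? size : sizeof tmp;

        memcpy(tmp, pa, n);
        memcpy(pa, pb, n);
        memcpy(pb, tmp, n);
        pa += n;
        pb += n;
        size -= n;
    }
}
```

Typical use would be swap_objects(&x, &y, sizeof x) for two objects of
the same type.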

>Actually i is nonzero, though your argument still holds.  However, the case
>with i==0 is a more instructive DA (indeed FA as you say).

Oops.

>Anyway, alignment is defined as a requirement on objects; applying it
>to the values of pointers which may not be pointers to objects seems
>doubtful.  As far as implementations are concerned, consider one which
>treats pointers of kind P3 as a special case in representation (perhaps
>to permit objects reaching to the end of memory segments).  It may be
>that making (int*)(char*) a no-op in this case could take more work,
>not less.

It would require a general slowdown in pointer operations to represent
the P3 case in a manner not uniform with other object pointers.  In any
case, I would expect such an implementation to apply the same sort of
local flat address space linearization that it must do for pointer
arithmetic to the conversion of pointers.  Since the size of an array
includes padding, the "last+1" element of an array meets the same
alignment constraints as the other array elements.  "RA"
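
The following sketch illustrates the expectation; the function name
past_end_round_trips is hypothetical. It forms the "last+1" pointer of
an int array two ways, once by int arithmetic and once by byte
arithmetic, and checks that converting between the two agrees. Because
this is an "RA" rather than something strictly deducible, treat the
result as a probe of the implementation, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical probe: returns 1 if the one-past-the-end pointer of
 * `array` (with `count` elements) survives conversion through char*
 * and back, 0 otherwise.  The pointers are compared, never dereferenced. */
int past_end_round_trips(int *array, size_t count)
{
    int  *end  = array + count;                          /* "last+1" as int*  */
    char *cend = (char *)array + count * sizeof *array;  /* "last+1" as char* */

    return (char *)end == cend && (int *)cend == end;
}
```

For int a[5], one would expect past_end_round_trips(a, 5) to return 1.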

>You missed my point.  Suppose the body of the text said "Gismos are pink."
>and the footnote said "This is meant to imply that gismos are pink and
>crinkly.".  What can we infer about gismos?  Well, the only reasonable
>inference is that gismos are both pink and crinkly, even though the footnote
>is not supposed to be part of the standard.

To me, this supports what I said about the problems introduced when
one resorts to formalistic reasoning.  You CANNOT understand a concept
by merely substituting its definition for all linguistic uses of the
concept.  A concept in general subsumes more than the definition would
imply; there are other relevant properties.

When the Standard says that two types have the same representation and
alignment requirements, it means that IN ALL CONTEXTS they have the
same r&a requirements.  That does imply what footnote 15 says; the
footnote was added as a result of public review comments because we
kept getting asked "Does this also mean when used as function arguments?".

Note that there may be different contextual r&a requirements for the
same type used as:
	static
	automatic
	register
	volatile
	string literal
	function argument
	function parameter
	function return value
and in still other contexts.  A context-free constraint on r&a
requirements must be applied appropriately in all contexts.

>from the Foreword: "[pANs] addresses the problems of both the
>program developer and the translator implementor by specifying the
>C language precisely."   Are you saying it fails to meet this claim?

Precision is a matter of degree.  The ANSI Standard is considerably
more precise than K&R Appendix A.  It need only be sufficiently
precise to fulfil its role as reference treaty between programmer
and implementor.

>I wish to test things for strict conformity by applying the definition that
>the standard gives.  If it isn't "specified in this Standard", it isn't
>strictly conforming.

If the result of conformance testing might end up in litigation, the
tests should not flag behavior as a violation unless it can be related
to explicit wording in the Standard, even if the behavior is insane.
These "RA" cases deserve to be uncovered, however, since
many applications are and will be assuming reasonable implementations
and the programmer should hear about it if an implementation goes out
of its way to introduce unwarranted obstacles.

>(2) One person's "perverse" is another person's "reasonable".

My notion of a perverse implementation is one that deliberately
exploits an ambiguity in the specification, with considerably more
effort than would be required to implement "reasonable" behavior
such as would be expected by experienced C programmers, to force
portable programs to use complex code to avoid triggering the
perverse behavior.  If someone else thinks such an implementation
should be deemed reasonable, he's disconnected from the real world
of computer programming.

This does not include reasonable implementation choices that are
forced by the architecture or environment, just those for which
there is no good reason.

As an example of how an implementation can be perverse, suppose that
a POSIX implementation really literally obeyed the last word of the
specification in IEEE Std 1003.1-1988 section 8.2.3.4.  While that
would be literal conformance to the specification, and while a
reasonable implementation would have to violate that part of the
specification, it would nevertheless be perverse to obey that spec,
which is obviously an error due to incomplete editing of text copied
from section 8.2.3.2.

ps@fps.com (Patricia Shanahan) (01/17/90)

In article <11954@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1259.25ae2019@csc.anu.oz> bdm659@csc.anu.oz writes:
>My notion of a perverse implementation is one that deliberately
>exploits an ambiguity in the specification, with considerably more
>effort than would be required to implement "reasonable" behavior
>such as would be expected by experienced C programmers, to force
>portable programs to use complex code to avoid triggering the
>perverse behavior.  If someone else thinks such an implementation
>should be deemed reasonable, he's disconnected from the real world
>of computer programming.
>
>This does not include reasonable implementation choices that are
>forced by the architecture or environment, just those for which
>there is no good reason.
>

The first significant C programming job I ever did was a re-targeting of the
portable C compiler to an unusual architecture. Although I was an experienced
programmer, had read K&R (which was the only standard at the time) and had
written a couple of small C programs, I was not an "experienced C programmer".

I really needed a clear standard that did not depend on the concept of 
"reasonable" behavior. If you are going to depend on unwritten background data
to interpret the standard, you don't really need the standard. A good 
standard should permit a skilled programmer to write a correct implementation
without having access to an unwritten code of "reasonable" behavior.

I solved the problem by treating the actual behavior of pcc as a de facto 
standard. Unfortunately, this gave me some problems in the area of mixed type
assignment-ops where the actual behavior of pcc and the statements in K&R
conflicted.

Although I had no intention of being perverse, there were situations in which
K&R appeared to permit behavior that in practice was dangerous. For example,
the natural way of representing a pointer to function was with a half-word
(rather than byte) address. All the jump commands that would be used in calls
required half-word addresses. I set up a perfectly consistent system, in which
conversion between a pointer to function and (for example) an integer involved
one-bit shifts. It suited the architecture well and conformed to every rule
about pointer types and conversions that I could find in K&R. Then I found
out some of the details of the signal interface. 

This is really a plea for making sure that the standard actually says 
EVERYTHING that "experienced C programmers" are going to be justified in
expecting of a C implementation, and does not depend on the implementor already
knowing what is expected. I do not think you can assume someone is disconnected
from the real world of computer programming just because they cannot read
the collective minds of all experienced C programmers to find out what is
really required in a C implementation, especially since I have seen it amply
demonstrated that experienced C programmers often disagree about what is 
reasonable.
--
	Patricia Shanahan
	ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
	phone: (619) 271-9940

bdm659@csc.anu.oz (01/17/90)

In article <6203@celit.fps.com>, ps@fps.com (Patricia Shanahan) writes:
> [...]
> I really needed a clear standard that did not depend on the concept of
> "reasonable" behavior. If you are going to depend on unwritten background data
> to interpret the standard, you don't really need the standard. A good
> standard should permit a skilled programmer to write a correct implementation
> without having access to an unwritten code of "reasonable" behavior.
>
> [...]
> This is really a plea for making sure that the standard actually says
> EVERYTHING that "experienced C programmers" are going to be justified in
> expecting of a C implementation, and does not depend on the implementor already
> knowing what is expected. I do not think you can assume someone is disconnected
> from the real world of computer programming just because they cannot read
> the collective minds of all experienced C programmers to find out what is
> really required in a C implementation, especially since I have seen it amply
> demonstrated that experienced C programmers often disagree about what is
> reasonable.

I would also note that I wish to be able to write C programs which will be
portable well into the future (at least until the next C standard, or until
we blow up, irradiate, poison, asphyxiate, freeze, or cook ourselves
--- whichever comes first).  Being able to read the minds of all existing C
programmers, and being an expert on all existing hardware, is insufficient to
be sure of what the boundary between "reasonable assumption" and "doubtful
assumption" will be in, say, 1997.  That is one reason why it is worth knowing
what the True Assumptions are, and programming within them.  Fortunately,
this isn't all that difficult in the great majority of cases.

Brendan McKay.   bdm@anucsd.oz.au  or  bdm659@csc1.anu.oz.au