[comp.std.c] __STDC__, _POSIX_SOURCE, etc.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/21/89)

In article <12040006@hpfcdc.HP.COM> rml@hpfcdc.HP.COM (Bob Lenk) writes:
>At this point 1003.1 seems to be in a "Catch 22" situation.  Folks want
>something specified by the compiler rather than the application, but
>then they claim that a compiler that usurps the namespace must not
>define __STDC__.

Yes, it's a difficult situation, but technical solutions are possible.

>I don't recall any recommendation from X3J11.  What was it?

I suspect Donn Terry has the original letter.  I know he and I
discussed this issue, and it ended up with the introduction of
the stuff about "feature test macros" into IEEE Std 1003.1.

I'm appending the text of the letter to this (long) article.

>Where do you read this?  I see in section 2.8.2.1:

I think you're right about 1003.1 Section 2.8.2.1, but the definition
of _POSIX_SOURCE in 2.8.2 makes it appear that "the symbols defined by
this standard" will NOT be provided by the environment unless the
_POSIX_SOURCE feature-test macro is "present" in the program.  Further
it is defined to prohibit extensions in a header for which "no explicit
constraint on the form of the name is provided by this standard", which
I take to mean that, for example, <sys/times.h>'s declaration of
"struct tms" shall not contain any members other than those enumerated
in Section 4.5.2.2.  I think actually it was intended to permit others
of the form tms_* even when _POSIX_SOURCE is in effect, and not names
of other forms, but I don't see any such "explicit constraint" and so
even additional tms_* names seem to be outlawed under _POSIX_SOURCE --
which is pretty grim when you think about the consequences for the
implementation.

>In other words, _POSIX_SOURCE (a feature test macro) does turn on
>definition of symbols not defined by the C Standard (ANSI C-prohibited
>extensions).

I'll be happy if that is one generally accepted consequence.
Is this really intended to take precedence over the statement in the
definition of _POSIX_SOURCE that "the symbols defined in this standard
will be provided by the environment"?

>If we accept that (and the analogous relations with other, future
>standards) as impossible, what happens when 1003.2 adds popen() to
><stdio.h> and 1003.4 adds symbols to 1003.1 headers like <fcntl.h>
>and <sys/stat.h>?

I'm sure that the idea of "feature test macros" to enable visibility
of such extensions is a good idea.  Of course even better is isolating
the extensions by giving them their own headers.  The important
portable-programming requirement is that the PROGRAM must take
explicit action for such additional identifiers to become usurped for
use by the extension(s).  This conflicts with the natural goal of
vendors who wish to have existing customer applications compile
unchanged in the new, standard-conforming environment.  Personally,
as a customer I am happy to make these changes once and for all and
never have to tweak my application for a new environment again!  But
I think there is a perception that existing customers would not all
be quite so happy at the extra work.

>Practically I think that ANSI C should explicitly permit the namespace
>to be used in any way explicitly or implicitly specified by the user, as
>long as there is a clearly documented way to get a completely clean
>namespace.

The suggestion was made before, more than once, but nobody was able to
figure out a good way to specify such a notion in proper standards
terms.  The best we were able to do was to specify what constitutes
conformance to the Standard.

>If an application is compiled with the straight ANSI compiler,
>it can test __STDC__ for whatever it is interested in.  It can define
>_POSIX_SOURCE itself if it wants POSIX symbols.

That's fine with me, if indeed the POSIX vendors are really going to
hide fdopen etc. in <stdio.h> unless I #define _POSIX_SOURCE before
including it in my application.  My impression was that they wanted
fdopen etc. to be visible BY DEFAULT in their "POSIX" implementation,
which clearly causes potential problems when porting strict-ANSI C
applications into such a "POSIX" environment.

>If it wants to know whether POSIX is supported, it can then test
>_POSIX_VERSION from <unistd.h> (although the include of <unistd.h>
>may fail on a non-POSIX system, I doubt this would be an issue to any
>application that cared about this question).

I can't imagine any useful way of doing this particular test.  There
probably should have been a POSIX equivalent of __STDC__ for this...

>1003.1 could have addressed this by requiring one of two symbols like
>_POSIX_STANDARD_C or _POSIX_COMMON_C to be defined, but that really
>seems to be redundant with the purpose of __STDC__, and in the scope of
>a language standard rather than an OS standard.

One of the practical problems is that in the process of standardizing
the C language interface to UNIX-like systems, the goal was (fairly
late in the game) changed somewhat, due to demands from other language
groups to separate the spec into an OS facilities spec and individual
language bindings.  Unfortunately the 1003.1 as published does not do
this; there is not a really good abstract OS spec.  (This is not in
itself a criticism; it's hard to produce such a spec.  But it's even
harder when that wasn't your original goal.)  I think the argument
you just gave was, or would have been, made to the suggestion of those
particular symbols being specified.  Then, also, when I was actively
attending 1003.1 meetings in earlier days, there was no intention to
promulgate separate "common usage" and "standard" C specifications; it
was going to be built on top of the ANSI C spec.  A final contributing
factor to the (in my opinion) inadequate collaboration between the C
and POSIX committees was that our liaison agent missed several meetings,
and after I fell into the role of his replacement, I was unable to
attend the final few 1003.1 meetings during which the current wording
on this matter was drafted/approved/whatever.  There is a limit to
how much can be done by phone.

So I think historically there were several reasons for the situation.
I do hope that other standardization efforts learn from this at least
the necessity of (a) clearly defining committee goals then sticking to
them, (b) carefully packaging their features so they can be provided
without interfering with other related standards, and (c) not meeting
at the same time as other groups with which liaison is being attempted!

>I think it's unfortunate that X3J11 didn't consider that the C library
>is (at least in some valid cases) a foundation upon which other
>environments build rather than a complete environment, and thus didn't
>allow for graceful additions to the namespace consistent with existing
>practice.

The real problem with that is, existing practice in this area was NOT
"graceful"; it was DISgraceful.  The proposed C Standard certainly does
allow for clean extensions, provided they're done in some way other
than mucking around with the standardized facilities.  (For example,
use separate headers.)  I'm sure that P1003 would have (and has) been
amenable to cooperation with this, except for the thorny problem of
existing implementation practice with regard to the standard headers
like <stdio.h> -- which simply could NOT be used safely by portable
programs at all due to the surprises they often contained.  It might
have been better for X3J11 to throw out the existing header/library
practice and define their own new set, but that wasn't in their charter
(it WAS a temptation).  By standardizing what is supposed to be in the
standard headers that everyone was already trying to use, necessarily
what ISN'T in them has to be specified in addition to what IS in them.
Based on my experience in porting C code, I think essentially the right
decision was made on this.  But now we have a difficult transition to
make somehow.

>If they had addressed the problem sufficiently and explained
>the solution clearly, there wouldn't be all this debate about how
>1003.1, Microsoft, AT&T, et al. are dealing with the situation.

Several attempts were made to explain the significance of the name
space guarantees.  I'll take a share of the blame for failure to
communicate (but I don't claim ALL the blame!).


		========== X3J11 letter to P1003: ==========


From:  X3J11 Technical Committee

To:  Technical Committee on Operating Systems
     IEEE Computer Society

Date:  14 December, 1987

Subject:  Conflicts within P1003.1/Draft 12

Despite statements in the proposed POSIX standard P1003.1/Draft 12
in sections 2.2.3.1 and A.2.1 that strongly suggest that a single
implementation could be simultaneously IEEE 1003.1 and ANSI
X3.159-198x C standard-conforming, which is certainly highly
desirable, there remain several specifications in P1003.1/Draft 12
that are incompatible with this goal.  Originally identified as a
conflict involving just the header file <limits.h> shared by both
standards, upon further investigation the problem has turned out
to be more pervasive.

To understand this issue and its importance, first the role of
"name space" must be appreciated.  Because the primary purpose of
the proposed ANSI C Standard is to promote portability of programs
among a wide variety of environments, it guarantees certain
facilities that all conforming implementations will provide for
use by application programs.  One of the most important guarantees
is that only those names specifically reserved for implementations
(whether external or defined in standard headers) by the Standard
are unsafe for use by programs; all other names are in the "name
space" guaranteed to be available for writing portable programs.
Because it encompasses a wide range of operating systems, it is
inappropriate for the ANSI C Standard to attempt to reserve
operating system-specific names for implementations, except as
general classes of names (for example, names starting with the
letters "SIG", or names starting with an underscore).

In general, C libraries are free to include system-specific
external names (for example, "open"), so long as the ANSI C
implementation does not use them.  (Otherwise, a portable
program could inadvertently supply a function of that name,
thereby causing the ANSI C library to malfunction.)  The most
dangerous use of system-specific names is in standard header
files, which could modify the behavior of a standard-conforming
program in undesirable ways.  The headers <limits.h>, <signal.h>,
and <stdio.h>, defined in the ANSI C Standard, are "extended" by
P1003.1/Draft 12 to include many additional symbols and external
declarations.  Unfortunately, no mechanism was provided to ensure
that these additional names would remain invisible to an ANSI C
conforming program.  This is the issue that currently prevents
simultaneous ANSI C and IEEE 1003.1 conformance by an
implementation: the added names must not be visible to a program
for ANSI conformance, but they must be visible for 1003.1
conformance.

Several technical solutions for this problem have been proposed,
including moving the definitions and declarations to other
header files.  Given the extent of the changes that most such
solutions would require, at this late date it is probably best
to adopt the smallest, simplest change that will resolve the
conflict.  This would be to add to the 1003.1 specification a
requirement that, in order to obtain any definitions or
declarations other than those permitted by the ANSI C Standard
from inclusion of a standard header described in the ANSI C
Standard, prior to the header inclusion a program must ensure
that a special symbol is defined.  We suggest _POSIX_VERSION
(described in P1003.1/Draft 12 section 2.10.3) for this special
symbol.  It should be noted, probably in 1003.1 Appendix B
(Rationale and Notes), that this is most readily accomplished
by including <unistd.h> before the ANSI C header.  Since most
POSIX-based applications are likely to include <unistd.h>,
this constraint does not seem to impose an undue burden, and
it resolves the extremely important "name space pollution"
problem for portable ANSI C conforming applications, thereby
enabling implementations to simultaneously conform to both
standards.

The technical details of the use of the special symbol follow:

In the header shared by ANSI C and 1003.1, the POSIX definitions
are conditionally enabled by the special symbol, as in the
following abridged example:

/* Contents of file <limits.h>: */
#ifndef __LIMITS_H_INCLUDED
#define __LIMITS_H_INCLUDED	/* disable redefinition */
#define CHAR_BIT	8
/* ... other ANSI C definitions ... */
#ifdef _POSIX_VERSION		/* not defined for ANSI C */
#define MAX_INPUT	255
/* ... other 1003.1 definitions ... */
#endif
#endif
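
A minimal sketch of how the mechanism in the letter looks from the
program's side.  Everything named DEMO_* or demo_* is hypothetical and
stands in for real header contents, so that the fragment does not
depend on any particular system's <limits.h>; DEMO_POSIX_VERSION plays
the role of the special symbol.  With that symbol undefined, the
extension name is never introduced, and the program may use it as its
own identifier:

```c
/* --- stand-in for a shared header (hypothetical names throughout) --- */
#ifndef DEMO_LIMITS_H_INCLUDED
#define DEMO_LIMITS_H_INCLUDED      /* disable redefinition */
#define DEMO_CHAR_BIT   8           /* always visible: the "ANSI C" part */
#ifdef DEMO_POSIX_VERSION           /* not defined by a strictly ANSI program */
#define DEMO_MAX_INPUT  255         /* visible only on request */
#endif
#endif

/* --- the program --- */
/* DEMO_POSIX_VERSION was not defined above, so DEMO_MAX_INPUT is still
   in the program's name space and can be an ordinary identifier. */
static int DEMO_MAX_INPUT = 80;

int demo_char_bit(void)  { return DEMO_CHAR_BIT; }
int demo_max_input(void) { return DEMO_MAX_INPUT; }
```

Defining DEMO_POSIX_VERSION ahead of the header stand-in would turn
the program's declaration into "static int 255 = 80;", which does not
compile.  That is the kind of collision the special symbol keeps away
from programs that never asked for the extension.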

donn@hpfcdc.HP.COM (Donn Terry) (01/23/89)

To fill in a detail that came up in the discussions of how _POSIX_SOURCE
was to work.

It became pretty obvious that there were *very* few non-trivial programs
that were 100% POSIX conformant "by accident".  Thus to be POSIX conforming
the program would have to change at least slightly.  At that time, adding
_POSIX_SOURCE to the source code is not a big concern.  (Not that we
wanted things to change, but a lot of little things had to; UNIX had
just collected too much cruft over time.)  For the programs that are
already 100% POSIX conformant, congratulations to the authors!  (for
clairvoyance :-) ).  (I think there's a better chance for pure ANSI
programs, but I have a hunch that there's not a lot of them either.  I
do believe that for either ANSI C or POSIX the changes to conform are
either trivial or nearly impossible (not much middle ground).  In the
latter case, wait for 1003.4, or whatever.)

We also realized that many vendors would want to provide a backwards
compatible environment (either because they hadn't gotten an ANSI C
compiler on line, or because they had both).  Thus _POSIX_SOURCE had the
special meaning for common usage, which gave the same set of external
symbols (ANSI+POSIX) in both possible C environments.

For "unreconstructed" source (just about all there is today) the user
uses the "old" compiler, however the vendor specifies it (and it
is probable that that will be "cc" with no options initially).  This
is a "no changes and it still works" environment.  As the programs 
are inspected for ANSI/POSIX conformance, switches are set on the
compile line and in the source (_POSIX_SOURCE) to address the
exact environment needed.  Of course if there is a huge collection
of source that needs it and not other changes, it can be put into
CFLAGS for the make.  I suspect that the default will change to be
ANSI in a few years, and a few years after that the issue will go
away.  

I am *very* pleased at the 1003.2 -v ansi / -v common idea.  By
standardizing this, a big set of portability problems is cured
because it brings back a higher degree of portability in makefiles,
almost to where we started from.  (Now if that standard would only
hurry up and get ratified.)

As a software writer, I'm not sure I'd want to have _POSIX_SOURCE 
defined by the compiler for me (not counting my own -D, of course).
Before converting to POSIX, I'd want my same old environment.  
After, I'd prefer to be in control.

Donn Terry
Chair 1003.1

I speak only for myself; neither my employer nor IEEE necessarily
share the same opinions.   (You ought to see the "official" long form
of this!)

rml@hpfcdc.HP.COM (Bob Lenk) (01/25/89)

In article <9458@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:

> Yes, it's a difficult situation, but technical solutions are possible.
> ...
> I'm appending the text of the letter to this (long) article.

Thanks for posting the letter.  It did help clarify some of your points
to me.

>                                                        the definition
> of _POSIX_SOURCE in 2.8.2 makes it appear that "the symbols defined by
> this standard" will NOT be provided by the environment unless the
> _POSIX_SOURCE feature-test macro is "present" in the program.

The closest wording I can find to this is:

	Feature test macros shall be defined in the compilation of
	an application before a #include of any header where ...

I understand this to permit a "#define _POSIX_SOURCE" in the source,
a "-D_POSIX_SOURCE" on the command line, or an invocation of a compiler
that predefines _POSIX_SOURCE.

>                                                                Further
> it is defined to prohibit extensions in a header for which "no explicit
> constraint on the form of the name is provided by this standard", which
> I take to mean that, for example, <sys/times.h>'s declaration of
> "struct tms" shall not contain any members other than those enumerated
> in Section 4.5.2.2.  I think actually it was intended to permit others
> of the form tms_* even when _POSIX_SOURCE is in effect, and not names
> of other forms, but I don't see any such "explicit constraint" and so
> even additional tms_* names seem to be outlawed under _POSIX_SOURCE --

I see how this interpretation is reasonable.  The statement seems to
contradict one made shortly prior to it:

	Implementations may add members to a structure or union
	without controlling the visibility of those members with
	a feature test macro.

Doug's interpretation is probably the best resolution of the apparent
contradiction.  My interpretation, somewhat educated by discussion with
the author of this section, is that the statement limiting the extensions
made visible by _POSIX_SOURCE is poorly worded, and that the intention
was to permit _POSIX_SOURCE to make any additional fields in structures
visible.  This is, of course, *not* an official interpretation of
IEEE Std 1003.1-1988.  An official interpretation can be given only
in response to a formal request to the IEEE; any interested party can
make such a request.  I will suggest that the 1003.1 working group
consider clarification of this wording in a supplement to the standard.

> >In other words, _POSIX_SOURCE (a feature test macro) does turn on
> >definition of symbols not defined by the C Standard (ANSI C-prohibited
> >extensions).
> 
> I'll be happy if that is one generally accepted consequence.
> Is this really intended to take precedence over the statement in the
> definition of _POSIX_SOURCE that "the symbols defined in this standard
> will be provided by the environment"?

I don't understand why any precedence is needed.  By seeing to it that
_POSIX_SOURCE is defined, the program indicates that it "expects the
symbols defined by this [1003.1] standard will be provided by the
environment."  Since some of "the symbols defined by this standard" are
symbols not defined by the C Standard (ANSI C-prohibited extensions), it
follows that _POSIX_SOURCE turns on those ANSI C-prohibited extensions.

> I'm sure that the idea of "feature test macros" to enable visibility
> of such extensions is a good idea.  Of course even better is isolating
> the extensions by giving them their own headers.  The important
> portable-programming requirement is that the PROGRAM must take
> explicit action for such additional identifiers to become usurped for
> use by the extension(s).  This conflicts with the natural goal of
> vendors who wish to have existing customer applications compile
> unchanged in the new, standard-conforming environment.

There is no conflict necessary.  A vendor can supply a backward
compatible invocation of a compiler that pre-defines whatever feature
test macros are needed to make all old symbols visible.  Then one
additional invocation of an ANSI-compatible compiler is also supplied.
From that point the user can build any environment by defining
appropriate feature test macros.  The vendor can also choose to make it
easy to invoke the compiler with selected combinations of feature test
macros defined.

> >Practically I think that ANSI C should explicitly permit the namespace
> >to be used in any way explicitly or implicitly specified by the user, as
> >long as there is a clearly documented way to get a completely clean
> >namespace.
> 
> The suggestion was made before, more than once, but nobody was able to
> figure out a good way to specify such a notion in proper standards
> terms.

Perhaps the best way would be to suggest guidelines for the use of
__STDC__ in the rationale.  This is along the lines of what Dave Prosser
is suggesting (though I'm not sure I agree with him on the specific
content of the guidelines).

> >If an application is compiled with the straight ANSI compiler,
> >it can test __STDC__ for whatever it is interested in.  It can define
> >_POSIX_SOURCE itself if it wants POSIX symbols.

> >If it wants to know whether POSIX is supported, it can then test
> >_POSIX_VERSION from <unistd.h> (although the include of <unistd.h>
> >may fail on a non-POSIX system, I doubt this would be an issue to any
> >application that cared about this question).
> 
> I can't imagine any useful way of doing this particular test.  There
> probably should have been a POSIX equivalent of __STDC__ for this...

What's wrong with

	#include <unistd.h>
	#ifdef _POSIX_VERSION

or, for those concerned with implementations conforming to unapproved
drafts:

	#include <unistd.h>
	#if _POSIX_VERSION >= 198808L
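
A sketch of how a program might wrap these two tests in functions,
assuming it is compiled somewhere <unistd.h> exists at all (the caveat
already raised above).  The function names are illustrative and do not
come from 1003.1.

```c
#include <unistd.h>   /* assumption: present; the include fails on a non-POSIX system */

/* Report the value the implementation advertises, or 0 if none. */
long posix_version_or_zero(void)
{
#ifdef _POSIX_VERSION
    return (long)_POSIX_VERSION;
#else
    return 0L;
#endif
}

/* True only if the implementation claims the approved 1988 standard or later. */
int posix_1988_or_later(void)
{
#if defined(_POSIX_VERSION) && _POSIX_VERSION >= 198808L
    return 1;
#else
    return 0;
#endif
}
```

Neither function helps on a system with no <unistd.h>, since the
failure there happens at the #include, before any test is reached.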

> >1003.1 could have addressed this by requiring one of two symbols like
> >_POSIX_STANDARD_C or _POSIX_COMMON_C to be defined, but that really
> >seems to be redundant with the purpose of __STDC__, and in the scope of
> >a language standard rather than an OS standard.

> So I think historically there were several reasons for the situation.

I agree, and I think you've mentioned some of the most important ones.

> >If they had addressed the problem sufficiently and explained
> >the solution clearly, there wouldn't be all this debate about how
> >1003.1, Microsoft, AT&T, et al. are dealing with the situation.
> 
> Several attempts were made to explain the significance of the name
> space guarantees.

I think there are problems with both of the standards not being clear
enough to those who weren't intimately involved with a specific part.
This is difficult to avoid, and I realize Doug is one who put a lot
of good effort into avoiding it.  I can't think of any solution beyond
broader involvement in the creation and review of the standards.

>		========== X3J11 letter to P1003: ==========

I believe that the 1003.1 solution with _POSIX_SOURCE is better than the
one suggested in the letter (or any other alternative I've seen).  The
only real difference I see in the solution suggested in the letter is
that it uses the symbol _POSIX_VERSION rather than _POSIX_SOURCE, and
that _POSIX_VERSION is defined in <unistd.h>.  The stated benefit is
that the program does not have to be modified to #define _POSIX_SOURCE,
since _POSIX_VERSION is defined by the implementation.  In practice this
does not seem to be a real benefit, since it applies only to programs
that already #include <unistd.h> before #including any other header (or
at least any ANSI C related header); I doubt this covers any large
number of existing programs.  Any other program requires a source
change, which seems like no benefit over adding #define _POSIX_SOURCE.
Conversely, in existing UN*X environments, there are ways to define
_POSIX_SOURCE without modifying the source by use of the -D compilation
flag (and perhaps tools beyond that).  _POSIX_SOURCE has the additional
advantage that it can be extrapolated to a set of feature test macros
for all sorts of symbols defined by various standards, de-facto standards,
portability guides, and implementations.

		Bob Lenk
		hplabs!hpfcla!rml
		rml%hpfcla@hplabs.hp.com

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/26/89)

In article <12040009@hpfcdc.HP.COM> rml@hpfcdc.HP.COM (Bob Lenk) writes:
>The stated benefit is that the program does not have to be modified to
>#define _POSIX_SOURCE, since _POSIX_VERSION is defined by the
>implementation.  In practice this does not seem to be a real benefit,
>since it applies only to programs that already #include <unistd.h>
>before #including any other header (or at least any ANSI C related
>header); I doubt this covers any large number of existing programs. 
>Any other program requires a source change, which seems like no benefit
>over adding #define _POSIX_SOURCE.

I believe the reasoning was that technically <unistd.h> was going to
have to be included by almost any 1003.1-conformant application, and
that the automatic approach that takes care of the additional name-
space-in-standard-header issue would be less trouble than requiring
yet a second edit to the application as apparently required by the
_POSIX_SOURCE invention.  I will admit that finding "#define
_POSIX_SOURCE" as the first line of a source file might serve as a
useful indication that somebody has "POSIXized" the source.

If it is generally going to be the case that the POSIX compilation
environment predefines _POSIX_SOURCE then that is not a good argument.

I like the suggestion that 1003.1 publish a clarification of this
whole business.  If there are some non-required but generally agreed
techniques for the definition and use of these symbols, perhaps it
would be appropriate to try to steer vendors along the same path also.

rml@hpfcdc.HP.COM (Bob Lenk) (01/31/89)

> If it is generally going to be the case that the POSIX compilation
> environment predefines _POSIX_SOURCE then that is not a good argument.

Any environment that does this had better be careful.  A reasonable
program written to the 1003.1 standard might begin with either

	#define _POSIX_SOURCE
or
	#define _POSIX_SOURCE 1
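
The hazard is that C allows a macro to be redefined only with an
identical replacement list, so an environment that predefines
_POSIX_SOURCE one way collides with a program that spells it the other
way.  One defensive idiom on the application side, offered as a sketch
and not as anything 1003.1 requires, is to define the macro only when
it is absent.  The helper function is hypothetical and exists just to
make the result observable.

```c
/* Must precede every #include for the feature test macro to have any effect. */
#ifndef _POSIX_SOURCE
#define _POSIX_SOURCE 1   /* skipped if the environment or -D already supplied it */
#endif

/* Hypothetical helper: 1 if _POSIX_SOURCE is defined at this point, else 0. */
int posix_source_defined(void)
{
#ifdef _POSIX_SOURCE
    return 1;
#else
    return 0;
#endif
}
```

The guard avoids the redefinition diagnostic, but it also leaves the
program unable to rely on whether the value is 1 or empty, so the macro
should be tested only with #ifdef.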

> I like the suggestion that 1003.1 publish a clarification of this
> whole business.  If there are some non-required but generally agreed
> techniques for the definition and use of these symbols, perhaps it
> would be appropriate to try to steer vendors along the same path also.

I agree that agreeing upon and publishing conventions would be useful.
How is this different than X3J11 publishing guidelines for the use of
__STDC__ covering such things as __STDC__ == 0?
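
For concreteness, a sketch of the three cases such guidelines would
have to cover.  It assumes (the thread does not settle this) that a
zero value would mean an implementation offering ANSI features together
with extensions that intrude on the program's name space.  The function
name and return codes are illustrative only.

```c
/* Classify the compilation environment by __STDC__ alone. */
int stdc_mode(void)
{
#if !defined(__STDC__)
    return -1;   /* no claim of ANSI C at all */
#elif __STDC__ == 0
    return 0;    /* assumed meaning: ANSI features plus extensions */
#else
    return 1;    /* claims conformance */
#endif
}
```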

		Bob Lenk
		hplabs!hpfcla!rml
		rml%hpfcla@hplabs.hp.com