[comp.std.c] _POSIX_SOURCE

donn@hpfcdc.HP.COM (Donn Terry) (01/20/89)

I want to respond to some of the comments made about _POSIX_SOURCE in the
__STDC__ string.  Please read to the end before responding; there's some
stuff there that may help a lot in understanding this.

>>Now one form of "strict ANSI non-compliance" (i.e., the negation of
>>"strict ANSI compliance") that could be useful would be that exhibited
>>by an implementation that conforms to ANSI C except for POSIX items that
>>might get in the way.

>The main problem seems to be that vendors want to include their old
>extra UNIX cruft in <stdio.h> without requiring their customers to
>use a different form of compilation from the ANSI C mode.  Even 1003.1
>wants to do this.  I don't think that is even possible.

   There are two different issues here:  there's the language accepted
   by the compiler.  That clearly requires an option of some form
   (either a new compiler or flags). 
   
   There's the completely orthogonal issue of namespace.  First and
   foremost "vendors" don't want backwards compatibility:  it would be
   much easier to say to the customer "throw away all your old C stuff
   and rewrite from scratch, we have a Staaanndddarrd!"  However
   customers (in terms of end users) will puke.  There are a few
   "sophisticated" customers that will just love to have this to get the
   benefits of the standard, but most won't, at least immediately.  Thus
   the vendors have to provide some form of backwards compatibility, and
   in general that has to be identical to the old environment, warts and
   all.  (If it doesn't just up-and-compile, it's not compatible!)
   (This does include scripts/makefiles!)

   However, customers also want the benefits of having all their old
   makefiles (and old crusty programmers) work just fine when they
   finally go to the new C, so of course cc should read their mind
   and figure out which kind of C they want.  (Given the current
   lack of telepathy interfaces that looks hard.)

   The idealistic positions taken by X3J11 have made it all the harder
   to deal with backwards compatibility than it need have been.  In fact
   (very personal opinion) I would rather have seen them change the
   language incompatibly a bit than have introduced the namespace rules
   as they did.  They created a much harder transition problem in the 
   long run.

>So then the
>question becomes, how to most readily make one compiler serve both
>environments AND the new POSIX environment.  That really requires two
>bits of information to properly select all possible cases.  ANSI C
>__STDC__ is apparently being relied on to switch on the "ANSI C or not"
>bit.  _POSIX_SOURCE, a late invention to try to resolve the name space
>conflicts in standard headers that define things for both ANSI C and
>POSIX, was specified backward from what X3J11 had recommended;
>consequently vendors such as AT&T have decided that it is not suitable
>for use as the other selection bit and are trying to overload __STDC__
>for this.

   X3J11 made no such recommendation.  I had a few telephone and mail
   conversations with Doug on the issue of _POSIX_SOURCE, but those
   were taken as expert individual recommendations.  Although much
   of that work is now a blur, I don't remember any comment to the
   effect that _POSIX_SOURCE had the wrong sense.

>_POSIX_SOURCE, specified as being supplied by the
>APPLICATION, not the implementation, was specified as turning OFF
>POSIX-permitted extensions, but what was really needed was a way to
>turn ON ANSI C-prohibited extensions, which is not at all the same
>thing, and in any event the vendors don't want to tell their customers
>to #define _POSIX_SOURCE in order for their existing code to compile.

   This is only in part true:  what Doug describes is correct for
   common-usage.  However, in this discussion, common-usage doesn't
   matter.  For ANSI C, it turns on the POSIX extensions.  (Yes, its
   meaning varies between the two kinds of C!)  This has the interesting
   (and intentional) effect that if _POSIX_SOURCE is defined then in
   either common-usage or ANSI C the headers supply the same set of
   symbols:  ANSI C plus POSIX.  In a pure ANSI environment it has
   exactly the desired effect: adding symbols to the set available
   to the user only when the user takes explicit action.
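
   A minimal sketch of the ANSI C case just described, seen from the
   header's side.  The header fragment and the DEMO_POSIX_NAMES_VISIBLE
   macro are invented for illustration only; a real <stdio.h> would
   wrap declarations such as fdopen() in the #ifdef rather than set a
   flag.

```c
/* Application side: request the POSIX names before any header is seen. */
#define _POSIX_SOURCE 1

/*
 * Hypothetical fragment of an implementation's header.
 * DEMO_POSIX_NAMES_VISIBLE is an invented macro that only makes the
 * outcome observable; it is not part of any real header.
 */
#ifdef _POSIX_SOURCE
#define DEMO_POSIX_NAMES_VISIBLE 1   /* ANSI C names plus POSIX names */
#else
#define DEMO_POSIX_NAMES_VISIBLE 0   /* ANSI C names only */
#endif

/* Report which symbol set the sketch header exposed to this file. */
int demo_posix_names_visible(void)
{
    return DEMO_POSIX_NAMES_VISIBLE;
}
```

   Removing the #define at the top leaves the sketch exposing only the
   ANSI C set, which is the "explicit action" point: nothing extra
   appears unless the source file asks for it.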

   _POSIX_SOURCE was included in the program, rather than being defined
   by the environment, to increase the portability of the application.
   This way the user does not have to change his/her compilation environment
   depending on the exact feature set in use.  Rather the program itself
   asserts which one is in use.  Passing this information in from the
   outside then makes the makefiles, or whatever, much more difficult
   to use.  (e.g., what should make's .c.o: rule read in the face of a 
   lot of options, particularly command-name options?  Given only
   ANSI C to deal with, there's only one answer with _POSIX_SOURCE
   in the program: ANSI C.  The source program will take care of itself if
   it needs POSIX.)

>The simplest solution would have been:
>	cc	# backward-compatible sloppy UNIX mess version
>	acc	# ANSI-conforming version
>	pcc	# POSIX- and ANSI-conforming version
>	fcc	# like "pcc" but with extra ANSI-prohibited stuff.
	
   I (and presumably the IEEE balloting group) disagree.  We see 
   a layered model here: ANSI, ANSI+POSIX, ANSI+POSIX+POSIX.4,
   ANSI+POSIX+POSIX.4+X/OPEN, etc. (and in all combinations).  
   Separate compilers for every combination like this would be
   awful!

>Presumably all four would predefine various secret __names then
>invoke common compiler passes, perhaps with additional flags to turn
>on Reiserisms in cpp etc.  What I would have for the two names known
>to the programmer in each of the four cases are:
>	cc	__STDC__ not defined
>	acc	__STDC__==1
>	pcc	__STDC__==1
>	fcc	__STDC__ not defined

   __STDC__ coming in from the environment is probably useful because
   the issue of "which language syntax is accepted" is not something
   that is easily figured out and programs do want to adapt.  However,
   adding on additional symbol names is better embedded in the program.
   Again, the idea is for the program to tell its environment what it
   needs, not have the environment tell the program what it gets.
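
   As a rough illustration of a program adapting to the language
   syntax via __STDC__, here is one common idiom.  PROTO and
   count_char are hypothetical names chosen for the sketch, not
   anything defined by either standard.

```c
/*
 * PROTO is a hypothetical helper macro: it keeps the parameter list
 * only when the compiler announces (via __STDC__) that it accepts
 * function prototypes.
 */
#ifdef __STDC__
#define PROTO(args) args
#else
#define PROTO(args) ()
#endif

int count_char PROTO((const char *s, int c));

/* Count how many times the character c occurs in the string s. */
#ifdef __STDC__
int count_char(const char *s, int c)
#else
int count_char(s, c)
char *s;
int c;
#endif
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```

   The compiler supplies __STDC__; the source file only reacts to it.
   That is the opposite direction from _POSIX_SOURCE, which the source
   file supplies.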

>>... with "portable" meaning "portable to POSIX-conformant systems"
>>rather than "portable to ANSI C-conformant systems".

>But POSIX-conformant systems are supposed to be ANSI C-conformant
>also, unless they are specifically advertised as "Common Usage C"
>versions.  This means that a (POSIX+ANSI C)-conformant implementation
>CANNOT declare fdopen() in <stdio.h> as default behavior. 
  
  Yes, that was exactly the intent in part because of the namespace
  pollution rules.  We had a goal of having a single compiler provide
  all of the various levels of features that might come along.  Thus
  there had to be some sense of the application asking for what it needed, 
  either by the inclusion of headers or by the use of a flag such as
  _POSIX_SOURCE.

>Given the
>way _POSIX_SOURCE ended up being specified, any POSIX application that
>requires fdopen() MUST #define _POSIX_SOURCE before including
><stdio.h>.  (In fact, I would argue that ALL applications HAVE TO
>#define _POSIX_SOURCE in order to meet the letter of the POSIX spec.)

   Nope:  a program conforming to "the applicable language standard" is
   a POSIX conforming program.  For C that's ANSI C.  There's one for
   COBOL, too.  A pure ISO COBOL program is (trivially) a POSIX
   conforming program.  However, in an ANSI C conformant environment it
   has to define _POSIX_SOURCE to get at fdopen().
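
   From the application's side that looks roughly like the following
   sketch.  write_line_to_fd is a hypothetical helper; the only parts
   taken from the discussion are the _POSIX_SOURCE definition ahead of
   the #include and the use of fdopen().

```c
/* Must come before every #include so the headers can see it. */
#define _POSIX_SOURCE 1

#include <stdio.h>

/*
 * Wrap an already-open file descriptor in a stdio stream and write one
 * line of text to it.  The stream takes over the descriptor: fclose()
 * also closes fd.  Returns 0 on success, -1 on any failure.
 */
int write_line_to_fd(int fd, const char *text)
{
    FILE *fp = fdopen(fd, "w");

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF || fputc('\n', fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

   In a strictly ANSI C conforming compilation, dropping the #define
   should leave fdopen() undeclared in <stdio.h>.  Many compilers'
   default modes declare it regardless, so the difference may only
   show up when a strict-conformance option is selected.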

>It would be nice if one of the supported compiler invocations ("pcc"
>in my list above) would predefine _POSIX_SOURCE (which ANSI C DOES
>allow) so that one wouldn't need to edit source files to accomplish
>this (and would only need to change the definition of "CC" in one's
>master Make.config).  That would be far more useful than the apparent
>choice of compilation environments AT&T seems to be heading toward.

  Editing source files is EXACTLY what we were trying to avoid: the
  program asks for the features it needs from the environment.

>>POSIX does not, as I remember, require ANSI C conformance; ...

>(Technically, the use of cross-reference to sections of the C Standard
>is pointless unless the POSIX implementation is ANSI C conformant,
>because the C Standard does not constrain non-conforming
>implementations IN ANY WAY.  But this is just a legality and was not
>actually intended to be interpreted that way.)

   The **C** standard doesn't constrain a non-conformant (to C) implementation
   in any way, but the POSIX standard can constrain the POSIX implementations
   in any way it needs to, including reference to another document for
   the specification of its constraints.  We just happen to reference
   the C standard to explain the constraints on the common-usage C
   environment.

>>One possibility for these #ifdef might be specific names for particular
>>functions; unfortunately, there's no standard for those names, so the
>>writer can't assume something and hope for the best.  Unfortunately,
>>alternatives involving __STDC__ have the problems you list.  I don't
>>think there's anything that POSIX defines that says "this implementation
>>is ANSI C, with the exception of this specified list of extra goobers in
>>the namespace"; if there isn't, it's too late to fix it in POSIX.

>No, there's nothing like that.  Only _POSIX_SOURCE even comes close.

_POSIX_SOURCE turns on the additional POSIX stuff.  It doesn't turn on
additional (currently non-standardized) stuff beyond that (although it
doesn't explicitly prohibit that; maybe it should).

>>The best I see that could be done here is to make a strong
>>recommendation that POSIX vendors define
>>__ANSI_C_EXCEPT_FOR_POSIX_STUFF__ (or some other specified name) to
>>match what __STDC__ would have been defined as had the "allow POSIX
>>stuff" flag not been given to the compiler.

>Practically ANY de facto standard not involving __STDC__ would be fine
>so far as I'm concerned.  As it stands, we have no satisfactory
>solution.  I suggest that "pcc" mean "POSIX and almost ANSI C conforming
>with the symbol _POSIX_SOURCE predefined for your convenience" (the
						  ^inconvenience, in
				my opinion!  Then you have to change
				your makefiles, etc.
>only deviation from ANSI C conformance being the extra POSIX stuff in
>the standard headers) and that "acc" mean "ANSI C conforming".  "cc"
>should mean "our closest approximation to the cc you were already using,
>except for possible compatible extensions".  I don't think there is any
>practical need for a "POSIX and ANSI C conforming" invocation of the
>compiler; "pcc" should be good enough.

>It's really unfortunate that 1003.1 didn't straighten out more of the
>name space problems historical UNIX has had.

  In fact, I believe that further "straightening out" of the namespace would
  be a disservice to the user community in the short run, and it looks
  like actions being taken by folks such as X/Open will provide the
  de-facto standards required to straighten it out over the long run.

  If (as I suspect HP and X/Open are doing) vendor and other standard
  body extensions are turned on by macros similar to _POSIX_SOURCE (when
  headers don't do the job) then in fact the problem will cure itself
  over time, as the minimal namespace required by ANSI C will force the
  vendors to provide some sort of controlled namespace with controlled
  extension.

In writing this I came to an interesting conclusion:  it appears that
there are two different mindsets on the use of the flags such as
__STDC__ and _POSIX_SOURCE:  you can have the environment tell you what
you can do (__STDC__) or you can tell your environment what you need to
do (_POSIX_SOURCE).  POSIX also allows you to inquire of the environment
what you are getting via the use of _POSIX_VERSION.  Of course if
<unistd.h> isn't there you're out of luck, but then you probably
were anyway...
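
A small sketch of that inquiry, assuming <unistd.h> exists on the
system being compiled for.  posix_version_claimed is a hypothetical
name, and treating 0 as "no claim made" is a convention of this sketch
only.

```c
/* Tell the environment what we need ... */
#define _POSIX_SOURCE 1

/* ... then ask it what we actually got. */
#include <unistd.h>

/*
 * Return the value of _POSIX_VERSION that the implementation reports,
 * or 0 if <unistd.h> did not define it.
 */
long posix_version_claimed(void)
{
#ifdef _POSIX_VERSION
    return (long)_POSIX_VERSION;
#else
    return 0L;
#endif
}
```

A program can compare the result against the revision it was written
for and refuse to build or run when the claim is older than that.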

In a relatively homogeneous environment (such as POSIX) telling your
environment probably makes more sense.  In dealing with a program
attempting to run on everything short of a lawnmower having the
environment tell you is probably better (__STDC__).  (Languages in
general have this problem; it's not just C!)

There are instances of the reverse need either way, and these can be
addressed with today's tools, but at the expense of reducing the
portability of the program package; everything you have to tell the
program from the outside is a nuisance because you have to tell it
differently in every environment in which you run it.

By program package I mean the stuff it takes to compile a program:
scripts, makefiles, whatever, as well as the actual source code.  For
many programs dealing with that stuff is more of a problem than dealing
with the C (or whatever) code itself.  Portability of these things is
what 1003.2 is all about.  By forcing a solution such as [a-z]cc the
portability problem is worsened because not only does the C program have
to know what it needs, but so does the build environment.  (Historically,
the systems like POSIX have had at least as good portability of the 
makefiles and scripts as the C programs; often those are more portable;
let's not lose that!)

Note that if the implementation is not claiming POSIX (or Standard C)
then portability to that environment is not an issue from the point
of view of the standard.  (I know... reality intrudes!)  You do what
you have to do to make it work in a non-standard environment.  Within
the standard environment _POSIX_SOURCE tells you what you need, and
by definition it is provided.  (It presumes the minimal baseline that
ANSI C provides, if you will.)  If the environment is not POSIX, you
need to know a *lot* more than the fact that it isn't to figure out
what to do, and that's where importing the information from the 
environment is important; that's what the -D option is for.

If you look at this discussion (on the meaning of _POSIX_SOURCE vs.
__STDC__) in the light of two different ways of looking at the need, the
arguments become much clearer as to why they are being made.  I believe
that internal-to-file specification will work out better in the long
run, but we'll see in a few years.

Donn Terry
Chair, IEEE 1003.1

I speak only for myself. Neither my employer nor IEEE necessarily endorses
any positions taken herein.

P.S.  No, I didn't say a word on the __STDC__==0 issue!

gwyn@smoke.BRL.MIL (Doug Gwyn ) (01/21/89)

In article <12040005@hpfcdc.HP.COM> donn@hpfcdc.HP.COM (Donn Terry) writes:
>I want to respond to some of the comments made about _POSIX_SOURCE ...

Thanks for the discussion, Donn.  I have no major objection to anything
you said, and am heartened to hear that "posix_std_cc ansi.c", where
ansi.c is a strictly-conformant ANSI C program, will allow my ansi.c to
"#include <stdio.h>" without fdopen() getting declared as a side-effect.

I'm still not clear on just how it's going to be arranged that
	cc ansi.c
	cc old_unix.c
	cc posix.c
(where old_unix.c is existing UNIX code and posix.c avails itself of
1003.1 features) can all use the same "cc".  AT&T's __STDC__==0 mode
is apparently supposed to have something to do with this, but I don't
see how, without providing what is essentially a separate way to invoke
the compiler for at least one of the three cases.  From e-mail I
received from Dave Prosser, I have the impression that AT&T intends
for "cc" to by default be non-conforming to the ANSI C standard, and
for applications that want ANSI C conformance to have to do something
extra (perhaps add a flag to CFLAGS in their Makefiles?).  He did say
that __STDC__ is always defined as either 0 or 1 in their implementation,
with the 0 case deviating from ANSI C conformance only in adding the
"asm" keyword to the language and in adding extra (not necessarily
limited to POSIX) identifiers to the standard headers for purposes of
compatibility with previous UNIX compilation environments.  The claim
was made that that's what their customers want, but I'm an AT&T
customer too and I don't recall being polled for my opinion on this..
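
For reference, a program can at least tell apart the three cases that
follow from that description (no __STDC__ at all, __STDC__ defined as
0, __STDC__ defined as 1) with a sketch like the one below.  stdc_mode
and its return codes are hypothetical labels; what each mode actually
permits is up to the vendor.

```c
/*
 * Classify the compilation mode from __STDC__ alone.
 * Return codes are labels invented for this sketch:
 *   -1  __STDC__ is not defined (pre-ANSI style compiler)
 *    0  __STDC__ is defined as 0 (the "almost ANSI" mode described above)
 *    1  __STDC__ is defined as anything else (normally 1, conforming)
 */
int stdc_mode(void)
{
#ifndef __STDC__
    return -1;
#else
#if __STDC__ == 0
    return 0;
#else
    return 1;
#endif
#endif
}
```

Note that a plain "#ifdef __STDC__" cannot distinguish the last two
cases, so code that cares about strict conformance has to test the
value and not merely whether the name is defined.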