[net.unix-wizards] summary of C-standards workshop at Usenix

henry@utzoo.UUCP (Henry Spencer) (07/01/84)

The following is an informal report on what was said at the C Standards
workshop at Usenix.  The workshop essentially consisted of a presentation
by Larry Rosler (of the ANSI C effort) plus question-and-answer afterwards.
I apologize to Larry for any errors in the following.  (Incidentally, he
deserves a vote of thanks from everyone who attended the session.  He flew
in from the East Coast, at considerable inconvenience, basically just to
give that talk.)

The ANSI C standards effort is X3J11.  It's split into three subcommittees:
environment, library, and language.  Rosler is chairman of the language
subcommittee.

The environment subcommittee is wrestling with a whole mess of very fuzzy
things about how C relates to its surroundings.  Alone of the three sub-
committees, this one has no existing document to work from, so they're sort
of feeling their way.  Among the things they're trying to cope with are
how a C program gets run (tentatively "main(argc, argv)", but the question
of environment variables is very difficult on non-Unix systems) and how to
resolve problems with European character sets.

The library subcommittee is working from chapters 2 and 3 of the Unix
manual.  Most of chapter 2 is gone because it's Unix-dependent, although
a few things like "signal" are still there.  Most of chapter 3 is still
present:  stdio, chars and strings, memory allocation, basic math functions
(nobody feels like standardizing the Bessel functions!).  They are looking
at things like error handling in the math library.

All the detail that follows concerns the language subcommittee.

Their basic goals are:
	- portability
	- preservation of the "spirit of C", i.e. the ability to get
		right down into the bits if you want
	- minimizing the impact on existing valid programs
	- formalizing proven enhancements (emphasis on "proven")
	- producing precise but readable documents

The specific approach to that last item is to tidy up and tighten up
the existing C Reference Manual.  The idea of defining C by use of a
mathematical formal definition was discussed, but it was rejected on
the grounds that the audience for a definition written in English is
several orders of magnitude larger.

They've started from the System V.2 C Reference Manual.  There have been
three major areas of change in that since the "white book":

1. Long identifiers.  The problem with Berklix-style arbitrary-length
	names is that they break existing tools and file formats.  The
	breakage is much less severe if one simply cranks up the limit
	instead of making it infinite.  Internal names (including pre-
	processor names) are now significant to 31 characters.  External
	names are, alas, significant only to 6 characters and case is
	not significant in them; this cannot be improved without making
	the standard incompatible with most non-Unix object-module formats.

2. Void and enum.  "void" is the type returned by a function that doesn't
	return a value.  You can also cast things to "void" to throw away
	an unwanted value.  The keyword is also used in a couple of other
	places, discussed later, to avoid having to introduce too many
	new keywords (any of which has the potential to break existing
	programs).  Enums are as in V7; improvements to permit things
	like ordering comparisons (>=, etc.) on enums are still being
	thought about.

3. Structure/union improvements.  Structure assignment, passing, and
	returning are as in V7.  Structure comparison isn't there, at
	least not so far.  Member names are now local to the
	particular structure, instead of all being in a global name
	space; this means that you have to be more careful about getting
	the type of (e.g.) the left-hand-side of "->" correct, or the
	compiler will object.
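
For readers who want to see items 2 and 3 in compilable form, here is a
minimal sketch.  It is written the way later compilers accept it, not
taken from the draft itself, and every name in it (colour, point, span,
bump, shifted, span_end, and so on) is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum colour { RED, GREEN, BLUE };

/* Both structures use the member name "x"; each has its own name space. */
struct point { int x; int y; };
struct span  { long x; long len; };

/* Cycle through the enumeration constants. */
enum colour next_colour(enum colour c)
{
    return c == BLUE ? RED : (enum colour)(c + 1);
}

/* Returns the incremented counter; callers may not care about the value. */
int bump(int *counter)
{
    assert(counter != NULL);
    return ++*counter;
}

/* A function that returns no value has type void. */
void bump_twice(int *counter)
{
    (void) bump(counter);   /* cast to void discards the unwanted result */
    (void) bump(counter);
}

/* Structures are passed, assigned, and returned by value. */
struct point shifted(struct point p, int dx, int dy)
{
    struct point q;

    q = p;                  /* structure assignment */
    q.x += dx;
    q.y += dy;
    return q;
}

/* s->x is the long member of struct span, not the int in struct point. */
long span_end(struct span *s)
{
    assert(s != NULL);
    return s->x + s->len;
}
```

With member names local to each structure, writing "s->y" in span_end
would draw a complaint from the compiler, since struct span has no
member of that name.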

The committee has introduced three major changes since the V.2 CRM:

A. Function-argument type declaration and checking.  Instead of just
	saying "extern int fread();", you can now say:

		extern int fread(char *, int, int, FILE *);

	so the compiler can do proper type checks.  In the event of
	a type mismatch, the same conversions as for the assignment
	operator apply.  (Hooray, no more casting NULL pointers!)
	Variable-argument functions like printf can be declared like:

		extern int printf(char *,);

	It is admitted that the comma is not all that conspicuous,
	and that this syntax makes it impossible to declare a function
	which has *only* variable arguments.  These things are, of
	necessity, compromises.  [Please note that neither Larry Rosler
	nor I necessarily *like* all the things I'm reporting.]  There
	is an ambiguity when it comes to declaring no-argument functions,
	since "extern int rand();" looks like an old-style declaration
	which doesn't say anything about the arguments.  The convention
	for this is:

		extern int rand(void);

	which means "no parameters".

B. "const".  A new keyword (sigh) which is used to mark things that are
	read-only, with run-time assignments forbidden.  These things
	might be put in ROM or in text space.  Some examples, with notes:

		const float pi = 3.14159;

	This is a real, live, named constant, which will show up in the
	symbol table (unlike #defines).

		const short yacctable[1000] = { ... };

	An obvious case.

		const char *p;		/* pointer to constant */
		const *const q;		/* constant pointer to something */

	Illustrating two different uses:  the first is a pointer that
	can be changed but can't be assigned through; the second is a
	pointer that can be assigned through but can't be changed.  It
	is agreed that the syntax is less than ideal.  Note that const
	is *not* a storage class, it is part of the type.

		extern char *strcpy(char *, const char *);

	Illustrating telling the compiler that strcpy doesn't change
	its second argument.

C. Single-precision arithmetic.  If all operands in an expression are
	float, the compiler is allowed (not required!) to evaluate it in
	float rather than double arithmetic.  The choice is explicitly
	implementation-dependent.  Casts can be used to force evaluation
	in double.  Numeric constants, e.g. "1.0", are double, *not* float!
	This last isn't ideal, but trying to fix it invariably makes life
	much more complex.

	The original double-only rule was partly a concession to the
	pdp11, partly just plain simpler, but partly a way of avoiding
	multiple versions of all the library routines.  With declarations
	of function argument types, the last problem is pretty much fixed.
	All the library functions in the standard want "full width"
	types, so that if you don't declare them, you're still safe.
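
A short sketch of items A and B, plus the point in C about numeric
constants, may make them more concrete.  It assumes a compiler that
accepts the declarations in the form shown in the examples above; the
function names (half, answer, copy_text, pointer_forms,
constant_is_double) are made up for this illustration.  The
variable-argument form is left out, since the syntax was still being
argued over.

```c
#include <assert.h>
#include <stddef.h>

/* With the parameter type declared, an int argument is converted
   to double as if by assignment. */
double half(double x);

/* "No parameters" is spelled with void. */
int answer(void);

/* The declaration promises that nothing is stored through src. */
char *copy_text(char *dst, const char *src);

/* A named constant; unlike a #define it has a symbol-table entry. */
const float pi = 3.14159;

double half(double x) { return x / 2.0; }

int answer(void) { return 42; }

char *copy_text(char *dst, const char *src)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* The two pointer forms: returns 1 if both behave as described. */
int pointer_forms(void)
{
    char buf[] = "abc";
    const char *p = buf;    /* pointer to constant: p may change, *p may not be stored to */
    char *const q = buf;    /* constant pointer: *q may be stored to, q may not change */

    p = p + 1;              /* allowed */
    *q = 'X';               /* allowed */
    /* "*p = 'Y';" and "q = q + 1;" would both be rejected. */
    return (*p == 'b') && (buf[0] == 'X');
}

/* An unsuffixed constant such as 1.0 has type double, not float. */
int constant_is_double(void)
{
    return sizeof(1.0) == sizeof(double);
}
```

A call such as half(3) passes an int where a double is declared; with
the declaration in scope the argument is converted rather than
misinterpreted, which is the same mechanism that removes the need to
cast NULL.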

Some lesser issues:

I. "Promiscuous" pointer assignments are illegal.  You must use casts
	when mixing pointer types or mixing ints with pointers.

II.  "void *" is a new kind of pointer, which cannot be dereferenced but
	can be assigned to any other type of pointer without a cast.  The
	idea here is that "char *" is no longer required to be the
	"universal" pointer type which can point to anything.  So for
	example, the declaration of fread earlier really should go:

		extern int fread(void *, int, int, FILE *);

	(People who have machines where all pointers have the same
	representation, don't complain.  You are lucky.  Others aren't.)

III.  "volatile" (the choice of name is tentative) acts like "const"
	in the syntax, but with different semantics.  It means that the
	data in question is "magic" in some way (e.g. device registers)
	and that compilers should not optimize references to such things.
	This resolves a long-standing problem with writing optimizing
	compilers for C.

IV.  "signal" is in the library.  This means that reentrancy is explicitly
	part of C.

V.  The preprocessor is part of the language.  The committee has opted
	for a simple and clean definition, which does not perpetuate some
	implementation accidents of some of the existing ones.  There are
	some minor improvements, like permitting space before the "#".
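
Items II and III can be sketched as follows.  This is only an
illustration under assumptions: copy_bytes, fake_status_reg, and
poll_twice are hypothetical names, and an ordinary static variable
stands in for a device register, which on real hardware would live at
some fixed address.

```c
#include <assert.h>
#include <stddef.h>

/* Generic byte copy: any object pointer can be passed as void *. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;         /* void * converts without a cast */
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a device register (hypothetical). */
static volatile int fake_status_reg = 0;

/* Because the object is volatile, the compiler is expected to perform
   both reads rather than reuse the first value. */
int poll_twice(void)
{
    int a = fake_status_reg;
    int b = fake_status_reg;

    return a + b;
}
```

Note that copy_bytes never dereferences its void * parameters directly;
it first converts them to "unsigned char *", which is the only way to
get at the bytes.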

Some trivial additions:

i.  Hexadecimal string escapes.  [Retch.]  "Here's an ESC \x1b ".

ii.  String constant concatenation.  Two string *constants* occurring
	adjacent to each other in the source are considered concatenated.
	Note that this is constants only.  Among other minor things, this
	makes string continuation across line boundaries less ugly.

iii. "unsigned char", "unsigned short", "unsigned long" are all part of
	the language.  Plain "char" is *not* required to be signed or
	unsigned (requiring either would make efficient implementations
	impossible on some machines).  The question of a "char-sized int"
	type, of whatever syntax, has not yet been resolved.

iv.  The unary + operator.  Same conversions and type restrictions as
	unary -.  Does nothing.  This is partly consistency with other
	languages, and partly consistency with things like "atof".  (At
	the moment, "+3.14" is valid when atoffed from a string but not
	when compiled into a program!)

v. Initialization of unions and automatic aggregates.  The latter is
	just removal of an existing restriction.  The former is tricky;
	there is *no* clean way to define it.  The committee has opted
	to do something not necessarily good, but simple:  the type of
	the initializer is that of the lexically-first member.

vi. The selection expression of a "switch" can be of any integer type.
	(E.g. it can be a "long".)

vii.  #elif.  An added bit of preprocessor syntax, to simplify using
	#if's like a "switch".
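
The additions above are small enough to show together.  The following is
a sketch only, written in the form later compilers accept; WORD_BITS,
WORD_NAME, and the function and variable names are hypothetical and
exist just to give each feature something to do.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* i. Hexadecimal escape: element 14 of this string has the value 0x1b. */
static const char esc_demo[] = "Here's an ESC \x1b ";

int esc_value(void)
{
    assert(sizeof esc_demo > 14);
    return esc_demo[14];
}

/* ii. Adjacent string constants are concatenated. */
static const char greeting[] = "hello, "
                               "world";

size_t greeting_length(void) { return strlen(greeting); }

/* iii. unsigned char arithmetic wraps around to the maximum value. */
int wraps_to_max(void)
{
    unsigned char c = 0;

    c--;
    return c == UCHAR_MAX;
}

/* iv. Unary plus does nothing beyond the usual conversions. */
double plus_demo(void) { return +3.14; }

/* v. A union initializer applies to the lexically first member;
      automatic aggregates can be initialized too. */
union number { int i; double d; };
static union number seven = { 7 };

int union_first_member(void) { return seven.i; }

int sum3(void)
{
    int v[3] = { 1, 2, 3 };

    return v[0] + v[1] + v[2];
}

/* vi. The selection expression of a switch may be a long. */
int classify(long n)
{
    switch (n) {
    case 0L:      return 0;
    case 100000L: return 1;
    default:      return -1;
    }
}

/* vii. #elif lets a chain of #if's read like a switch. */
#define WORD_BITS 32
#if WORD_BITS == 16
#define WORD_NAME "sixteen"
#elif WORD_BITS == 32
#define WORD_NAME "thirty-two"
#else
#define WORD_NAME "other"
#endif
```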

Some things are gone:

01. "entry", "asm", and "fortran" keywords.  (Although the last two
	will probably be mentioned in a "recognized extensions" appendix.)

02. "long float" is no longer a synonym for "double".  Nobody ever used
	it.  There was discussion of using "long float" and "long double"
	to cope with machines having more than two floating-point types,
	but conversions and such are an unknown swamp in such a case, and
	the committee decided not to try.

03. 8 and 9 are not octal digits.

04. Pointer-integer conversions now are strictly type-checked, as I
	mentioned earlier.

05. The following code fragment is illegal:

		foo(parm)
		int parm;
		{
			int parm;
			...

	Some compilers interpret such a situation as nested scopes, so
	the inner declaration hides the outer one.  In this particular
	case, this seems both useless and dangerous.  The scope of the
	arguments of a function is now identical to that of the local
	declarations, so this is a duplicate declaration and illegal.

06. Nothing is said about the alignment of bitfields, not even the
	K&R guarantee that they don't straddle word boundaries.

07. Some existing compilers permit taking the address of a variable
	declared "register" if the variable is not in fact placed in
	a register.  This is now outlawed; "register" and the unary
	"&" operator don't mix.

All in all, the current draft standard doesn't sound too bad to me.
I will be getting a copy of it shortly, and may have some more comments
at that time.  A number of things are still unsettled.  The committee's
(very tentative) notion of schedule is a final draft for public comment
by the end of the year, and a real standard by the end of next year.
[Sound of crossing of fingers.]

Comments on this should *not* be addressed to me; I'm just an interested
observer, not a participant.  Write to:

	Lawrence Rosler
	Supervisor, Language Systems Engineering Group
	AT&T Bell Laboratories
	Summit, NJ  USA

No, I don't have a network address for him.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

thomas@utah-gr.UUCP (Spencer W. Thomas) (07/03/84)

One little error I noticed:
	const * const p;
should be
	char * const p;

=Spencer

djmolny@wnuxb.UUCP (Molny) (07/03/84)

Wouldn't the following:
     extern int foo(void,);
be an acceptable syntax to describe the condition of a function
that has *only* variable arguments and no "non-variable" args?
Ron Heiby  ...!ihnp4!wnuxa!heiby

henry@utzoo.UUCP (Henry Spencer) (07/04/84)

"extern int foo(void,);" would indeed appear to suffice for a function
with only variable arguments, but it would have to be another specialized
idiom, since "(void)" is an idiom rather than an ordinary parameter list.
I don't know whether the ANSI folks will think this worthwhile or not.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

pete@lvbull.UUCP (Pete Delaney - Rocky Mountain UNIX Consultants) (07/05/84)

Const being a sub-type may be reasonable; my first thought was that
it should be a sub-storage class.  I find the syntax awkward.

I think long global ID's are reasonable.

Why do we need void *, the 'universal pointer'?  The syntax is questionable.

What do K&R think of this stuff; I think they should be given substantial
control as to the direction of THEIR language.

Question the utility of standards committees.

					Pete Delaney

gwyn@brl-tgr.UUCP (07/06/84)

The difference is that
	extern int foo();
has unknown (unspecified) arguments and anything will be permitted,
whereas
	extern int foo(void,);		/* suggested */
has unknown (unspecified) arguments and anything will be permitted.
There is no difference in the meaning of the DECLARATIONS, so the
question comes down to how to properly DEFINE a "varargs" function.
I do not see how
	int foo(void,)
		{
		/* get actual parameters somehow */
		}
is going to be made to work.  Seems like some form of varargs needs
to be defined; does anyone know of a way to do this that will work
on all architectures and C runtime implementations?
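
For what it is worth, the facility the standard eventually settled on
for this is <stdarg.h>, which was not available to anyone in this
discussion.  A minimal sketch, assuming a compiler that provides it and
the "..." declaration syntax that goes with it; sum_ints is a made-up
example function.  Note that it still needs at least one named
parameter, so it does not cover the "only variable arguments" case.

```c
#include <assert.h>
#include <stdarg.h>

/* Add up `count` int arguments passed after the named parameter. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(ap, count);            /* start after the last named parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

The caller has to tell the function how many arguments follow and what
their types are, here through count; nothing in the mechanism itself
records that.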

geoff@callan.UUCP (Geoff Kuenning) (07/07/84)

Bravo, Henry Spencer, and thanks for the excellent summary!  I second the
motion of thanks to Larry Rosler for giving the workshop.

Henry made one typo in his summary, which may confuse some people.
Specifically, his example of "const" usages should read:

>		const char *p;		/* pointer to constant */
>!		char *const q;		/* constant pointer to something */
>
>	Illustrating two different uses:  the first is a pointer that
>	can be changed but can't be assigned through; the second is a
>	pointer that can be assigned through but can't be changed.  It
>	is agreed that the syntax is less than ideal.  Note that const
>	is *not* a storage class, it is part of the type.
-- 

	Geoff Kuenning
	Callan Data Systems
	...!ihnp4!wlbr!callan!geoff

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/08/84)

(void *) is needed in order to have a type for things like malloc(3C).
(char *) should be reserved for real pointer to char, so type-checking
can be done on (char *).  The (void *) syntax is unambiguous.

Although Brian Kernighan helped write the C book, the language was
designed by Dennis Ritchie.  Some of the more modern improvements
seem to have arisen from discussions with others, notably Steve
Johnson.  One indication of how Dennis Ritchie feels about the ANSI
standardization effort is that he specially urged Larry Rosler to
come to the USENIX conference to describe the effort and sat on
stage during the presentation.

Few people who have been writing production code on a variety of
systems will dispute the utility of good language standards.

thomas@utah-gr.UUCP (Spencer W. Thomas) (07/09/84)

I think, Doug, you are missing the point here.  The form

int foo(void,);

is the EXTERNAL declaration form.  The form of declaration at the point of
definition is not being changed.

=Spencer

henry@utzoo.UUCP (Henry Spencer) (07/10/84)

In reply to some comments from Pete Delaney...

    Const being a sub-type may be reasonable; my first thought was that
    it should be a sub-storage class.  I find the syntax awkward.

Making const a storage class strikes everyone as the obvious thing to do
at first.  It has problems in that it really does need to be a sub-class,
since "static" and "extern" are still reasonable modifiers even for const
data.  It also greatly limits the versatility of const -- most of the
examples I gave in my summary were things you couldn't write if const were
a storage class.  My own personal view is that the real, crying need is
for a way to say "put this in read-only memory", and a (sub-)storage class
probably would have sufficed for that, but I have nothing serious against
the more sophisticated facility.  It does have its advantages.

And you aren't the only one who doesn't like the syntax!  I haven't got any
decisively better ideas, though.

    I think long global ID's are reasonable.

Me too.  But I don't see any way to compel the whole world to conform to
this belief, and until they do, the problem isn't fixable.

    Why do we need void *, the 'universal pointer'?  The syntax is
    questionable.

The use of "void *" as the universal-pointer syntax is a blatant concession
to not wanting to introduce unnecessary new keywords, for fear of breaking
too many old programs.  It's distasteful but bearable.  The need for the
universal pointer is mostly in connection with storage management:  what
type does "malloc" return?  Making "char *" the universal pointer does
cause problems, not least among them the inefficiency of handling "char *"
on some architectures.  And there's no way to shut lint up about malloc,
either, because lint has no way to know that the "char *" which malloc is
returning is acceptable for casting to other types, unlike some "char *"s.
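
To make the malloc case concrete, here is a small sketch assuming a
library in which malloc is declared to return "void *", as <stdlib.h>
later came to do.  The node structure and the push, list_sum, and
free_list functions are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Prepend a value; returns the old head unchanged if allocation fails. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);   /* void * result, no cast needed */

    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Add up the values in the list. */
int list_sum(struct node *head)
{
    int total = 0;

    for (; head != NULL; head = head->next)
        total += head->value;
    return total;
}

/* Release every node in the list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;

        free(head);
        head = next;
    }
}
```

Because the result of malloc is "void *" rather than "char *", a checker
has no grounds to complain about the assignment in push, while a genuine
"char *" assigned to a "struct node *" can still be flagged.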

    What do K&R think of this stuff; I think they should be given substantial
    control as to the direction of THEIR language.

Dennis Ritchie is well aware of what the ANSI folks are doing.  They consult
him with some frequency.  He was at the Usenix session, and commented on
a few things ("enums are a botch").  My impression is that he's cautiously
in favor of most of what they are doing, with reservations about a few
problems.  [Note that this is my impression only.]

    Question the utility of standards committees.

Standards are a practical necessity, I'm afraid; the language has gone far
beyond the point where Dennis could maintain personal control over it, even
if he really wanted the headaches that would ensue.  And the politics of
standards development require committees, alas.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

geoff@callan.UUCP (07/11/84)

>What do K&R think of this stuff; I think they should be given substantial
>control as to the direction of THEIR language.
>					Pete Delaney

I don't know about K, but Dennis Ritchie was sitting on the podium during
the standards workshop.  Although Rosler said of several features that Dennis
didn't necessarily approve of them, Ritchie only appeared truly grossed out
once (sorry, I don't remember about what).

As to its being THEIR language:  sorry, I don't agree.  THEIR language is the
C compiler from version 6 or before;  the current C language belongs to the
user community that needs it.  Would you want Grace Hopper to be the only
person allowed to propose changes in COBOL?  I don't think she even uses the
language any more, yet it is a living and breathing entity (okay, gasping).
-- 

	Geoff Kuenning
	Callan Data Systems
	...!ihnp4!wlbr!callan!geoff

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/11/84)

My point was that the declaration
	extern int foo(void,);
contains no more information than the proposed meaning of
	extern int foo();
namely that the number and types of the arguments are not specified.

ka@hou3c.UUCP (Kenneth Almquist) (07/12/84)

It would seem that "extern int foo()" would be the best way to declare
a function with only variable arguments, although doing this would
prevent the same syntax from being used at a later date to declare
a function with no arguments.
					Kenneth Almquist

laura@utzoo.UUCP (Laura Creighton) (07/14/84)

	
I have a real problem with this statement by Geoff Kuenning:

	As to its being THEIR language:  sorry, I don't agree.  THEIR language
	is the C compiler from version 6 or before;  the current C language
	belongs to the user community that needs it.  Would you want
	Grace Hopper to be the only person allowed to propose changes in
	COBOL?  I don't think she even uses the language
	any more, yet it is a living and breathing entity (okay, gasping).

First of all, C is not a public domain product. If you have a C compiler
you either have written your own (in which case it is yours) or you have
bought it from somebody (in which case it is theirs). All the need in the
world doesn't amount to a hill of beans.

We can come to the conclusion that the C standards committee is doing a
good thing, and we can all adopt it, making it a bad business practice
to not adopt it, but AT&T and anybody else producing C compilers can be
stupid and ignore the standard, *because THEY and not the community OWN
the language*.

If the C standards committee was doing a really lousy job, I would be
really pleased if Dennis Ritchie was the only person who could make
changes to the official language. (Propose changes, no. Make them
official -- yes). Of course, Dennis Ritchie might have better things to
do with his time. 

If you ever invent a good thing which is good for reasons beyond 
``well, it compiles and does the job'' -- for instance if it is elegant,
you run a terrible risk whenever you release it to the world at large.
A lot of people don't know what ``elegant'' means. About 2 months ago
I got a piece of code mailed back to me. Somebody claimed that it
was a crock and asked me to fix my trash.

Well, I looked at it. It took me a while to recognise it. Four years ago
it had been a page and a half of assembler which did one thing well.
Today it is >14 pages of assembler which does 5 new things badly and
no longer does what I wrote it to do at all.

Yet my name is still on the top.

I suppose I could go the Peter Langston route (when is there going
to be Empire for the 68000, Peter?) and not release source. Maybe
I should put a disclaimer in:

	``anybody caught brutally hacking this code will have the
	dubious pleasure of being visited by the source code Mafia
	and have every finger broken before being beaten up with
	the clubs with the sharp spikes!''

I know people who put in a notice saying that you must document every
change that you make to any code or that you must delete the author's
name after making any changes. The second approach seems like giving
your effort away to the barbarians. All of this becomes more difficult
when you are trying to *sell* your software, as opposed to give it
away as public domain stuff. There are some horrible things out there
which are called ``unix'' and ``unix-like''. I suspect that if anything
that called itself ``unix'' had to have the Ken Thompson seal of approval
there would be fewer of these. Of course, Ken Thompson has probably got
better things to do with his time as well.

Laura Creighton
utzoo!laura

djmolny@wnuxb.UUCP (Molny) (07/17/84)

>  From: gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>)
>  Date: Wed, 11-Jul-84 15:11:24 EDT

>  My point was that the declaration
>  	extern int foo(void,);
>  contains no more information than the proposed meaning of
>  	extern int foo();
>  namely that the number and types of the arguments are not specified.

No, Doug.  My intent was that the former declaration would declare
a function that has a variable parameter list of zero or more items.
The latter declaration would still declare a function that had an
unspecified parameter list.  The difference is between "unspecified"
and "specified to be variable (>=0 params)".
-- 
	________________
	| __________   |	from the overlapping windows
	| | ksh!   |   |		of
	| |__________  |
	|  | gmacs! |  |	Ronald W. Heiby
	|  | _________ |	AT&T Technologies, Inc.
	|  | |dstar! | |	Lisle, IL  (CU-D21)
	|  | |       | |  __	...!ihnp4!wnuxa!heiby
	|  --|       | |_/  \_____
	|    --------- |    /\    \_
	|              |    \/      \+++
	|TTY_______5620|            /   \
	----------------            (red)
                                    \___/

chongo@nsc.UUCP (Landon C. Noll) (07/20/84)

	>Although Rosler said of several features that Dennis didn't
	>necessarily approve of them, Ritchie only appeared truly grossed out
	>once (sorry, I don't remember about what).

i seem to recall he objected to the idea of leading + such as:

	foo = +5;

any ideas why? :-)

chongo <foo+=+5;> \/++\/

pete@lvbull.UUCP (Pete Delaney - Rocky Mountain UNIX Consultants) (07/21/84)

How about,

extern void printf(varargs);

to specify a function with a variable number of args.