henry@utzoo.UUCP (Henry Spencer) (07/01/84)
The following is an informal report on what was said at the C Standards
workshop at Usenix. The workshop essentially consisted of a presentation
by Larry Rosler (of the ANSI C effort) plus question-and-answer afterwards.
I apologize to Larry for any errors in the following. (Incidentally, he
deserves a vote of thanks from everyone who attended the session. He flew
in from the East Coast, at considerable inconvenience, basically just to
give that talk.)

The ANSI C standards effort is X3J11. It's split into three subcommittees:
environment, library, and language. Rosler is chairman of the language
subcommittee.

The environment subcommittee is wrestling with a whole mess of very fuzzy
things about how C relates to its surroundings. Alone of the three
subcommittees, this one has no existing document to work from, so they're
sort of feeling their way. Among the things they're trying to cope with
are how a C program gets run (tentatively "main(argc, argv)", but the
question of environment variables is very difficult on non-Unix systems)
and how to resolve problems with European character sets.

The library subcommittee is working from chapters 2 and 3 of the Unix
manual. Most of chapter 2 is gone because it's Unix-dependent, although
a few things like "signal" are still there. Most of chapter 3 is still
present: stdio, chars and strings, memory allocation, basic math functions
(nobody feels like standardizing the Bessel functions!). They are looking
at things like error handling in the math library.

The language subcommittee is the one all the detail following is about.
Their basic goals are:

	- portability
	- preservation of the "spirit of C", i.e. the ability to get
	  right down into the bits if you want
	- minimizing the impact on existing valid programs
	- formalizing proven enhancements (emphasis on "proven")
	- producing precise but readable documents

The specific approach to that last item is to tidy up and tighten up the
existing C Reference Manual.
The idea of defining C by use of a mathematical formal definition was
discussed, but it was rejected on the grounds that the audience for a
definition written in English is several orders of magnitude larger.

They've started from the System V.2 C Reference Manual. There have been
three major areas of change in that since the "white book":

1. Long identifiers. The problem with Berklix-style arbitrary-length
   names is that they break existing tools and file formats. The
   breakage is much less severe if one simply cranks up the limit
   instead of making it infinite. Internal names (including
   preprocessor names) are now significant to 31 characters. External
   names are, alas, significant only to 6 characters and case is not
   significant in them; this cannot be improved without making the
   standard incompatible with most non-Unix object-module formats.

2. Void and enum. "void" is the type returned by a function that
   doesn't return a value. You can also cast things to "void" to throw
   away an unwanted value. The keyword is also used in a couple of
   other places, discussed later, to avoid having to introduce too many
   new keywords (any of which has the potential to break existing
   programs). Enums are as in V7; improvements to permit things like
   ordering comparisons (>=, etc.) on enums are still being thought
   about.

3. Structure/union improvements. Structure assignment, passing, and
   returning are as in V7. Structure comparison isn't there, at least
   not so far. Member names are now local to the particular structure,
   instead of all being in a global name space; this means that you
   have to be more careful about getting the type of (e.g.) the
   left-hand-side of "->" correct, or the compiler will object.

The committee has introduced three major changes since the V.2 CRM:

A. Function-argument type declaration and checking. Instead of just
   saying "extern int fread();", you can now say:

	extern int fread(char *, int, int, FILE *);

   so the compiler can do proper type checks.
   In the event of a type mismatch, the same conversions as for the
   assignment operator apply. (Hooray, no more casting NULL pointers!)
   Variable-argument functions like printf can be declared like:

	extern int printf(char *,);

   It is admitted that the comma is not all that conspicuous, and that
   this syntax makes it impossible to declare a function which has
   *only* variable arguments. These things are, of necessity,
   compromises. [Please note that neither Larry Rosler nor I
   necessarily *like* all the things I'm reporting.] There is an
   ambiguity when it comes to declaring no-argument functions, since
   "extern int rand();" looks like an old-style declaration which
   doesn't say anything about the arguments. The convention for this
   is:

	extern int rand(void);

   which means "no parameters".

B. "const". A new keyword (sigh) which is used to mark things that are
   read-only, with run-time assignments forbidden. These things might
   be put in ROM or in text space. Some examples, with notes:

	const float pi = 3.14159;

   This is a real, live, named constant, which will show up in the
   symbol table (unlike #defines).

	const short yacctable[1000] = { ... };

   An obvious case.

	const char *p;		/* pointer to constant */
	const *const q;		/* constant pointer to something */

   Illustrating two different uses: the first is a pointer that can be
   changed but can't be assigned through; the second is a pointer that
   can be assigned through but can't be changed. It is agreed that the
   syntax is less than ideal. Note that const is *not* a storage
   class, it is part of the type.

	extern char *strcpy(char *, const char *);

   Illustrating telling the compiler that strcpy doesn't change its
   second argument.

C. Single-precision arithmetic. If all operands in an expression are
   float, the compiler is allowed (not required!) to evaluate it in
   float rather than double arithmetic. The choice is explicitly
   implementation-dependent. Casts can be used to force evaluation in
   double. Numeric constants, e.g. "1.0", are double, *not* float!
   This last isn't ideal, but trying to fix it invariably makes life
   much more complex. The original double-only rule was partly a
   concession to the pdp11, partly just plain simpler, but partly a way
   of avoiding multiple versions of all the library routines. With
   declarations of function argument types, the last problem is pretty
   much fixed. All the library functions in the standard want "full
   width" types, so that if you don't declare them, you're still safe.

Some lesser issues:

I.   "Promiscuous" pointer assignments are illegal. You must use casts
     when mixing pointer types or mixing ints with pointers.

II.  "void *" is a new kind of pointer, which cannot be dereferenced
     but can be assigned to any other type of pointer without a cast.
     The idea here is that "char *" is no longer required to be the
     "universal" pointer type which can point to anything. So for
     example, the declaration of fread earlier really should go:

	extern int fread(void *, int, int, FILE *);

     (People who have machines where all pointers have the same
     representation, don't complain. You are lucky. Others aren't.)

III. "volatile" (the choice of name is tentative) acts like "const" in
     the syntax, but with different semantics. It means that the data
     in question is "magic" in some way (e.g. device registers) and
     that compilers should not optimize references to such things.
     This resolves a long-standing problem with writing optimizing
     compilers for C.

IV.  "signal" is in the library. This means that reentrancy is
     explicitly part of C.

V.   The preprocessor is part of the language. The committee has opted
     for a simple and clean definition, which does not perpetuate some
     implementation accidents of some of the existing ones. There are
     some minor improvements, like permitting space before the "#".

Some trivial additions:

i.   Hexadecimal string escapes. [Retch.] "Here's an ESC \x1b ".

ii.  String constant concatenation.
     Two string *constants* occurring adjacent to each other in the
     source are considered concatenated. Note that this is constants
     only. Among other minor things, this makes string continuation
     across line boundaries less ugly.

iii. "unsigned char", "unsigned short", "unsigned long" are all part of
     the language. Plain "char" is *not* required to be signed or
     unsigned (requiring either would make efficient implementations
     impossible on some machines). The question of a "char-sized int"
     type, of whatever syntax, has not yet been resolved.

iv.  The unary + operator. Same conversions and type restrictions as
     unary -. Does nothing. This is partly consistency with other
     languages, and partly consistency with things like "atof". (At
     the moment, "+3.14" is valid when atoffed from a string but not
     when compiled into a program!)

v.   Initialization of unions and automatic aggregates. The latter is
     just removal of an existing restriction. The former is tricky;
     there is *no* clean way to define it. The committee has opted to
     do something not necessarily good, but simple: the type of the
     initializer is that of the lexically-first member.

vi.  The selection expression of a "switch" can be of any integer type.
     (E.g. it can be a "long".)

vii. #elif. An added bit of preprocessor syntax, to simplify using
     #if's like a "switch".

Some things are gone:

01. "entry", "asm", and "fortran" keywords. (Although the last two
    will probably be mentioned in a "recognized extensions" appendix.)

02. "long float" is no longer a synonym for "double". Nobody ever used
    it. There was discussion of using "long float" and "long double"
    to cope with machines having more than two floating-point types,
    but conversions and such are an unknown swamp in such a case, and
    the committee decided not to try.

03. 8 and 9 are not octal digits.

04. Pointer-integer conversions now are strictly type-checked, as I
    mentioned earlier.

05. The following code fragment is illegal:

	foo(parm)
	int parm;
	{
		int parm;
		...
    Some compilers interpret such a situation as nested scopes, so the
    inner declaration hides the outer one. In this particular case,
    this seems both useless and dangerous. The scope of the arguments
    of a function is now identical to that of the local declarations,
    so this is a duplicate declaration and illegal.

06. Nothing is said about the alignment of bitfields, not even the K&R
    guarantee that they don't straddle word boundaries.

07. Some existing compilers permit taking the address of a variable
    declared "register" if the variable is not in fact placed in a
    register. This is now outlawed; "register" and the unary "&"
    operator don't mix.

All in all, the current draft standard doesn't sound too bad to me. I
will be getting a copy of it shortly, and may have some more comments at
that time. A number of things are still unsettled. The committee's
(very tentative) notion of schedule is a final draft for public comment
by the end of the year, and a real standard by the end of next year.
[Sound of crossing of fingers.]

Comments on this should *not* be addressed to me; I'm just an interested
observer, not a participant. Write to:

	Lawrence Rosler
	Supervisor, Language Systems Engineering Group
	AT&T Bell Laboratories
	Summit, NJ
	USA

No, I don't have a network address for him.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
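
Several of the "trivial additions" in the summary above (hexadecimal
string escapes, string constant concatenation, unary +, and #elif) can be
tried together in a few lines. What follows is only a sketch, written
against compilers that implement the standard as it was eventually
published rather than the 1984 draft; the names WIDTH, WIDTH_NAME,
greeting, reset_seq and additions_ok are made up for the illustration
and appear nowhere in the report.

```c
#include <assert.h>
#include <string.h>

/* WIDTH is a made-up configuration macro, used only to exercise #elif. */
#define WIDTH 16

#if WIDTH == 8
#define WIDTH_NAME "eight"
#elif WIDTH == 16
#define WIDTH_NAME "sixteen"
#else
#define WIDTH_NAME "other"
#endif

/* Adjacent string constants are joined by the compiler. */
static const char greeting[] = "hello, " "world";

/* Hexadecimal escape for ESC, kept in its own constant so that no
 * following character can be read as part of the escape. */
static const char reset_seq[] = "\x1b" "[0m";

/* Hypothetical self-check: returns 1 if each feature behaves as described. */
int additions_ok(void)
{
    /* Concatenation happens at compile time, so sizeof sees one array. */
    assert(sizeof greeting == sizeof "hello, world");

    return strcmp(greeting, "hello, world") == 0
        && reset_seq[0] == 0x1b
        && reset_seq[1] == '['
        && strcmp(WIDTH_NAME, "sixteen") == 0
        && +3.14 == 3.14;       /* unary plus changes nothing */
}
```

Splitting the escape from the text after it is a cautious habit, not a
requirement here: a hexadecimal escape keeps absorbing hex digits, so
"\x1b" followed directly by a letter such as 'c' would be read as one
longer escape.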
thomas@utah-gr.UUCP (Spencer W. Thomas) (07/03/84)
One little error I noticed:
	const * const p;
should be
	char * const p;

=Spencer
djmolny@wnuxb.UUCP (Molny) (07/03/84)
Wouldn't the following:
	extern int foo(void,);
be an acceptable syntax to describe the condition of a function that has
*only* variable arguments and no "non-variable" args?

Ron Heiby
...!ihnp4!wnuxa!heiby
henry@utzoo.UUCP (Henry Spencer) (07/04/84)
"extern int foo(void,);" would indeed appear to suffice for a function with only variable arguments, but it would have to be another specialized idiom, since "(void)" is an idiom rather than an ordinary parameter list. I don't know whether the ANSI folks will think this worthwhile or not. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
pete@lvbull.UUCP (Pete Delaney - Rocky Mountain UNIX Consultants) (07/05/84)
Const being a sub-type may be reasonable; my first thought was that it
should be a sub-storage class. I find the syntax awkward.

I think long global ID's are reasonable.

Why do we need void *, the 'universal pointer', the syntax is
questionable?

What do K&R think of this stuff; I think they should be given substantial
control as to the direction of THEIR language.

Question the utility of standards committees.

Pete Delaney
gwyn@brl-tgr.UUCP (07/06/84)
The difference is that
extern int foo();
has unknown (unspecified) arguments and anything will be permitted,
whereas
extern int foo(void,); /* suggested */
has unknown (unspecified) arguments and anything will be permitted.
There is no difference in the meaning of the DECLARATIONS, so the
question comes down to how to properly DEFINE a "varargs" function.
I do not see how
int foo(void,)
{
/* get actual parameters somehow */
}
is going to be made to work. Seems like some form of varargs needs
to be defined; does anyone know of a way to do this that will work
on all architectures and C runtime implementations?
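
For what it is worth, the standard as eventually published (ANSI
X3.159-1989) answered this with an ellipsis in place of the draft's
trailing comma, plus the macros in <stdarg.h>. A minimal sketch under
that later standard, not the 1984 draft, follows; sum_ints and its
count parameter are hypothetical names chosen for the example. Note
that this form still needs at least one named parameter for va_start
to anchor on, so it does not by itself provide a function with *only*
variable arguments.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * `count` is the last named parameter, which va_start requires. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);

    va_start(ap, count);            /* begin walking the unnamed arguments */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* caller must really have passed ints */
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks three unnamed arguments. The
function has no way to check the count it is given, so a wrong count or
a wrong type in va_arg is undefined behaviour.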
geoff@callan.UUCP (Geoff Kuenning) (07/07/84)
Bravo, Henry Spencer, and thanks for the excellent summary! I second the
motion of thanks to Larry Rosler for giving the workshop.

Henry made one typo in his summary, which may confuse some people.
Specifically, his example of "const" usages should read:

>	const char *p;		/* pointer to constant */
>!	char *const q;		/* constant pointer to something */
>
>   Illustrating two different uses: the first is a pointer that
>   can be changed but can't be assigned through; the second is a
>   pointer that can be assigned through but can't be changed. It
>   is agreed that the syntax is less than ideal. Note that const
>   is *not* a storage class, it is part of the type.
-- 
Geoff Kuenning
Callan Data Systems
...!ihnp4!wlbr!callan!geoff
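
The two corrected declarations can be seen side by side in a small test.
This is only a sketch, assuming a compiler that implements "const" as
it was later standardized; const_pointer_demo and its local names are
invented for the illustration. The two commented-out lines are the
assignments such a compiler should reject.

```c
#include <assert.h>

/* Hypothetical demo: returns 1 when both permitted assignments took effect. */
int const_pointer_demo(void)
{
    char first[] = "abc";
    char second[] = "xyz";

    const char *p = first;   /* pointer to constant: can't assign through it */
    char *const q = first;   /* constant pointer: can't be changed itself    */

    p = second;              /* allowed: p may point somewhere else          */
    /* *p = 'X'; */          /* rejected: the pointed-to char is read-only   */

    *q = 'A';                /* allowed: assigning through q                 */
    /* q = second; */        /* rejected: q itself is read-only              */

    assert(p == second);
    return p[0] == 'x' && first[0] == 'A';
}
```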
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/08/84)
(void *) is needed in order to have a type for things like malloc(3C).
(char *) should be reserved for real pointer to char, so type-checking
can be done on (char *). The (void *) syntax is unambiguous.

Although Brian Kernighan helped write the C book, the language was
designed by Dennis Ritchie. Some of the more modern improvements seem
to have arisen from discussions with others, notably Steve Johnson.
One indication of how Dennis Ritchie feels about the ANSI standardization
effort is that he specially urged Larry Rosler to come to the USENIX
conference to describe the effort and sat on stage during the
presentation.

Few people who have been writing production code on a variety of systems
will dispute the utility of good language standards.
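
The malloc case mentioned above might look like this once (void *) is
available. It is a sketch only, assuming the semantics the published
standard ended up with, where a (void *) converts to and from any object
pointer type without a cast; struct point, make_point and zero_bytes
are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct point { int x; int y; };

/* Hypothetical helper: allocates and fills a point, or returns NULL. */
struct point *make_point(int x, int y)
{
    /* malloc returns void *, which converts to struct point * with no cast */
    struct point *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->x = x;
    p->y = y;
    return p;
}

/* A generic routine takes void * and so accepts a pointer to any object. */
void zero_bytes(void *mem, size_t n)
{
    unsigned char *bytes = mem;     /* void * can't be dereferenced directly */

    assert(mem != NULL);
    while (n-- > 0)
        *bytes++ = 0;
}
```

With "char *" left to mean a real pointer to char, a checker no longer
has to guess which "char *" values are safe to convert to other types.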
thomas@utah-gr.UUCP (Spencer W. Thomas) (07/09/84)
I think, Doug, you are missing the point here. The form
	int foo(void,);
is the EXTERNAL declaration form. The form of declaration at the point
of definition is not being changed.

=Spencer
henry@utzoo.UUCP (Henry Spencer) (07/10/84)
In reply to some comments from Pete Delaney...

> Const being a sub-type may be reasonable; my first thought was that
> it should be a sub-storage class. I find the syntax awkward.

Making const a storage class strikes everyone as the obvious thing to do
at first. It has problems in that it really does need to be a sub-class,
since "static" and "extern" are still reasonable modifiers even for const
data. It also greatly limits the versatility of const -- most of the
examples I gave in my summary were things you couldn't write if const
were a storage class. My own personal view is that the real, crying need
is for a way to say "put this in read-only memory", and a (sub-)storage
class probably would have sufficed for that, but I have nothing serious
against the more sophisticated facility. It does have its advantages.
And you aren't the only one who doesn't like the syntax! I haven't got
any decisively better ideas, though.

> I think long global ID's are reasonable.

Me too. But I don't see any way to compel the whole world to conform to
this belief, and until they do, the problem isn't fixable.

> Why do we need void *, the 'universal pointer', the syntax is
> questionable?

The use of "void *" as the universal-pointer syntax is a blatant
concession to not wanting to introduce unnecessary new keywords, for fear
of breaking too many old programs. It's distasteful but bearable.

The need for the universal pointer is mostly in connection with storage
management: what type does "malloc" return? Making "char *" the
universal pointer does cause problems, not least among them the
inefficiency of handling "char *" on some architectures. And there's no
way to shut lint up about malloc, either, because lint has no way to know
that the "char *" which malloc is returning is acceptable for casting to
other types, unlike some "char *"s.

> What do K&R think of this stuff; I think they should be given
> substantial control as to the direction of THEIR language.

Dennis Ritchie is well aware of what the ANSI folks are doing. They
consult him with some frequency. He was at the Usenix session, and
commented on a few things ("enums are a botch"). My impression is that
he's cautiously in favor of most of what they are doing, with
reservations about a few problems. [Note that this is my impression
only.]

> Question the utility of standards committees.

Standards are a practical necessity, I'm afraid; the language has gone
far beyond the point where Dennis could maintain personal control over
it, even if he really wanted the headaches that would ensue. And the
politics of standards development require committees, alas.
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
geoff@callan.UUCP (07/11/84)
>What do K&R think of this stuff; I think they should be given substantial
>control as to the direction of THEIR language.
>	Pete Delaney

I don't know about K, but Dennis Ritchie was sitting on the podium during
the standards workshop. Although Rosler said about several features that
Dennis didn't necessarily approve, Ritchie only appeared truly grossed
out once (sorry, I don't remember about what).

As to its being THEIR language: sorry, I don't agree. THEIR language is
the C compiler from version 6 or before; the current C language belongs
to the user community that needs it. Would you want Grace Hopper to be
the only person allowed to propose changes in COBOL? I don't think she
even uses the language any more, yet it is a living and breathing entity
(okay, gasping).
-- 
Geoff Kuenning
Callan Data Systems
...!ihnp4!wlbr!callan!geoff
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/11/84)
My point was that the declaration
	extern int foo(void,);
contains no more information than the proposed meaning of
	extern int foo();
namely that the number and types of the arguments are not specified.
ka@hou3c.UUCP (Kenneth Almquist) (07/12/84)
It would seem that "extern int foo()" would be the best way to declare a
function with only variable arguments, although doing this would prevent
the same syntax from being used at a later date to declare a function
with no arguments.

Kenneth Almquist
laura@utzoo.UUCP (Laura Creighton) (07/14/84)
I have a real problem with this statement by Geoff Kuenning:

> As to its being THEIR language: sorry, I don't agree. THEIR language
> is the C compiler from version 6 or before; the current C language
> belongs to the user community that needs it. Would you want Grace
> Hopper to be the only person allowed to propose changes in COBOL? I
> don't think she even uses the language any more, yet it is a living
> and breathing entity (okay, gasping).

First of all, C is not a public domain product. If you have a C compiler
you either have written your own (in which case it is yours) or you have
bought it from somebody (in which case it is theirs). All the need in
the world doesn't amount to a hill of beans. We can come to the
conclusion that the C standards committee is doing a good thing, and we
can all adopt it, making it a bad business practice to not adopt it, but
AT&T and anybody else producing C compilers can be stupid and ignore the
standard, *because THEY and not the community OWN the language*.

If the C standards committee was doing a really lousy job, I would be
really pleased if Dennis Ritchie was the only person who could make
changes to the official language. (Propose changes, no. Make them
official -- yes). Of course, Dennis Ritchie might have better things to
do with his time.

If you ever invent a good thing which is good for reasons beyond ``well,
it compiles and does the job'' -- for instance if it is elegant, you run
a terrible risk whenever you release it to the world at large. A lot of
people don't know what ``elegant'' means.

About 2 months ago I got a piece of code mailed back to me. Somebody
claimed that it was a crock and asked me to fix my trash. Well, I looked
at it. It took me a while to recognise it. Four years ago it had been a
page and a half of assembler which did one thing well. Today it is >14
pages of assembler which does 5 new things badly and no longer does what
I wrote it to do at all. Yet my name is still on the top.
I suppose I could go the Peter Langston route (when is there going to be
Empire for the 68000, Peter?) and not release source. Maybe I should put
a disclaimer in: ``anybody caught brutally hacking this code will have
the dubious pleasure of being visited by the source code Mafia and have
every finger broken before being beaten up with the clubs with the sharp
spikes!''

I know people who put in a notice saying that you must document every
change that you make to any code or that you must delete the author's
name after making any changes. The second approach seems like giving
your effort away to the barbarians.

All of this becomes more difficult when you are trying to *sell* your
software, as opposed to give it away as public domain stuff. There are
some horrible things out there which are called ``unix'' and
``unix-like''. I suspect that if anything that called itself ``unix''
had to have the Ken Thompson seal of approval there would be fewer of
these. Of course, Ken Thompson has probably got better things to do with
his time as well.

Laura Creighton
utzoo!laura
djmolny@wnuxb.UUCP (Molny) (07/17/84)
> From: gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>)
> Date: Wed, 11-Jul-84 15:11:24 EDT
> My point was that the declaration
>	extern int foo(void,);
> contains no more information than the proposed meaning of
>	extern int foo();
> namely that the number and types of the arguments are not specified.

No, Doug. My intent was that the former declaration would declare a
function that has a variable parameter list of zero or more items. The
latter declaration would still declare a function that had an unspecified
parameter list. The difference is between "unspecified" and "specified
to be variable (>=0 params)".
-- 
from the overlapping windows of
Ronald W. Heiby
AT&T Technologies, Inc.
Lisle, IL  (CU-D21)
...!ihnp4!wnuxa!heiby
chongo@nsc.UUCP (Landon C. Noll) (07/20/84)
>Although Rosler said about several features that Dennis didn't
>necessarily approve, Ritchie only appeared truly grossed out
>once (sorry, I don't remember about what).

i seem to recall he objected to the idea of leading + such as:

	foo = +5;

any ideas why? :-)

chongo <foo+=+5;> \/++\/
pete@lvbull.UUCP (Pete Delaney - Rocky Mountain UNIX Consultants) (07/21/84)
How about, extern void printf(varargs); to specify a function with a variable number of args.