[net.unix-wizards] C extensions

mat (10/19/82)

about "XYZ" "ABC" being identical to "XYZABC", THIS IS DANGEROUS.  A
dropped comma in an argument list can cause the list to be fouled up.
If this is something like an error-handler ( where his example would
be useful ) which is not exercised, and which may be marked as VARARGS
for lint, it may get past debugging, and become a 'sleeper' bug, waiting
to foul up a Venus probe 10 years from now.
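
To make the hazard concrete, here is a minimal sketch, assuming a compiler
that concatenates adjacent string literals (as ANSI C later did).  The table
names and messages are made up for illustration; the same thing happens in
an argument list.

```c
#include <stddef.h>

/* Hypothetical message table, invented for this illustration. */
static const char *const intended[] = {
    "disk full",
    "no such file",
    "permission denied"
};

/* The same table with the comma after "disk full" dropped.  It still
 * compiles, but now has only two entries, and the first one reads
 * "disk fullno such file". */
static const char *const fouled[] = {
    "disk full"
    "no such file",
    "permission denied"
};

size_t intended_count(void)    { return sizeof intended / sizeof intended[0]; }
size_t fouled_count(void)      { return sizeof fouled / sizeof fouled[0]; }
const char *fouled_first(void) { return fouled[0]; }
```

Nothing in the second table is a syntax error, so the compiler has no reason
to complain; only the element count gives the mistake away.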

bcw (10/20/82)

From:	Bruce C. Wright @ Duke University
Re:	C extensions

Well, there are certainly languages which allow both ordered and unordered
enumerated types - Ada is one.

			Bruce C. Wright @ Duke University

sher (10/21/82)

this probably garbage sorry

gwyn@Brl@sri-unix (10/22/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     19 Oct 82 12:32:48-EDT (Tue)
There is no "else" statement ambiguity; the else is bound to the most
recent un-elsed "if".  To change this behavior would result in an
incompatible language which should most certainly not be called "C".
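
A small sketch of the binding rule; the function name and strings are
hypothetical, and the indentation is deliberately misleading to show that
the compiler ignores it.

```c
/* Reports which branch ran. */
const char *classify(int a, int b)
{
    const char *result = "neither";

    if (a)
        if (b)
            result = "a and b";
    else                        /* binds to "if (b)", not to "if (a)" */
        result = "a but not b";

    return result;
}
```

With a zero, neither assignment runs at all, because the else belongs to the
inner if; many compilers will warn about code laid out this way.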

In Ada and Algol-68 (I think), there is no need for "begin...end" or
"{...}" compound statement delimiters, due to the structure of the
language.  E.g.,
	if condition then st1; st2 else st3; st4 fi
When properly indented,
	if condition then
		st1;
		st2
	else
		st3;
		st4
	fi
it is possible to change the number of statements in the body without
worrying about begin...end brackets.  However, in the Pascal-like form
I've used here, the trailing semicolons come and go when one changes
the number of statements.  This nuisance has been fixed in Ada, where
; is a statement terminator (as in C) rather than a delimiter.

This is not an advertisement for Ada, which I think is overblown.

Michael.Young@Cmu-10a@sri-unix (10/22/82)

From: Michael Wayne Young <Michael.Young@Cmu-10a>
Date: 19 October 1982 1409-EDT (Tuesday)
How about this one: a 'constant' type class.  [Is this what you
meant by 'classc'?  If not, what did you mean?]

I also would like
to see some true incorporation of the preprocessor into the language --
who uses one without the other?  [I guess some people use m4 to
preprocess their source, but then this doesn't prevent cpp from being
used too.]  If we could get the preprocessor merged with the compiler,
I'd like to be able to build "packages" (to borrow an abused Ada
term) consisting of external and/or macro definitions.
Instead of using "include" files which just contain information that
gets built into symbol tables (in cpp or ccom/c0,c1), we could
have the compiler SAVE these things, and include the compiled
tables instead.  This would cut down on a lot of "include" file
scanning, and shouldn't be too tough to do; however, it does
mean a different compiler organization.

Another interesting question I have is: why can't union types
be initialized?  That is, as long as your initializers make the
choice of the unioned types unambiguous, what's the harm?
I recognize that it's not a really good programming practice,
but being able to initialize unions does make them more useful,
and I've seen lots of code that just chose to use (very non-portable)
alternatives to unions for similar reasons.
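
For comparison, later revisions of C did allow this.  The sketch below
assumes a C99 compiler; the union and its member names are hypothetical.
A plain brace initializer sets the first member, and a designated
initializer names the member, which removes the ambiguity.

```c
union value {
    int         i;
    float       f;
    const char *s;
};

/* With no designator, the initializer applies to the first member. */
static union value first = { 42 };

/* A C99 designated initializer names the member explicitly. */
static union value text = { .s = "hello" };

int first_as_int(void)           { return first.i; }
const char *text_as_string(void) { return text.s; }
```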

And then how about a language-defined variable-length argument list
construct.  I'm not sure how I'd like to see this done, but have
several interesting thoughts on the matter.  This unfortunately
requires that the implementation of routine-call code be
restricted a little bit, but most implementations of C use the
PDP-11/Vax style of parameter-passing already, and that's what
we want to see.

Along the lines of in-block variable declarations, I'd be
interested in seeing a "retire variable;" or "register variable"
statement set that could suggest to the compiler that a variable
can be thrown away at this point (lexically, in the source),
or that at this point, it'd be wise to make the variable given
a register variable.  Again, this is a hack, but it's nice
to be able to have a whole bunch of register-class variables
(with differing names, for clarity of code) but not have them
all active at once.  [Perhaps a backup in 'automatic' space
would be appropriate.]  Also, much of this facility can already
be provided by a "smart" compiler that chooses its use of
registers carefully, but often the programmer knows better
what his register-class variables ought to be.

About string operations: I in fact would suggest NOT adding them
to the language for the same reason that I don't want to see
multi-precision arithmetic added... packages of subroutines are
good for such things.  If you really insist, what I'd suggest is
to add a construct for binding operation symbols to routines,
somewhat like the Ada overloading technique.  Compilers shouldn't
have the added responsibility for supporting structured, or complex,
types -- for one thing, they fix the implementation, and for another,
I would rather have my compiler not have to worry about it.

In the last year or so, I've often thought of neat things that could
be added to the language, but I've restrained saying anything
about them, for the primary reason that C is a very clean-looking,
simple, and yet portable, language.  [One reason I don't suggest
very large changes is that lots of sites still run V6 and V7 Unix,
and the V6 compilers, as we all know, are very archaic, but much
of the currently-developed C code still compiles there.  We don't
want to pull a change (like from 4.1 ==> 4.2 BSD) that loses
half of our audience.]

			Michael

trb (10/22/82)

Someone asked for C extension suggestions...
How about a # directive to tell the loader that some object should
be included (or object library be searched)?

#search <libm.a>
...
	sin(x);

Analogous to the #include feature, but #include is necessary because
you need the exact positioning, whereas #search would just be a loader
directive.  I guess I have an unhealthy aversion to the -l's on the cc
command line.  Please don't provoke me.

DEC TOPS-10 MACRO-10 had such a feature, and I remember it being useful.

	Andy Tannenbaum   Bell Labs  Whippany, NJ   (201) 386-6491

hamilton (10/24/82)

#R:unc:-409800:uicsovax:5500038:000:365
uicsovax!hamilton    Oct 24 14:31:00 1982

now that C supports some structure operations (assignment, passing to & from
functions), how about a structure (or array) constant?  something like:

	struct foo { char a; int b; float c; } afoo;
	...
	afoo = (struct foo) { 'b', 5, 3.14 };

(choice of parenthetic characters is admittedly a problem).
	wayne ({decvax,ucbvax,harpo}!pur-ee!uiucdcs!uicsovax!)hamilton
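
This is very close to what C99 later adopted as the "compound literal".
A minimal sketch, assuming a C99 compiler; the wrapper function is
hypothetical and exists only so the assignment has somewhere to live.

```c
struct foo { char a; int b; float c; };

struct foo make_foo(void)
{
    struct foo afoo;

    /* C99 compound literal: an unnamed struct foo built in place. */
    afoo = (struct foo) { 'b', 5, 3.14f };
    return afoo;
}
```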

dvk (10/25/82)

I'll go along with Anndy Tannennbaumm and opt for a library include in the
C program. However, I propose what CMU had/has in their C system, namely
you tell it what library file you want to have, via:

#library <termlib>

Which tells the linker to do a -ltermlib for you.

	-Dan Klein, Mellon Institute, Pittsburgh

mat (10/25/82)

One of my favorite ideas for extending C is a preprocessor directive to
generate messages to the error output, and another to abort the processing
with an error exit by cpp.  These would be helpful in parameterizations where
one wants to protect against conflicting or otherwise 'bad' settings
of #defines.
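
ANSI C later provided the aborting half of this as #error.  A sketch of
the kind of guard described, with hypothetical configuration macros that
would normally come from -D options or a header:

```c
/* Hypothetical settings, invented for this illustration. */
#define USE_SMALL_BUFFERS 1
#define USE_LARGE_BUFFERS 0

/* Refuse to compile if the settings contradict each other. */
#if USE_SMALL_BUFFERS && USE_LARGE_BUFFERS
#error "USE_SMALL_BUFFERS and USE_LARGE_BUFFERS are mutually exclusive"
#endif

#if USE_SMALL_BUFFERS
#define BUFFER_SIZE 512
#else
#define BUFFER_SIZE 8192
#endif

int buffer_size(void) { return BUFFER_SIZE; }
```

Setting both macros to 1 stops the build at the #error line with that
message instead of silently picking one of the sizes.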

gwyn@Brl@sri-unix (10/26/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     22 Oct 82 18:16:35-EDT (Fri)
Some of the things you would like added to C are already possible.

/usr/include/varargs.h implements variable-argument list support,
although I wouldn't recommend its use for much more than i/o lists.
It is a bit tricky to implement varargs on some architectures
(sometimes registers are used for some arguments and memory for
the others) but as far as I know it can always be done.
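
A minimal sketch of the idea.  It uses <stdarg.h>, the ANSI successor to
varargs.h, on the assumption that the older header is not available on
current compilers; the function itself is hypothetical.

```c
#include <stdarg.h>

/* Sums `count` int arguments; the caller must pass exactly that many. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);        /* start after the last named argument */
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

Nothing checks that the count matches the arguments actually passed, which
is one reason to keep this to i/o-style lists.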

There are two approaches to "retiring" variables when they are no
longer needed (most useful for register variables).  Often the
code can be modularized so that no procedure uses more than the
maximum number of registers; local variables are hidden from other
sections of the code, of course.  The other method is something that
I use a lot but that some people hate: one can declare "local"
variables at the start of any compound statement block (i.e., after
a `{'); they "vanish" when the corresponding `}' is reached.
Although local variables preempt any of the same name from the
block's environment, good programming practice requires one to
use distinct names (although using the same name in disjoint
local symbol blocks, such as "i" for loop indices, doesn't
pose a problem).
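
A sketch of the second method, with hypothetical names.  Each register
variable lives only inside its own block, so the two are never active at
the same time.

```c
int sum_then_scale(const int *v, int n)
{
    int total = 0;

    {
        register int i;         /* exists only inside this block */

        for (i = 0; i < n; i++)
            total += v[i];
    }
    {
        register int scale = 2; /* a separate variable in a disjoint block */

        total *= scale;
    }
    return total;
}
```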

One can easily implement "packages" as separately-compiled
source files; declare all non-globally-known extern names
"static".  I also do this quite a bit and it works beautifully.

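A sketch of such a "package", assuming a single hypothetical source file
called counter.c.  Only the two functions at the bottom are visible to the
loader; everything marked static stays private to the file.

```c
/* counter.c: a hypothetical one-file "package". */

static int count;               /* private: invisible to other files */

static int clamp(int n)         /* private helper */
{
    return n < 0 ? 0 : n;
}

/* Public interface: the only names other files can link against. */
void counter_add(int n) { count = clamp(count + n); }
int  counter_get(void)  { return count; }
```
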
Many of the changes to C since 6th Edition UNIX were made to
support portable coding (e.g., type-casts, unions), so it is
not advisable to continue to use the 6th Edition compiler.
With a moderate effort, one can bring up the System III C
compiler and library on a 6th Edition kernel.  I did this at
Geotronics; maybe someone there would be willing to export the
package to save others having to reinvent the wheel.

As for incorporating the preprocessor in the compiler, apart from
the desire to compile "modules" (see above), this is an implementation
decision.  In fact Whitesmiths' C preprocessor can generate either
C code with # lines resolved, or lexemes for pass one of the compiler
proper.  I think this is a nice idea but it has no direct effect on
the language user.

I am mystified by the "constant type class".  Perhaps you mean
read-only (therefore initialized!) data?  The preprocessor provides
already the Pascal style of "constant" via #define.
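
For comparison, ANSI C later added a `const` qualifier for read-only data.
The sketch below uses hypothetical names and puts the two styles side by
side; whether a const object actually ends up in read-only memory is up to
the implementation.

```c
#define MAXLINE 80                      /* preprocessor constant: text substitution */

static const int  maxline = 80;         /* typed, read-only object */
static const char greeting[] = "hello"; /* read-only initialized data */

int macro_limit(void)           { return MAXLINE; }
int line_limit(void)            { return maxline; }
const char *get_greeting(void)  { return greeting; }
```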

I can't think of a clean general way to specify union initializers.
A portable way to accomplish this would be to execute an init_data()
function first thing in main().  Or, in a "package" you could have an
"initialized" flag tested on each entry and initialization done if
not already.  Admittedly this adds run-time overhead; however, you
would have to do this in Pascal anyway and in C it is only necessary
for unions.  I have yet to need initialized unions, although I am
sure an example could be cooked up.
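
A sketch of the "initialized flag" approach, with a hypothetical union and
table.  Each entry point tests the flag and runs init_data() once if needed.

```c
union number {
    long   whole;
    double real;
};

static union number table[2];
static int initialized;         /* statics start out as 0 */

static void init_data(void)
{
    table[0].whole = 7;
    table[1].real  = 2.5;
    initialized = 1;
}

/* Every entry point checks the flag before touching the data. */
long first_whole(void)
{
    if (!initialized)
        init_data();
    return table[0].whole;
}

double second_real(void)
{
    if (!initialized)
        init_data();
    return table[1].real;
}
```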

I am glad to see some discussion of C improvements.  For the most
part, it is a very nice, clean language.  There seem to be very few
possibilities for (compatible!) enhancements beyond existing
capabilities.

mullen@Nrl-Css@sri-unix (10/27/82)

From: mullen at Nrl-Css (Preston Mullen)
Date: 24 Oct 1982 23:43:21-EDT
From: harpo!ihps3!houxi!hou5d!hou5a!mat at Ucb-C70
	Subject: Re: C extensions
	Article-I.D.: hou5a.161
	Remailed-date: 23 Oct 1982 1913-PDT
	
	about "XYZ" "ABC" being identical to "XYZABC", THIS IS DANGEROUS.  A
	dropped comma in an argument list can cause the list to be fouled up.
	If this is something like an error-handler ( where his example would
	be useful ) which is not exercised, and which may be marked as VARARGS
	for lint, it may get past debugging, and become a 'sleeper' bug, waiting
	to foul up a Venus probe 10 years from now.
	
Poo.  What's dangerous is inadequate validation.  Only an idiot
would put an unvalidated error handler (or an unvalidated anything)
into a Venus probe!  Such alarmist hyperbole does not enhance an argument.

[The following bit of ribaldry is not altogether inappropriate:
Q:	"Why is a herpes virus like a dropped comma?"
A:	"It may become a 'sleeper' bug, waiting to foul up a
	 Venus probe 10 years from now."
Q:	"What is the difference?"
A:	"A dropped comma can always be found and cured before
	 it hurts anybody."]

The proposed construct is hardly more error-prone than others;
C is just full of characters waiting to be dropped.  Imagine
converting
	j = --i;
into
	j = -i;
by dropping a '-'.  Or how about changing the parallel clauses
	j = 2, -i
to the single clause
	j = 2 -i
by dropping another comma?  The possibilities are endless.
At least C makes you declare identifiers.
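
The four statements above, wrapped in hypothetical functions so the results
can be compared.  All of them compile, though a compiler may warn about the
discarded value in the comma version.

```c
/* "j = 2, -i" is two expressions: assign 2, then evaluate and discard -i. */
int with_comma(int i)
{
    int j;

    j = 2, -i;
    return j;
}

/* Drop the comma and it becomes the single expression j = (2 - i). */
int without_comma(int i)
{
    int j;

    j = 2 -i;
    return j;
}

/* "--i" decrements i before the assignment; "-i" merely negates it. */
int with_both_minuses(int i) { int j = --i; return j; }
int with_one_minus(int i)    { int j = -i;  return j; }
```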

I would venture the bold claim that languages like C should never be
used to implement programs for Venus probes, precisely because they
lend themselves so poorly to formal verification, but everybody would
get mad and a big argument would result, so I won't.

gwyn@Brl@sri-unix (10/27/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     25 Oct 82 10:01:45-EDT (Mon)
I quite agree that it is important to maintain the simplicity and
cleanliness of the C language, and that the existing compilers often
leave something to be desired.  So far the nicest compiler I have used
has been the Ritchie PDP-11 compiler.  PCC seems to have a lot of bugs;
I would LOVE to "accidentally" be mailed the latest Bell Labs version...
There is a "Production Quality C Compiler" project but I haven't heard
about it recently.

Some aspects of C seem to be tied to the way the UNIX loader operates.
Whitesmiths, Ltd. found it necessary to insist on initializing all extern
data exactly once among all files, to keep the more usual flavor of loader
happy.

It would be possible to extend the language to support "read-only"
(constant) initialized data; in order to avoid adding another keyword,
perhaps syntax like the following could be used:
	static char	*errmsg == "Oops, sorry about that!";
On systems supporting read-only data segments, that would be the way to
implement this feature; on more puny systems, the data would just have to
be writable (better do your debugging on non-puny systems!).
This (if supported by the kernel) would obviate the need to kludge up
"sharable strings" a la Berkeley.

The C preprocessor is currently considered part of the language (as is the
Standard I/O library), so it cannot be changed readily.  The problem with
separately compiling #include files, as well as initializing unions, is that
there is no viable GENERAL way to do this; the #include may depend on context
as well as affect the semantics of the source it is included into, and I
believe that unambiguous union initialization would be pretty tricky in the
general case (if not downright impossible).  Clearly features that can only
work sometimes should not be bundled into the language.

The current way to make a variable sometimes in a register, sometimes not
would be something like this:
	int		foo;
	...
	{
	register int	rfoo = foo;
	...
	/* use "rfoo" instead of "foo" here */
	...
	foo = rfoo;			/* if you want to keep the value */
	}
	...
which is admittedly pretty ugly.  However, if you want the capability this
is about as efficient as possible.

gwyn@Brl@sri-unix (10/27/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     25 Oct 82 10:41:58-EDT (Mon)
I'm not sure I like the idea of labels in initializers, BUT if it's
going to happen, then it should be more general:  The label should refer
to the storage address of the constant data, rather than being an integer
index.  This would perhaps be more generally useful.

bcw (10/28/82)

From:	Bruce C. Wright @ Duke University
Re:	C language standards

Well, I can't vouch for the mental capacity of the person who wrote the
code, but one of the early space shots failed because someone had
written a Fortran statement as "do 100 i=1.4" instead of the correct
"do 100 i=1,4" (I don't remember the exact statement but this is the
sense of the error).  Clearly the program hadn't been validated before
the launch - I don't think it even made it into orbit.  Fortran is
probably at least as prone to such problems as is C;  if you really
want strong typing you ought to be looking at Pascal and its derivatives
like Modula and Ada.

I can see the flames coming now.

			Bruce C. Wright @ Duke University

Michael.Young@Cmu-10a@sri-unix (10/28/82)

From: Michael Wayne Young <Michael.Young@Cmu-10a>
Date: 26 October 1982 1531-EDT (Tuesday)
Unfortunately, the "search <libm.a>" construct is something you
want your code-generator to do -- as you said, it becomes
a loader directive.  Currently, though, #-etc. are done by the
C preprocessor...

A more general concern though is that we wouldn't want the C
language to have to know anything about its environment... even
though this is minimally more difficult than #include files.
I'd really love the idea -- in fact, some other Dec-10 style
loader directives would be nice, such as #require lines (which say,
put this object in there, regardless of whether there are resolved
external symbols in it).  [This'd be nice for libraries that
wanted to have their own external variable library elements which
didn't have any required routines in them.  I often want to do this,
but have to hack a bit to do it.]  Still, this belongs in the
environment, not the language.

			Michael

thomas (10/29/82)

I'm confused.  Aren't there 24*60*60 seconds in a day?
=Spencer

mathon (10/30/82)

What would I not like to see in C:
	Overloading Ops.  Bah Humbug!  This is the worst feature of Ada.
	How can we do it to C?
	Complex data.  It makes the language too big and drives it to 
	The pre-processor.  It's ugly.  I would like to see language
	constructs to replace pre-processor commands.
What I would like to see in C:
	In-line functions.  The code for these functions the user wrote
	would be done in-line.
	Enumerated types.  Only as long as I can read/write them also.
	A looping construct that will take advantage of almost all 
	computers' looping instructions.  E.g. loop(param, iters) 
	would use dbcc on 68000 and whatever elsewhere to loop on the
	variable param iters times.
	A way to tell the compiler that a variable was not pointed to 
	so that modern optimizing schemes could be used to figure
	common subexpressions and invariant code and register optimization
	better.

I feel all these changes are consistent with the original goals of C and
keep it a language close to the machine with no built-in functions and
a simple syntax.
Comments?
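
Two of these wishes later appeared in C99 as `inline` and `restrict`.
A minimal sketch, assuming a C99 compiler; the function names are
hypothetical, and both keywords are hints the compiler is free to ignore.

```c
#include <stddef.h>

/* A small function the compiler is invited to expand at each call site. */
static inline int square(int x)
{
    return x * x;
}

/* `restrict` is the programmer's promise that dst and src do not overlap,
 * which leaves the optimizer free to keep values in registers. */
void square_all(int *restrict dst, const int *restrict src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = square(src[i]);
}
```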

jfw.mit-ccc@Mit-Mc@sri-unix (11/03/82)

Date: 27 Oct 1982 10:35:25-EDT
The best way to hack the concept of

char * table [] = {
	word1:	"The first string",
	word2:	"The second string",
 ...
};

is to declare an enum type just before (or after) the table, such that

enum tableoffsets {
	word1,word2,...
};

will cause word1 to be 0, word2 to be 1, etc.  This does not require a language
extension, and also resembles features in other languages (consider PASCAL, where
you can declare that an array's indices will ONLY be elements of an enumerated
type, should you desire).

John Woods
jfw@ccc@mit-mc
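
A compilable sketch of the enum approach with only the two entries shown.
It assumes a C99 compiler for the [index] = value initializers, which tie
each string to its enum name; NWORDS and lookup() are hypothetical
additions.

```c
enum tableoffsets { word1, word2, NWORDS };

static const char *const table[NWORDS] = {
    [word1] = "The first string",
    [word2] = "The second string"
};

const char *lookup(enum tableoffsets which) { return table[which]; }
```

Without the designators the enum and the table must simply be kept in the
same order by hand.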

gwyn@Brl@sri-unix (11/19/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     16 Nov 82 13:36:37-EST (Tue)
It seems to me that the #library function is what Makefiles are for.

HOLSTEGE@Cit-20@sri-unix (11/20/82)

Date: 16 Nov 1982 1351-PST
The difference between #library and makefile is that you don't have to
have a makefile, and hence a special directory, for each one file program
that happens to require special library modules. Also, you only have to
maintain one file and there is less chance of them getting out of sync.

We have a utility here which reads the instructions for compiling a program
directly out of the source itself. For example, if a C program foo.c contains

/*
%  cc -O foo.c -lblech -o foo
%i strip foo
%i mv foo /usr/local
*/

then "xc foo.c" would execute "cc -O ...", and "xc -i foo.c" would 
"strip foo" and "mv ...". We are thinking of replacing /usr/src/cmd/MAKE
with this mechanism.
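
A rough sketch of how such a tool might pick commands out of a source line.
This is a guess at the format based only on the example above, not the
actual utility: a '%' in column one, an optional flag letter, then the
command.  The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Parses one source line.  Returns a pointer to the command text if the
 * line starts with '%', or NULL otherwise.  *flag receives the letter
 * following '%' ('i' for "%i ..."), or '\0' when there is none.
 * Any trailing newline is left in place for the caller to strip. */
const char *parse_directive(const char *line, char *flag)
{
    if (line[0] != '%')
        return NULL;
    line++;

    *flag = '\0';
    if (isalpha((unsigned char)*line))
        *flag = *line++;

    while (*line == ' ' || *line == '\t')
        line++;
    return line;
}
```

A driver would read the file line by line and hand each command whose flag
matches the command-line option to the shell.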

-------

chris.umcp-cs@UDel-Relay@sri-unix (11/24/82)

From:     Chris Torek <chris.umcp-cs@UDel-Relay>
Date:     20 Nov 82 20:36:37 EST  (Sat)
Date: 16 Nov 1982 1351-PST
	From: HOLSTEGE at Cit-20
	Subject: Re:  Re:  C extensions

	We have a utility here which reads the instructions for compiling
	a program directly out of the source itself....

	We are thinking of replacing /usr/src/cmd/MAKE with this mechanism.

Before you do that, think about people who use makefiles to make
other things: nroff output, other languages, etc.  I have used the
following makefile occasionally:

 .SUFFIXES: .nr .out .xer
 .nr.out:
	nroff -me -Tlpr $< >$@
 .nr.xer:
	nroff -me -Tx12-8 $< | xer -p12 -l8 >/dev/tty17

Somehow I don't think you can express that with your setup.  If you have
the commands in the C comments, nroff prints them.  Nroff comments look
like

 .\"

(I think).  In any case they are very different from any other type
of comment I've ever seen.

HOLSTEGE@Cit-20@sri-unix (11/24/82)

Date: 21 Nov 1982 1921-PST
The program which takes commands for remaking a file from the file itself
does not care about the commenting convention; it cares about the % at the
beginning of the line. Thus any language which can make multiline comments
can use the method. I do not claim that this supersedes Makefiles, just that
it forms an intermediate step between typing the "cc" command line every
time you want to recompile, and make. Clearly make is what you want for 
large projects which live in their own directory, but it isn't really 
suitable for one file programs which, even so, have a fairly complex
compilation/loading/installation procedure, which is frequently hard 
to remember.
-------