[net.unix] Improving C

Laws@SRI-AI.ARPA (02/24/84)

From:  Ken Laws <Laws@SRI-AI.ARPA>

The recent messages about strncpy() illustrate the need for string
commands in addition to the character vector commands offered in C
and UNIX.  Character manipulation combined with malloc() can be made
to do whatever you want, but the semantics can be confusing.  I find
it absurd that there is not even a standard library of dynamic string
routines supplied with UNIX.  I have written such a package myself,
and I am sure many others have also.  String routines are easy to
write in C, which may be why they are always hacked inline, but why
must we all reinvent such wheels?  A separate string package could
be made reasonably efficient and could include extras such as a
length field (making it possible to embed nulls in a string) or a
current position pointer (making a string into a virtual disk file).
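
A minimal sketch of what such a package might look like, assuming a
length-counted representation.  The names dstr, dstr_from, dstr_concat
and dstr_free are hypothetical, not part of any standard library or of
the package mentioned above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly,
   so the data may contain embedded NUL bytes. */
typedef struct {
    size_t len;   /* number of bytes in data, not counting the final NUL */
    char  *data;  /* heap buffer of len + 1 bytes, always NUL-terminated */
} dstr;

/* Build a dstr from an arbitrary byte range; 0 on success, -1 on failure. */
int dstr_from(dstr *s, const char *bytes, size_t len)
{
    s->data = malloc(len + 1);
    if (s->data == NULL) { s->len = 0; return -1; }
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return 0;
}

/* Concatenate a and b into a freshly allocated result. */
int dstr_concat(dstr *out, const dstr *a, const dstr *b)
{
    out->data = malloc(a->len + b->len + 1);
    if (out->data == NULL) { out->len = 0; return -1; }
    memcpy(out->data, a->data, a->len);
    memcpy(out->data + a->len, b->data, b->len);
    out->len = a->len + b->len;
    out->data[out->len] = '\0';
    return 0;
}

/* Release the buffer; the caller owns every dstr it creates. */
void dstr_free(dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Without garbage collection the caller must pair every dstr_from or
dstr_concat with a dstr_free.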

The following is a list of other suggestions I have for improving
C and the C environment:


  The C language is reasonably clean, but it could be improved.
  (Maybe the next version should be named D?)  In particular,
  I would like:

    Dynamic strings that are distinct from character vectors.  A
    string should be represented by its address as is now done for
    arrays.  String routines should return copied substrings, etc.
    A concatenation routine is particularly needed.  (We have provided
    one on our testbed, but without garbage collection such things are
    a little dangerous.)

    Dynamic matrices that are addressable using multidimensional subscripts.

    Lists.  Definition of a list as a char ** works, but it must be
    initialized as a (char *)[].  This could be fixed in the compiler.

    Classes, as implemented in the "class" preprocessor from Bell Labs.

    Begin(name) and end(name) delimiters as part of the language.  Our
    SRI testbed macros do not check for matching names, and cannot be
    used for top-level brackets because ctags does not expand the macros
    and gets confused.  The cb program also fails to recognize brackets
    hidden by macros.

    True nested procedures in addition to the current nested blocks.
    At present it is difficult to make certain variables global to
    a main subroutine and its "servants", yet not global to everyone.
    This also makes it difficult to convert code from other languages
    that do have this capability.

    Variables declared outside functions should be private (static) by
    default.  A "global" or "public" keyword should be required to make
    them available externally.

    A "proc" or similar keyword used in function headers so that
    they can be easily distinguished from variable declarations.
    A "forward" or "extern" keyword could be required to distinguish
    headers without bodies.  This would simplify the job of cc, cb,
    ctags, and other programs that analyze C source files.

    It should be possible to use an enumeration code (e.g., NONE) with
    different values in different enumerations.  Macro names must
    necessarily override enumeration names, so it is probably an error
    to have the same codes for both.  Some type of package or union
    specification is needed for enums.

    An nargs() function to return the number of arguments passed
    to a routine.  Such a function exists in the Berkeley
    UNIX, but is not documented.  [The Berkeley routine actually
    returns the number of words in the argument list, which can
    differ from the number of arguments.]

    Macros that can handle a variable number of arguments.  At present
    it is impossible to extract some of the arguments for various
    purposes and then pass the rest (however many) on to printf.
    It is also impossible to replace "return" with a macro because
    it may or may not have an argument.
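
For comparison, the later C99 standard added variadic macros through
__VA_ARGS__.  A sketch under that assumption: TAGGED is a hypothetical
macro that uses its first arguments itself and passes the format and
all remaining arguments on to snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* TAGGED is a hypothetical name.  It consumes buf, size and tag itself,
   then forwards the format string and every remaining argument.
   Requires a C99 (or later) preprocessor; size must be greater than 0. */
#define TAGGED(buf, size, tag, ...)                               \
    do {                                                          \
        size_t used_;                                             \
        assert((size) > 0);                                       \
        snprintf((buf), (size), "[%s] ", (tag));                  \
        used_ = strlen(buf);                                      \
        snprintf((buf) + used_, (size) - used_, __VA_ARGS__);     \
    } while (0)
```

A call such as TAGGED(buf, sizeof buf, "warn", "%d files", n) works with
any number of trailing arguments, as long as a format string is present.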

    An OMITTED argument code of some type that can be used to test
    whether an argument to a function or macro was omitted by typing
    successive commas or providing too few arguments.  This might be
    coupled to a default mechanism, but the user can easily write
    his own defaults if the OMITTED code were implemented.

    Some type of entry and exit hooks that can be used for debug tracing,
    timing instrumentation, etc.  It is currently awkward to intercept
    return statements because they accept a variable number of arguments
    (one argument or none, but not an empty argument list).

    The assignment operator should have been := instead of =.  Use of
    = instead of == in conditionals is a common source of error.

    I particularly object to the statement in the manual that "Expressions
    involving a commutative and associative operator (*, +, &, |, ^) may be
    rearranged arbitrarily, even in the presence of parentheses; ...".
    This is inexcusable in a modern language.

    I am also unhappy about the number of machine-dependent results
    that C permits.  (E.g., overflow and divide check, rounding of
    negative numbers, mod (%) on negative numbers, sign extension on
    chars, sign fill on right shift, direction of bits accessed by
    bit fields.)

    It should be possible to put spaces before a # command for the
    compiler.  Also, it should be documented that spaces are legal
    after the #.

    Use of escaped linefeeds in a macro confuses the compiler:  its
    diagnostic messages do not count the continuations as lines, but
    vi does count them.  (This has been fixed in Berkeley 4.2.)

    Fclose should be called automatically when a program terminates
    abnormally.  (It is already called for normal terminations.)
    It is very difficult to find some bugs when buffers are not dumped.
    If the program runs for a long time, it is convenient to pipe
    its output into a log file instead of tying up a terminal.  If the
    log file is not flushed, however, this is not only unproductive;
    it is misleading.
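
Until the library does this automatically, one workaround is to flush
after every log line.  A small sketch, where log_line is a hypothetical
helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write one log line and push it out of the stdio
   buffer at once, so the line reaches the file even if the program
   later terminates abnormally.  Returns 0 on success, -1 on error. */
int log_line(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);
    if (fputs(msg, fp) == EOF || fputc('\n', fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

An alternative is to request line buffering once, right after opening
the stream, with setvbuf(fp, NULL, _IOLBF, 0).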


  We just found another bug where setting array[4] in something 
  declared "int array[4]" overwrote a pointer in a distant piece
  of code.  C ought to offer a run-time subscript checking facility,
  and certainly should have caught this compile-time error.
  (Hardware speed and storage are becoming less of a consideration
  every year.  Programming ease and software reliability should be
  dominant.)
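
A rough sketch of how optional subscript checking can be approximated
with macros today.  AT, NELEMS, checked_index and the NO_BOUNDS_CHECK
switch are all hypothetical names, and the approach only works on true
arrays, not on pointers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Element count of a true array (gives wrong answers for pointers). */
#define NELEMS(a) (sizeof (a) / sizeof (a)[0])

/* Abort with a diagnostic if idx is not a valid subscript for n elements;
   otherwise return idx unchanged.  A negative subscript converts to a
   very large size_t and is rejected as well. */
size_t checked_index(size_t idx, size_t n, const char *file, int line)
{
    if (idx >= n) {
        fprintf(stderr, "%s:%d: subscript %lu out of range 0..%lu\n",
                file, line, (unsigned long)idx, (unsigned long)(n - 1));
        abort();
    }
    return idx;
}

/* Compile with -DNO_BOUNDS_CHECK to remove the check entirely. */
#ifdef NO_BOUNDS_CHECK
#define AT(arr, idx) ((arr)[(idx)])
#else
#define AT(arr, idx) \
    ((arr)[checked_index((idx), NELEMS(arr), __FILE__, __LINE__)])
#endif
```

With "int array[4]", the statement AT(array, 4) = 0 stops the program
with a file and line number instead of silently overwriting a neighbor.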


  The compiler should warn about statements like "x+1;" since they
  can have no side effects or other useful purpose.  Most likely
  the statement is intended to be "x+=1;".


  The expression "(cast) (flag == 0) ? 0 : 1" applies the
  cast to the boolean test rather than to the output of the
  conditional expression.  I would much rather see the syntax
  "(cast) ifv (flag == 0) thenv 0 elsev 1" where the cast
  applies to the final value.  [I have implemented the ifv/thenv/elsev
  macros, but there is no way to put hidden parentheses around the
  entire construct unless one adds a special terminator (e.g., "fi").]
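
The precedence issue can be seen in a small example.  The function
names below are illustrative only; the double values are chosen so that
the two parses give visibly different results.

```c
/* The cast binds more tightly than ?:, so this parses as
   ((int)(flag == 0)) ? 0.5 : 1.5  and yields 0.5 or 1.5. */
double cast_on_test(int flag)
{
    return (int) (flag == 0) ? 0.5 : 1.5;
}

/* Explicit parentheses make the cast apply to the final value,
   so 0.5 and 1.5 are truncated to 0 and 1. */
double cast_on_result(int flag)
{
    return (int) ((flag == 0) ? 0.5 : 1.5);
}
```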


  The expression (A,B) returns the value of B.  There needs to
  be a similar syntax for those cases where the value of A is
  desired, and A must be executed first.  In particular, suppose
  that we are writing a macro noteerr() which is supposed to
  evaluate its argument and take some action based on a global
  return code, then return the result of the initial evaluation
  as its value.  For example, suppose we want to pass some functional
  value, func(), to a subroutine, subr(), and that we want to wrap
  the evaluation of func() in an error handler:

      subr(...,noteerr(func()),...);

  There is currently no way to do this and have the whole noteerr()
  macro return an object of the same type and value as func() when
  func() may be of arbitrary type.  It could be done if there were
  an (A,B) syntax that returned the value of A.
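
A partial workaround, sketched under the assumption that one wrapper
per result type is acceptable: a function argument is fully evaluated
before the function body runs, so a small pass-through function can
inspect the global code and still return the original value.  The names
errcode, errors_noted, noteerr_int and half are hypothetical, and this
does not solve the arbitrary-type case described above.

```c
/* Hypothetical global return code set by the callee. */
int errcode = 0;

/* Count of errors seen, standing in for "some action". */
int errors_noted = 0;

/* Type-specific stand-in for noteerr(): by the time the body runs,
   the argument expression has been evaluated and errcode is set. */
int noteerr_int(int value)
{
    if (errcode != 0) {
        errors_noted++;
        errcode = 0;
    }
    return value;
}

/* Example callee: flags an error for odd input, still returns a value. */
int half(int x)
{
    if (x % 2 != 0)
        errcode = 1;
    return x / 2;
}
```

A call then reads subr(..., noteerr_int(half(n)), ...).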


  The compiler should accept string continuations of the form

      printf("Beginning of string"
	  " continuation of string.");

  The SAIL/MAINSAIL dynamic string concatenation syntax is
  even more flexible, but even this primitive convention would be
  adequate for compile-time concatenation.
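
ANSI C later adopted this convention: adjacent string literals are
joined at compile time.  A short sketch, with banner and
same_as_unsplit as illustrative names:

```c
#include <string.h>

/* Adjacent string literals are concatenated by the compiler. */
const char *banner(void)
{
    return "Beginning of string"
           " continuation of string.";
}

/* Returns 1 if the split literal equals the one written on a single line. */
int same_as_unsplit(void)
{
    return strcmp(banner(),
                  "Beginning of string continuation of string.") == 0;
}
```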


  Every enum should have a validity checking routine, e.g.
  valid(...).  This would permit one to identify illegal
  values without converting everything to ints.  Note that
  valid enums are not necessarily sequential, so that the
  test can be complicated.  This checking cannot be done at
  compile time, so it may be necessary for the user to provide
  the checking routines; a pity.
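
A user-provided check of this kind might look like the following
sketch.  The enum color and the routine valid_color are hypothetical;
the values are deliberately non-sequential.

```c
/* Hypothetical enum whose legal values are not sequential. */
enum color { RED = 1, GREEN = 4, BLUE = 9 };

/* Hand-written validity check: returns 1 for a declared enumerator,
   0 for anything else.  It must be updated whenever the enum changes. */
int valid_color(enum color c)
{
    switch (c) {
    case RED:
    case GREEN:
    case BLUE:
        return 1;
    default:
        return 0;
    }
}
```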


  I just got caught again closing a comment with \* instead of
  */.  The compiler just ate everything up to the next comment.
  I see no reason why C can't allow nested comments and also
  check for proper balance of comment delimiters.
-------

gwyn%brl-vld@sri-unix.UUCP (02/24/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

I started to prepare detailed point-by-point responses to your suggestions
for "improvements" to C, but I got tired of it after a while.  Basically,
my impression is that you do not properly understand the charter of the C
language.  Many of your suggestions would force significant run-time
overhead for all C programs, to support operations that are either
unnecessary if you know how to use C effectively or can be done already
without forcing a global implementation that may not be suitable for all
applications.

The ANSI C Language Standards Committee has addressed most of the issues
that can be resolved without breaking the millions of lines of existing
C source code.  A more featureful follow-on language, C++, appears to be
under development inside AT&T Bell Laboratories; it includes abstract
data types a la "classes" and other improvements.  One rule for C itself
is to not add any keywords to the language; C++ would be exempt from this.

One of the great attractions of C is the simplicity and elegance of its
basic design (although there are a few misfeatures, certainly).  It would
be a pity to lose that in a sea of "features".

beattie%mitre-gateway@sri-unix.UUCP (02/24/84)

From:  brian beattie <beattie@mitre-gateway>

The language you want should be called PL/2?

davy@ecn-ee.UUCP (03/02/84)

#R:sri-arpa:-1688400:ecn-ee:17100002:000:1770
ecn-ee!davy    Mar  2 08:05:00 1984


As far as this person's ideas for improving C go (his name appeared as
"ARPA" here), I don't consider them improvements.  It sounds like
you want to make C look like Pascal or Ada, which it isn't.

C is a language meant for PROGRAMMERS.  We KNOW what's going on (or are
supposed to) and don't need the language/compiler/run-time-environment
to hold our hands through the whole thing.  That's what debuggers are
for - they help you figure out what went wrong.

I am violently opposed to putting any kind of type checking, array bounds
checking, etc. into C.  It slows things down greatly (try running a Pascal
program with and without the run-time checks sometime), and once program
development is done, serves no useful purpose.  It would be nice, though,
if the COMPILER had an option to do array-bounds checking, like the f77
compiler does (did?).

I can also live without "proc procedurename" for the sole purpose of
making programs prettier (which is what he said it was for).  The same
thing goes for the "forward" declaration.  C doesn't NEED a forward
declaration, the loader ties all that stuff together.  The forward
keyword was put into Pascal so that it could be compiled in one pass
(everything must be defined before it's used when compiling in one pass),
not so your programs could look "pretty".  Since C is compiled in at
least two passes anyway, what's the point?

I won't respond to the rest of that stuff, I've noticed this is turning
into a "why I hate Pascal and don't want to see any of its silliness in
C" diatribe.  This is an interesting discussion, at least from the
wishlist point of view, but most of the stuff I have seen so far would
not be, in my opinion, desirable.

Donning my flame-retardant tuxedo....
--Dave Curry
pur-ee!davy

kjpires@ucbcad.UUCP (03/05/84)

	Do you want a fast, powerful language like C, or do you want
a hand-holding, slow language like Pascal?  May I submit that Modula-2
or ADA might be just what you are looking for.  They have type checking and
array bounds policing galore.  Please do not subject the rest of us to
your unnecessary overhead.

					Alexander Burchell
					[agb@ucbarpa]
					[ucbvax!agb]

"Pay no attention to that man in the mail header!"

henry@utzoo.UUCP (Henry Spencer) (03/06/84)

To misquote Charles Lindsey (some of you may have heard this one
before; I'm fond of it):

	C'est magnifique, mais ce n'est pas C.

(For those of you who are monolingual:  "It's good stuff, but it
sure isn't C.")

It sounds like it might be an interesting language.  But it isn't
anything I would recognize as C.  Not even close.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

davy@ecn-ee.UUCP (03/06/84)

#R:sri-arpa:-1688400:ecn-ee:17100003:000:641
ecn-ee!davy    Mar  5 09:08:00 1984


I would like to clarify my previous posting on "Improving C" -- 

	I didn't mean to say that I thought run-time checks were lousy
as a rule.  As a matter of fact, I'm all for them.  My point was that
this stuff DOES NOT belong in the language, it belongs in the compiler.
I would love to have a C compiler with a flag to turn on array bounds
checking (ala the f77 compiler), and other neato debugging stuff.  But
I don't think it should be part of the language, as it is with Pascal.

Thanks for everyone's comments so far, but please, don't flame at me
for this specific point any more.

--Dave Curry
decvax!pur-ee!davy
eevax.davy@purdue

rob@ctvax.UUCP (03/06/84)

#R:sri-arpa:-1691800:ctvax:54900002:000:222
ctvax!rob    Mar  5 02:44:00 1984

Sorry, that name's been taken. When IBM unleashed PL/1 on the world,
I believe they trademarked PL/2 through PL/99 to avoid being upstaged.

Rob Spray
uucp:     ... {decvax!cornell!|ucbvax!nbires!|allegra!parsec!}ctvax!rob

rcd@opus.UUCP (03/10/84)

<>
 > 	Do you want a fast, powerful language like C, or do you want
 > a hand-holding, slow language like Pascal?  May I submit that Modula-2
 > or ADA might be just what you are looking for.  They have type checking and
 > array bounds policing galore.  Please do not subject the rest of us to
 > your unnecessary overhead.

1.  Languages are not fast or slow.  The code generated by a compiler for
the language is fast or slow.  It is possible to generate code from Pascal
which is roughly as efficient as that generated by a C compiler for most
equivalent constructs.  The existence of compilers which generate slow code
doesn't necessarily make the language "slow".

2.  C is not `powerful' compared to Pascal.  They are different, although
Pascal has a couple of features that C doesn't have which some people
associate with the "power" of the language.

3.  Type checking is neither unnecessary, nor does it involve any overhead
in the usual sense (execution time/space penalty).

4.  Any compiler suitable for production use allows you to turn off runtime
checks; you can get rid of them when you think you can take the chance.
The semantics of C are such that it's difficult to make array bounds
checking work even if you want it.  There's a big difference between having
a facility which you can use or not as you wish and not being able to have
it at all.

STILL, I'll side with the camp that says "Don't mess around with C."  It's
got some problems and some funny things in it, but it really ain't broke so
it don't need fixing!
-- 
{hao,ucbvax,allegra}!nbires!rcd

Us.Travis%cu20b@BRL-VGR.ARPA (03/11/84)

From:  Travis Lee Winfrey <Us.Travis%cu20b@BRL-VGR.ARPA>

Don't be so quick to call Pascal slow and C quick.  I can list compilers (just
as well as you can) on various machines for either language that are as slow as
or as fast as you want.  Although a lot of the suggestions in the original
Improving C do not reflect the flavor of the language, many of them would be
quite useful.  I personally would kill for built-in type checking and array
bounds checking during the debugging phase of my programs.  They could be
removed after the program reached production quality....

Travis
-------

hutch@shark.UUCP (Stephen Hutchison) (03/11/84)

<compile, link, load, run, flame>

What is really wrong with an options switch allowing bounds checking
to be included, if the default is to leave it out?

I don't recall it being forbidden in the Book.

Personally I think that it's just fear of the loss of job security
on the part of those objecting.  After all, if bounds checking in
the development phase were to speed your time-to-running-program,
you'd have to do more work.  (Oh, before I offend anyone, :-}  )

Hutch  (in favor of allowing options rather than forbidding them)

hal@cornell.UUCP (Hal Perkins) (03/12/84)

>         Do you want a fast, powerful language like C, or do you want
> a hand-holding, slow language like Pascal?  May I submit that Modula-2
> or ADA might be just what you are looking for.  They have type checking and
> array bounds policing galore.  Please do not subject the rest of us to
> your unnecessary overhead.

What on earth is this nonsense?  There's nothing inherently inefficient
about Pascal compared to C.  Most compilers allow you to turn off any runtime
checks, and the resulting code can be as fast as that generated by a C
compiler that does an equivalent job of code generation.

Now, this is personal opinion, but...

I'm getting real tired of the C bigots on the net who take gratuitous
pot-shots at other programming languages; who have apparently seen "revealed
truth", and it's called C, and it's better than anything else because they
say so.  Pascal has some problems, but so does C (take a look at all the
debates in net.lang.c about the true meaning of obscure C constructs).
Neither one is the "ultimate" programming language.  So let's stop the
nonconstructive putdowns of programming languages that do not subscribe to
the True Religion of the Unix hackers.  It's just wasting everyone's time.

(If you feel moved to reply to this with constructive comments, please do
so.  Keep abusive remarks to yourself.)

Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET: hal@crnlcs

POLARIS%usc-isi@sri-unix.UUCP (03/12/84)

Those who want array bounds checking and such might like to look
at bcc and safe C, both of which are like checkout compilers in
allowing run-time and compile time range and reference checks.
When the program has passed all debugging phases, you can
recompile with cc and reduce both the code size and the operating
overhead.

bcc: delft consulting corp 432 park ave so.  new york, ny 10016

safe C : catalytix corp 55 wheeler st.  Cambridge, mass 02138

Both seem to offer many of the services not available with lint,
but are available only when wanted or needed, unlike other
languages.  I've seen both of these but not had a chance to use
them much.  Mike Seyfrit <polaris @isi.arpa>

smoot@ut-sally.UUCP (Smoot Carl-Mitchell) (03/12/84)

I agree with Hal's comments about people putting down other
programming languages.  

I started out in programming 16 years ago as a Fortran 
hacker, migrated to Pascal during my years
as a graduate student and eventually learned C on my own.
Along the way I picked up some Cobol, Lisp, PL/1, Snobol, and
APL experience.  

All the above languages have their strengths and weaknesses.
Some of the languages were invented before the field knew very 
much about compiler and programming language theory and their design
reflects the level of knowledge at the time of their invention.  Doubtless 
our knowledge in this area will continue to mature and future languages
will be better than what we have now.

Every programmer has a "favorite" language.  My favorite right now is
C and I appreciate its strengths, but I am not so blind as to ignore
its weaknesses.  I suggest that before taking potshots at other 
languages that you investigate the issues involved further before
making false or misleading statements.
-- 
Smoot Carl-Mitchell, CS Dept. University of Texas at Austin
{seismo, ctvax, ihnp4, kpno}!ut-sally!smoot, smoot@ut-sally.{ARPA, UUCP}

ron%brl-vgr@sri-unix.UUCP (03/12/84)

From:      Ron Natalie <ron@brl-vgr>

It's not PL/1 it's PL/I.  (note the roman numeral).

nather@utastro.UUCP (Ed Nather) (03/16/84)

[]
	From:  Travis Lee Winfrey <Us.Travis%cu20b@BRL-VGR.ARPA>

	I personally would kill for built-in type checking and array
	bounds checking during the debugging phase of my programs.
	They could be removed after the program reached production quality....

	Travis
	-------

If I may paraphrase a point made in "Software Tools" by K & P (I can't find
the exact wording):  This is much like wearing a parachute while you are in
pilot's training on the ground, then discarding it when you go aloft.  If the
checking is needed for debugging, it is almost certainly needed for what is
sometimes called "malicious input."

-- 

                                       Ed Nather
                                       ihnp4!{ut-sally,kpno}!utastro!nather
                                       Astronomy Dept., U. of Texas, Austin

trb@masscomp.UUCP (03/16/84)

I beg His humblest pardon for contradicting a point made in "Software
Tools," but while I agree that bounds checking might certainly be needed
for detecting malicious input in programs, it might also add lots of
overhead to the darkest guts of a program, where you need the speed
and certainly don't need the checking.  Like jumping out of a plane and
making sure that the touchdown area is covered with pillows 100 feet
deep.

I'll take bounds checking in production code as long as I can turn it
off in sections of code where I don't want it, using a compiler
directive.  I know that I could certainly use checking when I debug,
and I often wish I had it, and I wish that other people had it when I
have to chase down their bugs.

Having to compile separate modules with and without checking wouldn't
be convenient enough, I think.

	Andy Tannenbaum   Masscomp Inc  Westford MA   (617) 692-6200 x274

tim@unc.UUCP (Tim Maroney) (03/17/84)

I disagree that it is a mistake to remove array bounds checking and such
after debugging is complete.  If you have run a program on general test data
(including anomalous data) and never gotten the error, then you may fairly
safely assume that the error will not happen on any input data.  It would be
a mistake to sacrifice run-time efficiency for a check on an impossible (or
at least VERY improbable) error.

Of course, you'd better make sure that your test data really is "general".
--
Tim Maroney, The Censored Hacker
mcnc!unc!tim (USENET), tim.unc@csnet-relay (ARPA)

All opinions expressed herein are completely my own, so don't go assuming
that anyone else at UNC feels the same way.

dave@utcsrgv.UUCP (Dave Sherman) (03/19/84)

The current (#5) issue of "UNIX Review" has an interesting article
on "Safe C", a C compiler which purportedly protects against such
errors as exceeding array bounds (at run-time).

Dave Sherman
Toronto
-- 
 {allegra,cornell,decvax,ihnp4,linus,utzoo}!utcsrgv!dave

leiby@masscomp.UUCP (03/20/84)

> Hal Perkins:
>
> I'm getting real tired of the C bigots on the net who take gratuitous
> pot-shots at other programming languages; who have apparently seen "revealed
> truth", and it's called C, and it's better than anything else because they
> say so.

Quite right.  Everyone knows that BLISS-11 is the One True Programming
Language.


-- 
Mike Leibensperger
{decvax,tektronix,harpo}!masscomp!leiby
Masscomp; One Technology Park; Westford MA 01886

andyb@dartvax.UUCP (Andy Behrens) (03/20/84)

>  I personally would kill for built-in type checking and array
>  bounds checking during the debugging phase of my programs.  They could be
>  removed after the program reached production quality....

Dijkstra points out that it is foolish to just use bounds checking during
testing (when you can recover from errors), only to remove the checking
for production runs (when an error becomes a serious thing).

-- 
			Andy Behrens
			P.O. Box 24, East Thetford, Vermont
			UUCP:  {decvax,linus,cornell,dalcs}!dartvax!andyb
			CSNET: andyb@dartmouth
			ARPA:  andyb%dartmouth@csnet-relay

wls@astrovax.UUCP (William L. Sebok) (03/22/84)

>>  I personally would kill for built-in type checking and array
>>  bounds checking during the debugging phase of my programs.  They could be
>>  removed after the program reached production quality....

>Dijkstra points out that it is foolish to just use bounds checking during
>testing (when you can recover from errors), only to remove the checking
>for production runs (when an error becomes a serious thing).
>			Andy Behrens

This is silly.  A program that has to handle input from the outside world should
do its own checking of the consistency of the input.  This should remain during
production runs.  Other than that one should not have to crucify one's
production run on the cross of built-in type checking and array bounds checking.
This is an example of too much of the modern computer theory of "Cpu speed
be damned", that has resulted in so many slow programs and operating systems.
-- 
Bill Sebok			Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,kpno,princeton,vax135}!astrovax!wls

gwyn@brl-vgr.ARPA (Doug Gwyn ) (03/25/84)

Another point is that correct code benefits NOT AT ALL from run-time
subscript range checking and other such aids.  How about getting the
code right in the first place instead of relying on run-time detection
of errors to find your mistakes.  (This assumes you are able to
run an exhaustive set of test cases through your code anyway, which is
a very poor assumption.)  I recommend Myers' "The Art of Software
Testing" to those who care about the subject.

Input validation is a well-known requirement.  Good COBOL programmers
do this automatically.  Is it possible that there are no good UNIX
programmers?  :-)