Laws@SRI-AI.ARPA (02/24/84)
From: Ken Laws <Laws@SRI-AI.ARPA>

The recent messages about strncpy() illustrate the need for string commands in addition to the character vector commands offered in C and UNIX. Character manipulation combined with malloc() can be made to do whatever you want, but the semantics can be confusing. I find it absurd that there is not even a standard library of dynamic string routines supplied with UNIX. I have written such a package myself, and I am sure many others have also. String routines are easy to write in C, which may be why they are always hacked inline, but why must we all reinvent such wheels? A separate string package could be made reasonably efficient and could include extras such as a length field (making it possible to embed nulls in a string) or a current position pointer (making a string into a virtual disk file).

The following is a list of other suggestions I have for improving C and the C environment:

The C language is reasonably clean, but it could be improved. (Maybe the next version should be named D?) In particular, I would like:

Dynamic strings that are distinct from character vectors. A string should be represented by its address as is now done for arrays. String routines should return copied substrings, etc. A concatenation routine is particularly needed. (We have provided one on our testbed, but without garbage collection such things are a little dangerous.)

Dynamic matrices that are addressable using multidimensional subscripts.

Lists. Definition of a list as a char ** works, but it must be initialized as a (char *)[]. This could be fixed in the compiler.

Classes, as implemented in the "class" preprocessor from Bell Labs.

Begin(name) and end(name) delimiters as part of the language. Our SRI testbed macros do not check for matching names, and cannot be used for top-level brackets because ctags does not expand the macros and gets confused. The cb program also fails to recognize brackets hidden by macros.
True nested procedures in addition to the current nested blocks. At present it is difficult to make certain variables global to a main subroutine and its "servants", yet not global to everyone. This also makes it difficult to convert code from other languages that do have this capability.

Variables declared outside functions should be private (static) by default. A "global" or "public" keyword should be required to make them available externally.

A "proc" or similar keyword used in function headers so that they can be easily distinguished from variable declarations. A "forward" or "extern" keyword could be required to distinguish headers without bodies. This would simplify the job of cc, cb, ctags, and other programs that analyze C source files.

It should be possible to use an enumeration code (e.g., NONE) with different values in different enumerations. Macro names must necessarily override enumeration names, so it is probably an error to have the same codes for both. Some type of package or union specification is needed for enums.

An nargs() function to return the number of arguments passed to a routine. Such a function exists in the Berkeley UNIX, but is not documented. [The Berkeley routine actually returns the number of words in the argument list, which can differ from the number of arguments.]

Macros that can handle a variable number of arguments. At present it is impossible to extract some of the arguments for various purposes and then pass the rest (however many) on to printf. It is also impossible to replace "return" with a macro because it may or may not have an argument.

An OMITTED argument code of some type that can be used to test whether an argument to a function or macro was omitted by typing successive commas or providing too few arguments. This might be coupled to a default mechanism, but the user can easily write his own defaults if the OMITTED code were implemented.
Some type of entry and exit hooks that can be used for debug tracing, timing instrumentation, etc. It is currently awkward to intercept return statements because they accept a variable number of arguments (one argument or none, but not an empty argument list).

The assignment operator should have been := instead of =. Use of = instead of == in conditionals is a common source of error.

I particularly object to the statement in the manual that "Expressions involving a commutative and associative operator (*, +, &, |, ^) may be rearranged arbitrarily, even in the presence of parentheses; ...". This is inexcusable in a modern language.

I am also unhappy about the number of machine-dependent results that C permits. (E.g., overflow and divide check, rounding of negative numbers, mod (%) on negative numbers, sign extension on chars, sign fill on right shift, direction of bits accessed by bit fields.)

It should be possible to put spaces before a # command for the compiler. Also, it should be documented that spaces are legal after the #.

Use of escaped linefeeds in a macro confuses the compiler: its diagnostic messages do not count the continuations as lines, but vi does count them. (This has been fixed in Berkeley 4.2.)

Fclose should be called automatically when a program terminates abnormally. (It is already called for normal terminations.) It is very difficult to find some bugs when buffers are not dumped. If the program runs for a long time, it is convenient to pipe its output into a log file instead of tying up a terminal. If the log file is not flushed, however, this is not only unproductive; it is misleading.

We just found another bug where setting array[4] in something declared "int array[4]" overwrote a pointer in a distant piece of code. C ought to offer a run-time subscript checking facility, and certainly should have caught this compile-time error. (Hardware speed and storage are becoming less of a consideration every year.
Programming ease and software reliability should be dominant.)

The compiler should warn about statements like "x+1;" since they can have no side effects or other useful purpose. Most likely the statement is intended to be "x+=1;".

The expression "(cast) (flag == 0) ? 0 : 1" applies the cast to the boolean test rather than to the output of the conditional expression. I would much rather see the syntax "(cast) ifv (flag == 0) thenv 0 elsev 1" where the cast applies to the final value. [I have implemented the ifv/thenv/elsev macros, but there is no way to put hidden parentheses around the entire construct unless one adds a special terminator (e.g., "fi").]

The expression (A,B) returns the value of B. There needs to be a similar syntax for those cases where the value of A is desired, and A must be executed first. In particular, suppose that we are writing a macro noteerr() which is supposed to evaluate its argument and take some action based on a global return code, then return the result of the initial evaluation as its value. For example, suppose we want to pass some functional value, func(), to a subroutine, subr(), and that we want to wrap the evaluation of func() in an error handler:

    subr(...,noteerr(func()),...);

There is currently no way to do this and have the whole noteerr() macro return an object of the same type and value as func() when func() may be of arbitrary type. It could be done if there were an (A,B) syntax that returned the value of A.

The compiler should accept string continuations of the form

    printf("Beginning of string" " continuation of string.");

The SAIL/MAINSAIL dynamic string concatenation syntax is even more flexible, but even this primitive convention would be adequate for compile-time concatenation.

Every enum should have a validity checking routine, e.g. valid(...). This would permit one to identify illegal values without converting everything to ints.
Note that valid enums are not necessarily sequential, so that the test can be complicated. This checking cannot be done at compile time, so it may be necessary for the user to provide the checking routines; a pity.

I just got caught again closing a comment with \* instead of */. The compiler just ate everything up to the next comment. I see no reason why C can't allow nested comments and also check for proper balance of comment delimiters.

-------
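The string package with a length field proposed at the top of this post can be sketched in present-day C. This is only an illustration under stated assumptions: the names dstr, dstr_from, dstr_concat and dstr_free are hypothetical and are not taken from the SRI testbed package, and the prototype style postdates the compilers under discussion. Because the length is stored explicitly, embedded nulls survive concatenation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly,
 * so the data may contain embedded NUL bytes. */
typedef struct {
    size_t len;   /* bytes in data, not counting the trailing NUL */
    char  *data;  /* heap buffer of len + 1 bytes, NUL-terminated */
} dstr;

/* Copy len bytes into a new dstr. Returns 0, or -1 if malloc fails. */
int dstr_from(dstr *out, const char *bytes, size_t len)
{
    char *p;

    assert(out != NULL);
    if (len == (size_t)-1)          /* len + 1 would wrap around */
        return -1;
    p = malloc(len + 1);
    if (p == NULL)
        return -1;
    if (len > 0)
        memcpy(p, bytes, len);
    p[len] = '\0';
    out->len = len;
    out->data = p;
    return 0;
}

/* Concatenate a and b into a freshly allocated result.
 * There is no garbage collection: the caller must call dstr_free. */
int dstr_concat(dstr *out, const dstr *a, const dstr *b)
{
    char *p;

    assert(out != NULL && a != NULL && b != NULL);
    if (b->len >= (size_t)-1 - a->len)   /* a->len + b->len + 1 would wrap */
        return -1;
    p = malloc(a->len + b->len + 1);
    if (p == NULL)
        return -1;
    if (a->len > 0)
        memcpy(p, a->data, a->len);
    if (b->len > 0)
        memcpy(p + a->len, b->data, b->len);
    p[a->len + b->len] = '\0';
    out->len = a->len + b->len;
    out->data = p;
    return 0;
}

/* Release the buffer and leave the dstr empty. */
void dstr_free(dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Every result is a fresh allocation that the caller must release, which is the "a little dangerous" part the post mentions: without garbage collection, forgetting dstr_free leaks the buffer.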
gwyn%brl-vld@sri-unix.UUCP (02/24/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> I started to prepare detailed point-by-point responses to your suggestions for "improvements" to C, but I got tired of it after a while. Basically, my impression is that you do not properly understand the charter of the C language. Many of your suggestions would force significant run-time overhead for all C programs, to support operations that are either unnecessary if you know how to use C effectively or can be done already without forcing a global implementation that may not be suitable for all applications. The ANSI C Language Standards Committee has addressed most of the issues that can be resolved without breaking the millions of lines of existing C source code. A more featureful follow-on language, C++, appears to be under development inside AT&T Bell Laboratories; it includes abstract data types a la "classes" and other improvements. One rule for C itself is to not add any keywords to the language; C++ would be exempt from this. One of the great attractions of C is the simplicity and elegance of its basic design (although there are a few misfeatures, certainly). It would be a pity to lose that in a sea of "features".
beattie%mitre-gateway@sri-unix.UUCP (02/24/84)
From: brian beattie <beattie@mitre-gateway> The language you want should be called PL/2?
davy@ecn-ee.UUCP (03/02/84)
#R:sri-arpa:-1688400:ecn-ee:17100002:000:1770 ecn-ee!davy Mar 2 08:05:00 1984

As far as this person's (his name appeared as "ARPA" here) ideas of improvements to C go, I don't consider them improvements. It sounds like you want to make C look like Pascal or Ada, which it isn't. C is a language meant for PROGRAMMERS. We KNOW what's going on (or are supposed to) and don't need the language/compiler/run-time-environment to hold our hands through the whole thing. That's what debuggers are for - they help you figure out what went wrong. I am violently opposed to putting any kind of type checking, array bounds checking, etc. into C. It slows things down greatly (try running a Pascal program with and without the run-time checks sometime), and once program development is done, serves no useful purpose. It would be nice, though, if the COMPILER had an option to do array-bounds checking, like the f77 compiler does (did?). I can also live without "proc procedurename" for the sole purpose of making programs prettier (which is what he said it was for). The same thing goes for the "forward" declaration. C doesn't NEED a forward declaration, the loader ties all that stuff together. The forward keyword was put into Pascal so that it could be compiled in one pass (everything must be defined before it's used when compiling in one pass), not so your programs could look "pretty". Since C is compiled in at least two passes anyway, what's the point? I won't respond to the rest of that stuff, I've noticed this is turning into a "why I hate Pascal and don't want to see any of its silliness in C" diatribe. This is an interesting discussion, at least from the wishlist point of view, but most of the stuff I have seen so far would not be, in my opinion, desirable. Donning my flame-retardant tuxedo....

--Dave Curry
pur-ee!davy
kjpires@ucbcad.UUCP (03/05/84)
Do you want a fast, powerful language like C, or do you want a hand-holding, slow language like Pascal? May I submit that Modula-2 or ADA might be just what you are looking for. They have type checking and array bounds policing galore. Please do not subject the rest of us to your unnecessary overhead. Alexander Burchell [agb@ucbarpa] [ucbvax!agb] "Pay no attention to that man in the mail header!"
henry@utzoo.UUCP (Henry Spencer) (03/06/84)
To misquote Charles Lindsey (some of you may have heard this one before; I'm fond of it): C'est magnifique, mais ce n'est pas C. (For those of you who are monolingual: "It's good stuff, but it sure isn't C.") It sounds like it might be an interesting language. But it isn't anything I would recognize as C. Not even close. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
davy@ecn-ee.UUCP (03/06/84)
#R:sri-arpa:-1688400:ecn-ee:17100003:000:641 ecn-ee!davy Mar 5 09:08:00 1984 I would like to clarify my previous posting on "Improving C" -- I didn't mean to say that I thought run-time checks were lousy as a rule. As a matter of fact, I'm all for them. My point was that this stuff DOES NOT belong in the language, it belongs in the compiler. I would love to have a C compiler with a flag to turn on array bounds checking (ala the f77 compiler), and other neato debugging stuff. But I don't think it should be part of the language, as it is with Pascal. Thanks for everyone's comments so far, but please, don't flame at me for this specific point any more. --Dave Curry decvax!pur-ee!davy eevax.davy@purdue
rob@ctvax.UUCP (03/06/84)
#R:sri-arpa:-1691800:ctvax:54900002:000:222 ctvax!rob Mar 5 02:44:00 1984

Sorry, that name's been taken. When IBM unleashed PL/1 on the world, I believe they trademarked PL/2 through PL/99 to avoid being upstaged.

Rob Spray
uucp: ... {decvax!cornell!|ucbvax!nbires!|allegra!parsec!}ctvax!rob
rcd@opus.UUCP (03/10/84)
<>
> Do you want a fast, powerful language like C, or do you want
> a hand-holding, slow language like Pascal? May I submit that Modula-2
> or ADA might be just what you are looking for. They have type checking and
> array bounds policing galore. Please do not subject the rest of us to
> your unnecessary overhead.

1. Languages are not fast or slow. The code generated by a compiler for the language is fast or slow. It is possible to generate code from Pascal which is roughly as efficient as that generated by a C compiler for most equivalent constructs. The existence of compilers which generate slow code doesn't necessarily make the language "slow".

2. C is not `powerful' compared to Pascal. They are different, although Pascal has a couple of features that C doesn't have which some people associate with the "power" of the language.

3. Type checking is neither unnecessary, nor does it involve any overhead in the usual sense (execution time/space penalty).

4. Any compiler suitable for production use allows you to turn off runtime checks; you can get rid of them when you think you can take the chance. The semantics of C are such that it's difficult to make array bounds checking work even if you want it. There's a big difference between having a facility which you can use or not as you wish and not being able to have it at all.

STILL, I'll side with the camp that says "Don't mess around with C." It's got some problems and some funny things in it, but it really ain't broke so it don't need fixing!
--
{hao,ucbvax,allegra}!nbires!rcd
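One reason the semantics of C make bounds checking hard (point 4 above) can be shown in present-day C: an array passed to a function arrives as a bare pointer, so the callee has nothing to check a subscript against. The sketch below is an illustration only. The names int_slice, slice_set and size_seen_by_callee are hypothetical, and pairing a pointer with its element count is one possible workaround, not something any compiler in this thread is claimed to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "slice": a pointer that carries its own element count. */
typedef struct {
    int   *base;
    size_t count;
} int_slice;

/* A parameter written "int p[4]" means exactly "int *p". Inside the
 * function only the pointer is visible; the 4 is not available. */
size_t size_seen_by_callee(int *p)
{
    return sizeof p;   /* size of a pointer, not of the caller's array */
}

/* Checked store: returns 0 on success, -1 if index is out of range. */
int slice_set(int_slice s, size_t index, int value)
{
    assert(s.base != NULL);
    if (index >= s.count)
        return -1;
    s.base[index] = value;
    return 0;
}
```

With a slice built from "int array[4]" and a count of 4, a store to index 4 is refused instead of overwriting whatever happens to follow the array.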
Us.Travis%cu20b@BRL-VGR.ARPA (03/11/84)
From: Travis Lee Winfrey <Us.Travis%cu20b@BRL-VGR.ARPA> Don't be so quick to call Pascal slow and C quick. I can list compilers (just as well as you can) on various machines for either language that are as slow as or as fast as you want. Although a lot of the suggestions in the original Improving C do not reflect the flavor of the language, many of them would be quite useful. I personally would kill for built-in type checking and array bounds checking during the debugging phase of my programs. They could be removed after the program reached production quality.... Travis -------
hutch@shark.UUCP (Stephen Hutchison) (03/11/84)
<compile, link, load, run, flame> What is really wrong with an options switch allowing bounds checking to be included, if the default is to leave it out? I don't recall it being forbidden in the Book. Personally I think that it's just fear of the loss of job security on the part of those objecting. After all, if bounds checking in the development phase were to speed your time-to-running-program, you'd have to do more work. (Oh, before I offend anyone, :-} ) Hutch (in favor of allowing options rather than forbidding them)
hal@cornell.UUCP (Hal Perkins) (03/12/84)
> Do you want a fast, powerful language like C, or do you want
> a hand-holding, slow language like Pascal? May I submit that Modula-2
> or ADA might be just what you are looking for. They have type checking and
> array bounds policing galore. Please do not subject the rest of us to
> your unnecessary overhead.

What on earth is this nonsense? There's nothing inherently inefficient about Pascal compared to C. Most compilers allow you to turn off any runtime checks, and the resulting code can be as fast as that generated by a C compiler that does an equivalent job of code generation.

Now, this is personal opinion, but... I'm getting real tired of the C bigots on the net who take gratuitous pot-shots at other programming languages; who have apparently seen "revealed truth", and it's called C, and it's better than anything else because they say so. Pascal has some problems, but so does C (take a look at all the debates in net.lang.c about the true meaning of obscure C constructs). Neither one is the "ultimate" programming language. So let's stop the non-constructive putdowns of programming languages that do not subscribe to the True Religion of the Unix hackers. It's just wasting everyone's time.

(If you feel moved to reply to this with constructive comments, please do so. Keep abusive remarks to yourself.)

Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell
                                    BITNET: hal@crnlcs
POLARIS%usc-isi@sri-unix.UUCP (03/12/84)
Those who want array bounds checking and such might like to look at bcc and safe C, both of which are like checkout compilers in allowing run-time and compile-time range and reference checks. When the program has passed all debugging phases, you can recompile with cc and reduce both the code size and the operating overhead. bcc: delft consulting corp 432 park ave so. new york, ny 10016 safe C : catalytix corp 55 wheeler st. Cambridge, mass 02138 Both seem to offer many of the services not available with lint, but are available only when wanted or needed, unlike other languages. I've seen both of these but not had a chance to use them much. Mike Seyfrit <polaris @isi.arpa>
smoot@ut-sally.UUCP (Smoot Carl-Mitchell) (03/12/84)
I agree with Hal's comments about people putting down other programming languages. I started out in programming 16 years ago as a Fortran hacker, migrated to Pascal during my years as a graduate student and eventually learned C on my own. Along the way I picked up some Cobol, Lisp, PL/1, Snobol, and APL experience. All the above languages have their strengths and weaknesses. Some of the languages were invented before the field knew very much about compiler and programming language theory and their design reflects the level of knowledge at the time of their invention. Doubtless our knowledge in this area will continue to mature and future languages will be better than what we have now. Every programmer has a "favorite" language. My favorite right now is C and I appreciate its strengths, but I am not so blind as to ignore its weaknesses. I suggest that before taking potshots at other languages you investigate the issues involved further, rather than making false or misleading statements. -- Smoot Carl-Mitchell, CS Dept. University of Texas at Austin {seismo, ctvax, ihnp4, kpno}!ut-sally!smoot, smoot@ut-sally.{ARPA, UUCP}
ron%brl-vgr@sri-unix.UUCP (03/12/84)
From: Ron Natalie <ron@brl-vgr> It's not PL/1 it's PL/I. (note the roman numeral).
nather@utastro.UUCP (Ed Nather) (03/16/84)
[] From: Travis Lee Winfrey <Us.Travis%cu20b@BRL-VGR.ARPA> I personally would kill for built-in type checking and array bounds checking during the debugging phase of my programs. They could be removed after the program reached production quality.... Travis ------- If I may paraphrase a point made in "Software Tools" by K & P (I can't find the exact wording): This is much like wearing a parachute while you are in pilot's training on the ground, then discarding it when you go aloft. If the checking is needed for debugging, it is almost certainly needed for what is sometimes called "malicious input." -- Ed Nather ihnp4!{ut-sally,kpno}!utastro!nather Astronomy Dept., U. of Texas, Austin
trb@masscomp.UUCP (03/16/84)
I beg His humblest pardon for contradicting a point made in "Software Tools," but while I agree that bounds checking might certainly be needed for detecting malicious input in programs, it might also add lots of overhead to the darkest guts of a program, where you need the speed and certainly don't need the checking. Like jumping out of a plane and making sure that the touchdown area is covered with pillows 100 feet deep. I'll take bounds checking in production code as long as I can turn it off in sections of code where I don't want it, using a compiler directive. I know that I could certainly use checking when I debug, and I often wish I had it, and I wish that other people had it when I have to chase down their bugs. Having to compile separate modules with and without checking wouldn't be convenient enough, I think. Andy Tannenbaum Masscomp Inc Westford MA (617) 692-6200 x274
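The per-section switch asked for above can be approximated today with the preprocessor, without any compiler support. What follows is a minimal sketch, not a description of any existing compiler directive: the names checked_index, CHECKED, UNCHECKED and IDX are hypothetical, and every subscript has to be written through the IDX macro by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: report a bad subscript and stop the program. */
static size_t checked_index(size_t i, size_t n, const char *file, int line)
{
    if (i >= n) {
        fprintf(stderr, "%s:%d: subscript %lu out of range (size %lu)\n",
                file, line, (unsigned long)i, (unsigned long)n);
        abort();
    }
    return i;
}

#define CHECKED(i, n)   checked_index((i), (n), __FILE__, __LINE__)
#define UNCHECKED(i, n) (i)

/* Section compiled with subscript checking on. */
#define IDX CHECKED
long sum_checked(const int *v, size_t n)
{
    long total = 0;
    size_t k;

    assert(v != NULL);
    for (k = 0; k < n; k++)
        total += v[IDX(k, n)];
    return total;
}
#undef IDX

/* "Darkest guts" section: the same source text, checking off. */
#define IDX UNCHECKED
long sum_fast(const int *v, size_t n)
{
    long total = 0;
    size_t k;

    assert(v != NULL);
    for (k = 0; k < n; k++)
        total += v[IDX(k, n)];
    return total;
}
#undef IDX
```

Redefining IDX between functions keeps both kinds of code in one module, which addresses the objection that compiling separate modules with and without checking is inconvenient. The cost is that nothing forces a programmer to use IDX at all.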
tim@unc.UUCP (Tim Maroney) (03/17/84)
I disagree that it is a mistake to remove array bounds checking and such after debugging is complete. If you have run a program on general test data (including anomalous data) and never gotten the error, then you may fairly safely assume that the error will not happen on any input data. It would be a mistake to sacrifice run-time efficiency for a check on an impossible (or at least VERY improbable) error. Of course, you'd better make sure that your test data really is "general". -- Tim Maroney, The Censored Hacker mcnc!unc!tim (USENET), tim.unc@csnet-relay (ARPA) All opinions expressed herein are completely my own, so don't go assuming that anyone else at UNC feels the same way.
dave@utcsrgv.UUCP (Dave Sherman) (03/19/84)
The current (#5) issue of "UNIX Review" has an interesting article on "Safe C", a C compiler which purportedly protects against such errors as exceeding array bounds (at run-time). Dave Sherman Toronto -- {allegra,cornell,decvax,ihnp4,linus,utzoo}!utcsrgv!dave
leiby@masscomp.UUCP (03/20/84)
> Hal Perkins:
>
> I'm getting real tired of the C bigots on the net who take gratuitous
> pot-shots at other programming languages; who have apparently seen "revealed
> truth", and it's called C, and it's better than anything else because they
> say so.

Quite right. Everyone knows that BLISS-11 is the One True Programming Language.
--
Mike Leibensperger {decvax,tektronix,harpo}!masscomp!leiby
Masscomp; One Technology Park; Westford MA 01886
andyb@dartvax.UUCP (Andy Behrens) (03/20/84)
> I personally would kill for built-in type checking and array
> bounds checking during the debugging phase of my programs. They could be
> removed after the program reached production quality....

Dijkstra points out that it is foolish to just use bounds checking during testing (when you can recover from errors), only to remove the checking for production runs (when an error becomes a serious thing).
--
Andy Behrens
P.O. Box 24, East Thetford, Vermont
UUCP: {decvax,linus,cornell,dalcs}!dartvax!andyb
CSNET: andyb@dartmouth
ARPA: andyb%dartmouth@csnet-relay
wls@astrovax.UUCP (William L. Sebok) (03/22/84)
>> I personally would kill for built-in type checking and array
>> bounds checking during the debugging phase of my programs. They could be
>> removed after the program reached production quality....
>Dijkstra points out that it is foolish to just use bounds checking during
>testing (when you can recover from errors), only to remove the checking
>for production runs (when an error becomes a serious thing).
> Andy Behrens

This is silly. A program that has to handle input from the outside world should do its own checking of the consistency of the input. This should remain during production runs. Other than that one should not have to crucify one's production run on the cross of built-in type checking and array bounds checking. This is an example of too much of the modern computer theory of "Cpu speed be damned", which has resulted in so many slow programs and operating systems.
--
Bill Sebok
Princeton University, Astrophysics
{allegra,akgua,burl,cbosgd,decvax,ihnp4,kpno,princeton,vax135}!astrovax!wls
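The distinction drawn above, explicit checks on outside input that stay in production versus built-in checks that may be dropped, can be sketched in present-day C. This is an illustration under stated assumptions: parse_subscript is a hypothetical name, and the decimal text format for the input is assumed. The explicit tests always run; the assert() covers only a programming error by the caller and disappears when the code is compiled with NDEBUG defined.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse text as a subscript into a table of `size` entries.
 * Returns 0 and stores the subscript in *out on success.
 * Returns -1 if the text is malformed or the value is out of range. */
int parse_subscript(const char *text, size_t size, size_t *out)
{
    char *end;
    long v;

    /* Internal consistency: a caller bug, removable via NDEBUG. */
    assert(text != NULL && out != NULL);

    /* Outside-world input: these checks stay in production. */
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;                      /* empty, or trailing junk */
    if (errno == ERANGE)
        return -1;                      /* does not fit in a long */
    if (v < 0 || (unsigned long)v >= size)
        return -1;                      /* outside the table */
    *out = (size_t)v;
    return 0;
}
```

Once a value has passed parse_subscript, a later table[i] needs no further check, so the inner loops pay nothing for the validation.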
gwyn@brl-vgr.ARPA (Doug Gwyn ) (03/25/84)
Another point is that correct code benefits NOT AT ALL from run-time subscript range checking and other such aids. How about getting the code right in the first place instead of relying on run-time detection of errors to find your mistakes? (This assumes you are able to run an exhaustive set of test cases through your code anyway, which is a very poor assumption.) I recommend Myers' "The Art of Software Testing" to those who care about the subject. Input validation is a well-known requirement. Good COBOL programmers do this automatically. Is it possible that there are no good UNIX programmers? :-)