[net.lang.c] C strings suck!

tim (05/02/83)

The memory management of strings in C is really awful. It's usually
easy to restrict your strings to a certain space, and do an occasional
calloc if you need dynamic space. But for heavy-duty character string
manipulation programs, you might as well forget about it. sprintf and
such functions should return a pointer into data space instead of forcing
you to allocate your own buffers. That is both incredibly clumsy and
distracting, not to mention making your programs much more difficult
for anyone to understand. Finally, it leads to idiomatic programming,
which I'm sure we all agree is a Bad Thing.

The implementation, I'll admit, brings up some difficult issues, but
they don't seem insoluble. As far as garbage accumulation is concerned,
I see few problems with explicit freeing of store, with no reference
counting and no garbage collection (since C couldn't really cope
with either of them due to its loose pointer semantics). Of course, you
can blow it badly by freeing valid data, but then you can do that now.
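
A minimal sketch of the kind of sprintf variant described above, one that
returns a pointer to freshly allocated space and leaves freeing to the
caller. The name str_printf is hypothetical, and the sketch assumes a
modern C library (vsnprintf from C99), which was not available in 1983.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of any standard library: formats like
 * sprintf, but returns a malloc'ed string that the caller must free().
 * Returns NULL on a formatting or allocation failure. */
char *str_printf(const char *fmt, ...)
{
    va_list ap;
    int len;
    char *s;

    assert(fmt != NULL);

    /* First pass: ask vsnprintf how many characters are needed. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;

    /* Second pass: format into the exactly sized buffer. */
    va_start(ap, fmt);
    vsnprintf(s, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return s;
}
```

A caller would write something like s = str_printf("%s-%d", name, n), use
s, and then call free(s). Freeing too early is still possible, as noted
above, but no buffer size has to be guessed in advance.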

Another upgrade of C strings would involve infix operators. There are
three main reasons why these would be good things to have. The first
is readability and writability of complex string expressions. Even a
devoted Lisp hacker like myself can't read the prefix form that is
forced on you now, if there are more than about five operations. The
second is efficiency. Many machines (including the VAX) have high-speed
character string operations. These are next to useless with C, because
you gotta call a function, with all the overhead that entails. String
operations should be done in-line on machines that support it, and the
only good way to do that in C is to add the operators. Finally, there's
that old debbil idiomatic programming again. When everyone writes their
own functions to do something, you gotta learn a new language every time
you read a new program.
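
To make the readability point concrete, here is a small sketch of the
prefix form using only the standard strcpy and strcat. The function
make_path and its arguments are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Builds "<dir>/<name>.<ext>" in buf using the standard library's
 * prefix-style calls. The assert checks that buf is large enough:
 * two separator characters plus the terminating NUL make the +3. */
char *make_path(char *buf, size_t size,
                const char *dir, const char *name, const char *ext)
{
    assert(strlen(dir) + strlen(name) + strlen(ext) + 3 <= size);

    /* Read inside out: copy dir, then append "/", name, ".", ext. */
    return strcat(strcat(strcat(strcat(strcpy(buf, dir), "/"), name), "."), ext);
}
```

This works because strcpy and strcat both return their destination
argument, but the expression has to be read from the innermost call
outward. An infix notation for the same thing would read left to right;
no such operator exists in C.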

The fact that C is supposed to be an assembler substitute doesn't mean
all high-level operations have to be such a pain.

Tim Maroney

mjl (05/05/83)

In reference to Tim Maroney's complaints about C strings, in particular his
suggestion that strings be made a primitive type in the language: while we're
at it, why not add built-in I/O statements?  And, since the VAX has some nice
polynomial evaluation instructions, why not make polynomials a primitive as
well?  And I'm certain there are some engineers and scientists out there who
curse C because it doesn't directly support complex arithmetic.

All of this, of course, leads to a proliferation of language "features," and
soon we'd have C looking like PL/I (heaven forbid!!).  The sparseness of C,
coupled with its flexibility, is one of the strong points in the language.  C
certainly has its share of warts, but simply adding stuff on top without very
careful consideration of the consequences can leave us with something much
worse.  At the minimum, any significant change should be first evaluated using
a preprocessor that generates "real" C, so that there is a low cost prototype
available for experimentation.

As for the VAX string instructions, they can be accessed via assembly language
routines if that's important to you.  We've done this for the C string
library, and the net performance improvement was almost nil for the majority
of programs (not because of the overhead of the function call, but because the
string routines just don't contribute much to the overall execution time).

Mike Lutz
seismo!rochester!ritcv!mjl

smh (05/11/83)

The followup by Lutz points obliquely to what I have always considered
one of the most important and wondrous features of C, and one that is
almost universally missed by casual evaluators of the language.  (See,
for instance, Pournelle's comments on C in Byte several months back.)

If you look very carefully at the de facto C language "standard" ---
the K&R book --- you will observe the surprising fact that
C has absolutely no definition of input-output or even the notion
of a "main" procedure!  All these things are provided by the object-
time environment.  The ubiquitous presence of the standard io library
or something like it in most programs and manual examples tends to hide
this.  Ever wondered why the grammar, or at least, the keyword list for
C is so brief?

Why is this so important?  Obviously, it makes porting the language to
a new machine much easier, especially since most or even all of the
standard io library can be written in C.  Still, this hardly matters
to most users.  My intuition is that this *brilliant* feat of language
design has allowed C the language to be much more plastic in the face
of changing system interface needs.  It seems a large portion of language
gripes (e.g. with FORTRAN, and especially Pascal) have to do with io.
A change to the way io is done in most languages requires a ponderous
change to language specifications and new versions of the compiler.
Compare how much easier (relatively) was the standard io library change
which occurred between Unix 6 & 7.  Even today, useful new features
creep into the standard io library without much ado -- also without
much documentation, but let that pass!  Of course, the strategy of
packaging io inside called functions is available to other languages,
but often seems a "crock" and contrary to the "spirit" of those
languages.

Consider how easy it is, for example, to write C code to download to
stand-alone (non-Unix) processors.  I have done so using the standard
compiler, the standard ld, and even as much of the standard io library
as made sense on the target machine, without particular crockiness.
The only special code which had to be written was a modified lib/crt0.o
to set up a C stack environment, and the innermost io routines to poke
at device registers (putchar, for example).  The latter, of course,
were written in C!  I suspect the project would have been much more
involved using a language carrying its own idea of io, but perhaps I
am wrong...
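
As a rough illustration of what such an innermost routine can look like,
here is a sketch in C. The register layout, the TX_READY bit, and the
names console_init, my_putchar and my_puts are all assumptions made for
this example; they do not describe any particular machine or the code
used in the project above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory-mapped serial port: a status register whose
 * TX_READY bit is set when the device can accept a byte, followed by
 * a data register. Layout and bit value are illustrative only. */
struct uart {
    volatile unsigned char status;
    volatile unsigned char data;
};
#define TX_READY 0x80

static struct uart *console;    /* set once by start-up code */

void console_init(struct uart *dev)
{
    console = dev;
}

/* Innermost output routine: busy-wait until ready, then store the byte. */
int my_putchar(int c)
{
    assert(console != NULL);
    while ((console->status & TX_READY) == 0)
        ;                       /* spin until the device is ready */
    console->data = (unsigned char)c;
    return c;
}

/* Everything above this level is ordinary, portable C. */
void my_puts(const char *s)
{
    while (*s)
        my_putchar(*s++);
    my_putchar('\n');
}
```

On a real target, console_init would be given the device's fixed address
cast to struct uart *. Passing the pointer in also lets the routines be
exercised on a host machine against an ordinary struct.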

There must be other languages out there with this "feature", but I
can't think of one offhand.  I will point out in passing how analogous
this aspect of C is to the PDP11 processor, which also lacks any
io instructions.  I wonder if there is any conscious or unconscious
causal relation.

Sorry about the length of this followup.  Any further remarks should
probably be posted to net.lang or net.lang.c .

					Steve Haflich
					MIT Experimental Music Studio
					...!genrad!mit-eddie!smh

trb (05/11/83)

Relay-Version:version B 3/9/83; site harpo.UUCP
Message-ID:<1508@floyd.UUCP>
Date:Wed, 11-May-83 15:55:50 EDT

Yes, the "C Standard I/O Library" is ubiquitous, but realize, young C
hackers, that stdio as it exists in 4.1bsd and System III was not the
first attempt at a C I/O system.  I don't know about the first
attempt, but I do know that the predecessor to stdio, the C "Portable
I/O library", was quite painful to use.

When the stdio library came out, all you had to do was search libS.a
(it was eventually incorporated into libc.a).  It was a whole lot
simpler than deciding what would be in Fortran 77.

My main point here is that the people who designed the stdio library
as it now stands learned from their mistakes, and weren't handcuffed
to their mistakes.

What a piece of work is C...

	Andy Tannenbaum   Bell Labs  Whippany, NJ   (201) 386-6491

lepreau (05/12/83)

Relay-Version:version B 3/9/83; site harpo.UUCP
Message-ID:<1580@utah-cs.UUCP>
Date:Wed, 11-May-83 18:02:25 EDT

The classic Algol 60 also doesn't have any integral i/o.

karn (05/12/83)

I have to strongly second the comments about keeping I/O out of the
C language.  This was indeed a stroke of genius which contributed
greatly to the portability of the language.

Even "simple" extensions cause tidal waves when you consider all of the
many different compiler implementations (under control of many different
vendors) that would have to be changed to maintain complete portability.
Example: how many of you use enumerated data types and structure passing
on a regular basis?  I had a brief fling with those features when they
first came out, but after getting burned by both (portability across
compiler releases with enums, portability and performance with structure
passing) I've given them up; they're not worth it.  And those are SIMPLE
changes!

If you're hopelessly in love with certain non-C language features, then I
would strongly recommend the pre-processor approach.  At least in theory
you can mix and match the preprocessor with your favorite machine
and compiler.

Phil

swatt (05/13/83)

This is an oft-heard complaint about C.  If you really require dynamic
strings supported directly in the language, pick another language.
Otherwise make the best of it you can with an appropriate library
package.  I just perused my old USENET net.sources archive and came
up with:

	From sdcarl!rusty Wed Jun  9 22:31:20 1982
	Subject: charb
	Newsgroups: net.sources

		====> charb.3 <====

		.TH CHARB 3
		.SH NAME
		cballoc, cbrealloc, fgetcb, makcb, cbbuf, cbmax, cbcat,
		cbncat, cbcpy, cbncpy \- operations for variable length strings
		.SH SYNOPSIS
	
	<and so on ...>

I'm sure there are several out there besides this one.
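
For readers who have not seen such a package, the general idea can be
sketched in a few lines. The struct and function names below are
hypothetical and are NOT the charb interface, whose details are not
reproduced here; this is only an illustration of a string that grows on
demand and is freed explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; these names are not the charb API. */
struct dstr {
    char  *buf;   /* NUL-terminated contents, or NULL before first use */
    size_t len;   /* characters in use, excluding the NUL */
    size_t max;   /* bytes allocated */
};

/* Appends s, growing the buffer as needed. s must not point into d's
 * own buffer. Returns 0 on success, or -1 if allocation fails, in
 * which case the existing contents are left intact. */
int dstr_cat(struct dstr *d, const char *s)
{
    size_t n, need;

    assert(d != NULL && s != NULL);
    n = strlen(s);
    need = d->len + n + 1;
    if (need > d->max) {
        size_t newmax = d->max ? d->max : 16;
        char *p;

        while (newmax < need)
            newmax *= 2;        /* double until the new text fits */
        p = realloc(d->buf, newmax);
        if (p == NULL)
            return -1;
        d->buf = p;
        d->max = newmax;
    }
    memcpy(d->buf + d->len, s, n + 1);  /* copies the NUL too */
    d->len += n;
    return 0;
}

/* Releases the storage and resets d to the empty state. */
void dstr_free(struct dstr *d)
{
    free(d->buf);
    d->buf = NULL;
    d->len = d->max = 0;
}
```

A caller starts from a zeroed struct dstr, calls dstr_cat as often as
needed, and finishes with dstr_free.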

	- Alan S. Watt
	ittvax!swatt

geo (05/15/83)

I am following up a followup here, as the original article,
"C strings suck!" doesn't seem to have made it to our site.
I don't know quite what the objector's objections were to C
strings, but last June rusty@sdcarl submitted a bunch of functions to 
net.sources that allowed one to handle strings in a PL/1ish
manner.  It was interesting, you might want to take a look at it.
	Cordially, Geo Swan, Integrated Studies, University of Waterloo
	(allegra|decvax) !watmath!watarts!geo