[comp.lang.c] Null-terminated C strings

roy@phri.UUCP (Roy Smith) (12/24/87)

In article <2447@hall.cray.com> blu@hall.UUCP (Brian Utterback) writes:
> At least the compiler should issue a warning if it eats a null.  I mean,
> what is the use of being able to specify a character in a string (i.e.
> \000) if the compiler won't really use it?

	Interesting question.  Should a \000 in a string constant be
flagged as a warning by the compiler?  On both my 4.3 Vax and 3.2 Suns, the
following program draws no complaint:

main ()
{
	printf ("this\000probably won't print\n");
}

	Even lint has nothing to say (other than complaining about printf's
return value being ignored on the Sun).  Surely at least lint should pick
up this one.  There is nothing strictly illegal about imbedding a null in a
string constant, but it is strange.  One might want to do:

	printf ("this\000probably won't print\n"+5);

and the compiler should let you do it, although I can't think of any valid
reason offhand you would *want* to do such a thing.
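
For what it's worth, here is a minimal sketch of what actually gets stored, assuming an ANSI-style compiler with <string.h>. The names msg, visible_len, stored_len and tail are invented for this illustration. The literal keeps every byte, embedded null included; it is the library routines (printf, strlen) that stop at the first null.

```c
#include <string.h>

/* Same literal as above, held in an array so sizeof can see all of it. */
static const char msg[] = "this\000probably won't print\n";

/* Bytes the string routines see: they stop at the first null. */
size_t visible_len(void) { return strlen(msg); }

/* Bytes actually stored, counting the embedded and the trailing null. */
size_t stored_len(void) { return sizeof msg; }

/* Skipping 5 bytes ("this" plus the embedded null) reaches the rest. */
const char *tail(void) { return msg + 5; }
```

Here visible_len() is 4, while tail() points at "probably won't print\n", which is what the +5 version of the printf call would print.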
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

andrew@frip.gwd.tek.com (Andrew Klossner) (12/25/87)

	"Should a \000 in a string constant be flagged as a warning by
	the compiler? ... I can't think of any valid reason offhand you
	would *want* to do such a thing."

Character arrays have uses other than printable strings.  The fact that
\000 and eight-bit characters like \377 can be spelled in C leads to
some useful coding shortcuts.  As a (stupid) example:

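	#include <stdio.h>
	#include <stdlib.h>

	/* Byte i of the literal is 1 if i is prime, else 0 (1 is not prime). */
	int isPrime(int number) {
		if (number<0 || number>10) {
			fprintf(stderr,"isPrime(%d): out of range\n",number);
			abort();
			}
		return "\0\0\1\1\0\1\0\1\0\0\0"[number];
		}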

(I have made non-stupid use of this feature, but in programs too long to
use as examples here.)
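
A second sketch in the same spirit, this time for the eight-bit case: a bit-mask table spelled as a string. The name bit_mask is made up for this illustration. The cast to unsigned char is there on the assumption that plain char may be signed, in which case '\200' would otherwise come out negative.

```c
/* Mask for bit 0..7, looked up in a string literal used as a byte table.
   Returns 0 for an out-of-range bit number. */
unsigned bit_mask(int bit)
{
	if (bit < 0 || bit > 7)
		return 0;
	/* Octal escapes for 1, 2, 4, 8, 16, 32, 64, 128. */
	return (unsigned char) "\001\002\004\010\020\040\100\200"[bit];
}
```

For instance, bit_mask(3) yields 8 and bit_mask(7) yields 128.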

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

leech@polk.cs.unc.edu (Jonathan Leech) (12/25/87)

In article <3087@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>There is nothing strictly illegal about imbedding a null in a
>string constant, but it is strange.  One might want to do:
>
>	printf ("this\000probably won't print\n"+5);
>
>and the compiler should let you do it, although I can't think of any valid
>reason offhand you would *want* to do such a thing.

    To let people patch site-specific data in code distributed
binary-only, e.g.:

    static char *termtypes[] = {
	"ADM3A\0Patching area",     /* VMS Foreign Terminal type FT1 */
	...etc

    Something like this showed up in code I saw a few years ago.
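
    A sketch of how such a slot behaves, assuming the entry lives in a writable array (a real patch tool would instead overwrite the bytes in the binary file). The names termtype, slot_capacity, patch_slot and current_termtype are hypothetical, not taken from the original code.

```c
#include <string.h>

/* One table entry: the name, a null, then slack that a patch may use. */
static char termtype[] = "ADM3A\0Patching area";

/* Longest name the slot can hold, leaving room for the final null. */
size_t slot_capacity(void) { return sizeof termtype - 1; }

/* Overwrite the slot with a new name; returns 0 on success, -1 if the
   name would not fit in the space reserved at compile time. */
int patch_slot(const char *name)
{
	size_t n = strlen(name);

	if (n > slot_capacity())
		return -1;
	memcpy(termtype, name, n + 1);	/* copies the terminating null too */
	return 0;
}

/* What the program sees when it reads the entry as a C string. */
const char *current_termtype(void) { return termtype; }
```

    Until it is patched, the program sees only "ADM3A"; the padding after the embedded null just reserves room for a longer site-specific name.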
    Jon Leech (leech@cs.unc.edu)    __@/
    ``...there are many places where gravity has its practical
      applications as far as the Universe is concerned.''
	- R. P. Feynman, _The Character of Physical Law_

daveb@geac.UUCP (David Collier-Brown) (12/28/87)

In article <152@piring.cwi.nl> jack@cwi.nl (Jack Jansen) writes:
| Even though I favour null-terminated strings myself...
[enumeration of booby-traps that catch C programmers]

| Moreover, the approach where the count is kept with the pointer
| has another *big* advantage: it unifies strings with other variable
| dimension arrays. Again, this is not a point that I feel very strong
| about myself (I don't think I *ever* inverted a matrix), but 
| it *is* rather stupid that you have to specify the dimensions of
| a matrix in a call to some routine while the compiler knows those
| dimensions already....

  Agreed, but we're drifting away from the architecture question:
should we provide facilities for supporting array operations (search
string/array for terminator) or array descriptors (assign dope-vector
slice n..m of array x)?
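
  To make the two alternatives concrete, here is a rough sketch in C. The names term_length, struct desc and slice are invented for illustration, and the slice bounds are assumed to be 0-based and inclusive.

```c
#include <stddef.h>

/* Terminator style: the length is found by searching for the sentinel. */
size_t term_length(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0')
		n++;
	return n;
}

/* Descriptor ("dope vector") style: base address plus element count. */
struct desc {
	const char *base;
	size_t len;
};

/* Slice n..m (inclusive, 0-based) of x without copying or searching.
   The caller must ensure n <= m < x.len. */
struct desc slice(struct desc x, size_t n, size_t m)
{
	struct desc d;

	d.base = x.base + n;
	d.len = m - n + 1;
	return d;
}
```

  The first costs a scan proportional to the length; the second costs two words per reference, but slicing is then constant time.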
  This question can become arbitrarily complex... and is herewith
cross-posted.
  In the meantime, I try to use C as a language to code **into** and
not **in**, so I can often (but not always) back out of
locally-unwise assumptions of the language designers.

 --dave (architecture X programming languages) c-b

ps: for people reading this in comp.lang.c, the discussion of the
strengths and weaknesses of lengths and terminating characters came
out of a discussion of what to put in hardware.  It is worth
reviewing, as this is **not** the religious discussion it might
appear at first glance.

franka@mmintl.UUCP (01/09/88)

[I have directed followups to comp.lang.c only]

I believe it was a mistake in C to make string constants refer to null
terminated strings automatically.  Better, I think, to make the constant
contain exactly what the programmer wrote.  That way, other string-
representation schemes could be used easily and conveniently.  As it is,
null-terminated strings are more convenient than they would be if it were
done this way, but anything else is much uglier.
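
For arrays, at least, ANSI C already behaves this way: when the declared size leaves no room for the terminator, the initializer stores exactly the characters written. A small sketch, with tag, tag_size and tag_matches as made-up names. This applies to C only (C++ rejects it), and it does not help with string constants used as expressions.

```c
#include <string.h>

/* Exactly four bytes: the array has no room for a terminating null,
   so none is stored. */
static const char tag[4] = "DATA";

size_t tag_size(void) { return sizeof tag; }

/* Compare by explicit length, since tag is not null-terminated. */
int tag_matches(const char *buf)
{
	return memcmp(buf, tag, sizeof tag) == 0;
}
```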

It is, of course, much too late to change the language in this way.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (01/13/88)

In article <2637@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>[I have directed followups to comp.lang.c only]
>
>I believe it was a mistake in C to make string constants refer to null
>terminated strings automatically.  Better, I think, to make the constant
>contain exactly what the programmer wrote.  That way, other string-

It is easy enough to do what (I think) you want by using a user-defined
type and writing your own procedures.
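
A minimal sketch of that approach, assuming a counted-string representation. Every name here (struct cstr, CSTR, cstr_make, cstr_eq) is hypothetical. The macro leans on sizeof, so it works only on string literals and arrays, not on pointers.

```c
#include <stddef.h>
#include <string.h>

/* A counted string: the length travels with the pointer. */
struct cstr {
	size_t len;
	const char *ptr;
};

/* Build a cstr from a string literal; sizeof counts the trailing
   null the compiler appended, so subtract one to exclude it. */
#define CSTR(lit) cstr_make((lit), sizeof(lit) - 1)

struct cstr cstr_make(const char *p, size_t n)
{
	struct cstr s;

	s.len = n;
	s.ptr = p;
	return s;
}

/* Equality that does not depend on any terminator. */
int cstr_eq(struct cstr a, struct cstr b)
{
	return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

With this, CSTR("a\0b") has length 3, embedded null and all; the compiler's trailing null is still stored but never consulted.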




-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

dsill@NSWC-OAS.arpa (Dave Sill) (01/20/88)

In article <2294@haddock.ISC.COM> Karl Heuer <karl@haddock.ISC.COM> writes:
>|"The \c escape at the end of a string literal suppresses the trailing NUL.
>|If it appears other than at the end, it is ignored, but may serve to separate
>|an octal or hex escape from a following digit."  [Proposed wording]
>
>So, what do y'all think?  Is this a good idea?  Should the two features both
>exist but have different notations?  (That was my original plan, with "\c" to
>suppress the NUL and "\z" to represent a zero-width separator.)

I like \c for suppressing the trailing *null character* (NUL implies
ASCII), but I don't think a zero-width separator is needed because the
automatic concatenation of adjacent string literals added by X3J11 can
perform the same function.

=========
The opinions expressed above are mine.

"I shed, therefore, I am."
					-- ALF

Alan_T._Cote.OsbuSouth@Xerox.COM (01/26/88)

In article <2294@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>|"The \c escape at the end of a string literal suppresses the trailing NUL.
>|If it appears other than at the end, it is ignored, but may serve to separate
>|an octal or hex escape from a following digit."  [Proposed wording]
>
>So, what do y'all think?  Is this a good idea?  Should the two features both
>exist but have different notations?  (That was my original plan, with "\c" to
>suppress the NUL and "\z" to represent a zero-width separator.)

The idea of an escape to suppress the trailing NUL is, I think, a good one.

However, I would recommend a different escape to represent a zero-width
separator, as follows:  Use the string concatenation facility!  The sequence, ""
should suffice to do the job.  For example, ...

EX#1:		char *a="\0101\z3";		/* using Karl's idea */
EX#2:		char *a="\0101""3";	/* my suggestion */

Wouldn't EX#2 accomplish what you wanted EX#1 to accomplish?
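
A sketch of the concatenation trick under the ANSI draft rules, where escapes are converted before adjacent literals are joined. It uses a hex escape, because a hex escape has no length limit and really would swallow a following digit. An octal escape stops after three digits by itself, so "\0101" is already \010 followed by '1'. The function names are invented for illustration.

```c
#include <string.h>

/* "\x41" "3": the escape is converted first, then the pieces are
   joined, giving the two characters 0x41 and '3'. */
const char *hex_then_digit(void) { return "\x41" "3"; }

/* Octal escapes take at most three digits, so this is the three
   characters \010, '1', '3' with or without the "" separator. */
const char *octal_then_digits(void) { return "\0101" "3"; }
```

So the "" separator matters for hex escapes and for octal escapes written with fewer than three digits; with a full three-digit octal escape it is harmless but unnecessary.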