[comp.std.c] The \c escape

karl@haddock.ISC.COM (Karl Heuer) (06/17/88)

Enclosed is the text of a proposal I sent in for the second public review.  I
have yet to receive the official reply, but I hear it's been rejected on the
grounds of limited utility.  I'd like to solicit further opinions before I
write up a rebuttal.

Don't bother to argue whether the correct name should be `\c' or `\z' or `\ ';
the question is whether the feature should exist at all.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
________

Proposal #1

Add new escape sequence \c.

Summary

This proposal cleans up two warts in the language: initializing a character
array without adding a null character, and terminating a hexadecimal escape
which might be followed by a valid hexadecimal digit.  It also allows the user
to explicitly document when a null character is unnecessary, e.g.
write(1,"\n\c",1).

Justification

I presume the Committee is already aware of the need for non-null-terminated
character arrays, since the January Draft makes a special case for them in
3.5.7.  However, the mechanism requires the user to count the characters
himself in order to make sure that he doesn't leave room for the null
character; this is a maintenance nightmare.  My proposal is a cleaner way to
accomplish this.
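
To make the counting problem concrete, here is a minimal sketch of the 3.5.7
special case as it stands without \c.  The array names and the "FORM" text are
made up for illustration; only the initialization rule comes from the Draft.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical four-byte tag field.  The size is counted by hand so that it
   exactly matches the text; the initializer's null character is then
   dropped, per the 3.5.7 special case. */
static const char tag_exact[4] = "FORM";

/* The same text with the size left to the compiler: the null character is
   kept, so the array has five elements. */
static const char tag_auto[] = "FORM";

/* Checks the sizes and contents described in the comments above. */
static void check_tag_arrays(void)
{
    assert(sizeof tag_exact == 4);
    assert(sizeof tag_auto == 5);
    assert(memcmp(tag_exact, tag_auto, 4) == 0);
    assert(tag_auto[4] == '\0');
}
```

If the text is later shortened to "FOR" and the 4 is left alone, the null
character silently comes back; if it is lengthened to "FORMS", the initializer
no longer fits.  Either way the count has to be maintained by hand.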

It has been suggested that although an escape to suppress the null character
is useful, the termination of hex escapes is not an issue because it is
handled by string literal pasting.

String pasting is useful for line continuation without backslash-newline, and
for constructing string literals in macros, but using it to indicate the end
of a hex escape is a botch.  This is nearly as bad as suggesting that the
whole string be written in hex.
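
For reference, the pasting workaround under discussion looks roughly like the
sketch below.  The array name is hypothetical, and the check assumes the usual
rule that escapes are converted before adjacent literals are concatenated.

```c
#include <assert.h>

/* Two adjacent literals: the hex escape ends at the first closing quote, so
   the concatenated string is the character 0x12, then '3', then the null. */
static const char pasted[] = "\x12" "3";

/* Writing "\x123" instead would be read as a single escape with value 0x123,
   which does not fit in an 8-bit char. */

/* Checks that the pasted literal holds exactly two characters plus the null. */
static void check_pasted(void)
{
    assert(sizeof pasted == 3);
    assert(pasted[0] == 0x12);
    assert(pasted[1] == '3');
    assert(pasted[2] == '\0');
}
```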

Moreover, it's very C-specific; one could not advertise a program that
`accepts all the C escapes' as input, without first solving the hex-
termination problem all over again.

Also, it doesn't handle character constants.  The example in 3.1.3.4 is
clearly a kludge--it suggests replacing the hex escape with octal.  This won't
always be possible on an architecture with 12-bit bytes, for example.
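
A small sketch of why the octal workaround runs out of room.  The names here
are illustrative only; the three-digit limit on octal escapes is the one fact
taken from the language.

```c
#include <assert.h>

/* An octal escape consumes at most three digits, so "\0223" is the
   character 022 (the same value as \x12) followed by the digit '3'. */
static const char octal_then_digit[] = "\0223";

/* Largest value a three-digit octal escape can denote (511 decimal). */
enum { MAX_OCTAL_ESCAPE = 0777 };

/* Returns nonzero if a character value can be written as an octal escape. */
static int fits_octal_escape(unsigned value)
{
    return value <= MAX_OCTAL_ESCAPE;
}

/* Checks the layout of the literal and the limit of the workaround. */
static void check_octal(void)
{
    assert(sizeof octal_then_digit == 3);
    assert(octal_then_digit[0] == 022);
    assert(octal_then_digit[1] == '3');
    assert(fits_octal_escape(0x12));
    assert(!fits_octal_escape(0x234));  /* needs more than three octal digits */
}
```

A 12-bit byte can hold values up to 07777, so anything above 0777 has no
octal spelling and the substitution is unavailable.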

Finally, if the \c escape is added anyway for the null-suppression feature,
the additional change of insisting that it be a no-op in other contexts is
minor.

Specific changes

In 3.1.3.4, page 29, line 10, add \c to the list of escapes.  Add the
description: `The \c escape at the end of a string literal suppresses the
trailing null character that would normally be appended.  If \c appears in a
character constant, or anywhere in a string literal other than at the end,
then it is ignored, but may serve to separate an octal or hexadecimal escape
from a following digit.'
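
The wording above can be sketched as a small run-time decoder, the sort of
thing a program that `accepts the C escapes' as input would need.  This is
only an illustration under stated assumptions: the function names are
invented, only \n, \x and the proposed \c are recognized, and values are
truncated to unsigned char.

```c
#include <assert.h>
#include <stddef.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    const char *lower = "0123456789abcdef";
    const char *upper = "0123456789ABCDEF";
    int i;
    for (i = 0; i < 16; i++)
        if (c == lower[i] || c == upper[i])
            return i;
    return -1;
}

/* Decodes src into dst, writing at most cap bytes, and returns the number
   of bytes produced.  *no_nul is set to 1 if src ended with \c (the caller
   should then omit the terminator), else 0.  Unrecognized escapes are
   copied through as the escaped character. */
static size_t decode(const char *src, unsigned char *dst, size_t cap,
                     int *no_nul)
{
    size_t n = 0;
    *no_nul = 0;
    while (*src != '\0' && n < cap) {
        if (*src != '\\') {
            dst[n++] = (unsigned char)*src++;
            continue;
        }
        src++;                          /* skip the backslash */
        if (*src == 'c') {              /* proposed \c: produces nothing */
            src++;
            *no_nul = (*src == '\0');
        } else if (*src == 'x') {       /* hex escape: take every hex digit */
            unsigned v = 0;
            src++;
            while (hex_value((unsigned char)*src) >= 0)
                v = v * 16 + (unsigned)hex_value((unsigned char)*src++);
            dst[n++] = (unsigned char)v;
        } else if (*src == 'n') {
            dst[n++] = '\n';
            src++;
        } else if (*src != '\0') {
            dst[n++] = (unsigned char)*src++;
        }
    }
    return n;
}

/* Exercises the two uses of \c named in the proposal. */
static void check_decode(void)
{
    unsigned char buf[8];
    int no_nul;

    /* \c in the middle separates the hex escape from the digit 3. */
    assert(decode("\\x12\\c3", buf, sizeof buf, &no_nul) == 2);
    assert(buf[0] == 0x12 && buf[1] == '3' && no_nul == 0);

    /* \c at the end asks for the terminator to be suppressed. */
    assert(decode("\\n\\c", buf, sizeof buf, &no_nul) == 1);
    assert(buf[0] == '\n' && no_nul == 1);
}
```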

In 3.1.3.4, page 30, line 35, change '\0223' to '\x12\c3'.

In 3.1.4, page 31, line 29, after `A null character is then appended' add
`unless the string literal ended with \c'.  Make a similar change to line 31.
Add the sentence `If a character string literal or a wide string literal has
zero length, the behavior is undefined'.  Add to footnote 16 the text `or it
may lack a trailing null character because of \c'.

In 3.1.4, page 31, line 41, add `This string may also be denoted by
"\x12\c3"'.

In 3.5.7, page 73, line 23, replace `if there is room or if the array is of
unknown size' with `if it has one'.  (The ability to initialize a non-null-
terminated array without using \c may be listed as a Common Extension.)

papowell@attila.uucp (Patrick Powell) (06/18/88)

In article <4604@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
>Add new escape sequence \c.
>
>Summary
>
>This proposal cleans up two warts in the language: initializing a character
>array without adding a null character, and terminating a hexadecimal escape
>which might be followed by a valid hexadecimal digit.  It also allows the user
>to explicitly document when a null character is unnecessary, e.g.
>write(1,"\n\c",1).

This is FAR too reasonable, clean, clear, simple, trivial to implement,
does not break existing systems, etc., to even make it out the door.

I think that you should aim for something more cryptic...  difficult to
word...  you know,  sort of "kludgy"...  perhaps tied in with the preprocessor.
And needing a keyword.

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/19/88)

In article <5907@umn-cs.cs.umn.edu>, papowell@attila.uucp (Patrick Powell) writes:
> In article <4604@haddock.ISC.COM> karl@haddock.isc.com (Karl Heuer) writes:
> >Add new escape sequence \c.
> This is FAR too reasonable, clean, clear, simple, trivial to implement,
> does not break existing systems, etc., to even make it out the door.

Don't be such an ass.  X3J11 is not in the business of adding features
to C just because they fit.  For a proposal such as \c to be adopted,
it would be necessary to show why it is important enough to justify
making the language fatter.  Some good examples of a significant
problem that \c addresses would have helped the odds of its adoption
considerably.  Now it is too late to be mucking around except to
remedy serious technical errors.  Does \c do that?

karl@haddock.ISC.COM (Karl Heuer) (06/20/88)

In article <8125@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>For a proposal such as \c to be adopted, it would be necessary to show why it
>is important enough to justify making the language fatter.

Given that it removes the need for another feature of the Standard (the bit
about strings not being nul-terminated in one special case), it's not obvious
that it does make the language fatter.

>Some good examples of a significant problem that \c addresses would have
>helped the odds of its adoption considerably.

Personally, I think that extending hex escapes from three digits to infinity
created a problem more significant than the one it solved, but I seem to be
having trouble convincing the Committee.  The only new argument I've come up
with is that the printf formats %#c and %#s, though rejected by X3J11, may
become Common Extensions; if these go into the next Standard, the problem of
hex termination will have to be faced anyway.  (Of course, \c can be added as a
Common Extension by those same implementations, so maybe this is the Way.
Trouble is, one is a library extension, the other a compiler extension.)

>Now it is too late to be mucking around except to remedy serious technical
>errors.  Does \c do that?

I honestly don't know.  I don't have much hope for its acceptance, given the
timing (that's why I rushed to submit it in the previous Review), but I'll
give it a try.  And if anyone has better examples of What It's Good For (note
that some of mine were hidden in the `Specific Changes' section), I'd be glad
to incorporate them.

Does the nonexistence of a spelling for the character constant '\x234\c5'
constitute a serious technical error?  If not, why does the Standard bother to
mention that '\x12\c3' can be spelled '\0223'?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

ok@quintus.uucp (Richard A. O'Keefe) (06/20/88)

In article <8125@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>  Now it is too late to be mucking around except to
>remedy serious technical errors.  Does \c do that?

Well, \x is technical, serious, and IMHO an error.  \z (*please* not \c)
is a minimal remedy for that botch.

plumbc@admin.ogc.edu (Colin Plumb) (06/20/88)

I'd like to endorse this proposal.  It has practical value, it breaks
nothing, and is not terribly hard to implement.

This would also be useful in long character constants, where I don't
think string pasting is allowed.

I would prefer '\ ' to '\c', since a non-alphanumeric character better
indicates the magicness of the escape, and ' ' is as close to the
"nothing" meaning of the escape as printable ASCII can get.

Again, it is a useful, trivial change.  I shall certainly hack it
into the next C compiler I get.
-- 
	-Colin (plumbc@admin.ogc.edu, posting from a friend's account)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/20/88)

In article <4621@haddock.ISC.COM> karl@haddock.ima.isc.com (Karl Heuer) writes:
>Does the nonexistence of a spelling for the character constant '\x234\c5'
>constitute a serious technical error?

Probably not.  Multiple-character character constants are unportable and
considered bad style.  They're only permitted because they were in previous
practice.  Many of us don't think you should ever try to make use of them.

>If not, why does the Standard bother to mention that '\x12\c3' can be
>spelled '\0223'?

Possibly to forestall questions.

maart@cs.vu.nl (Maarten Litmaath) (06/21/88)

Great solution!
-- 
South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
           revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart

peter@ficc.UUCP (Peter da Silva) (06/21/88)

In article <8127@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article ... karl@haddock.ima.isc.com (Karl Heuer) writes:
> >the nonexistence of a spelling for the character constant '\x234\c5'...

> Probably not.  Multiple-character character constants are unportable and
> considered bad style.  They're only permitted because they were in previous
> practice.  Many of us don't think you should ever try to make use of them.

(1) OK, how do you spell "\x234\c5"? "\x234""5"? Bleagh.

(2) What's wrong with a readable and maintainable way of initialising a
    32-bit unsigned integer to 0x464F524DL? You prefer that (or ('F'<<24)|
    ('O'<<16)|('R'<<8)|('M')) to 'FORM'?
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: hoptoad!academ!uhnix1!sugar!peter.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/22/88)

In article <963@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>In article <8127@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> Multiple-character character constants are unportable and
>> considered bad style.  They're only permitted because they were in previous
>> practice.  Many of us don't think you should ever try to make use of them.

>(1) OK, how do you spell "\x234\c5"? "\x234""5"?

Sure.  What's wrong with using the more general facility for strings?

>(2) What's wrong with a readable and maintainable way of initialising a
>    32-bit unsigned integer to 0x464F524DL? You prefer that (or ('F'<<24)|
>    ('O'<<16)|('R'<<8)|('M')) to 'FORM'?

I would prefer that you not depend on being able to jam 'FORM' into
an int in the first place.  It will obviously not work on a 16-bit
implementation.

If you nonetheless feel you have to use a kludge a la troff, try
	#define	PACK(c1,c2)	((c1)<<8 | (c2))
which at least is portable across 8-bit byte implementations.
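
As a rough sketch, the same idea stretched to the four-character case from the
quoted article might look like this.  PACK4 and form_tag are hypothetical
names, and the expected value in the check assumes an ASCII execution
character set.

```c
#include <assert.h>

/* Hypothetical four-character companion to PACK above.  Each argument is
   masked to 8 bits and widened to unsigned long (at least 32 bits) before
   shifting, so the result does not depend on the width of int. */
#define PACK4(c1, c2, c3, c4)                    \
    (((unsigned long)((c1) & 0xFF) << 24) |      \
     ((unsigned long)((c2) & 0xFF) << 16) |      \
     ((unsigned long)((c3) & 0xFF) << 8)  |      \
      (unsigned long)((c4) & 0xFF))

/* Tag built from explicit shifts, independent of how a given compiler
   orders the characters of a multi-character constant. */
static const unsigned long form_tag = PACK4('F', 'O', 'R', 'M');

/* Checks the packed value; valid only where 'F' is 0x46 and so on (ASCII). */
static void check_pack4(void)
{
    assert(form_tag == 0x464F524DUL);
    assert(PACK4('M', 'R', 'O', 'F') == 0x4D524F46UL);
}
```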

blarson@skat.usc.edu (Bob Larson) (06/23/88)

In article <963@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>(2) What's wrong with a readable and maintainable way of initialising a
>    32-bit unsigned integer to 0x464F524DL?
0x464F524DL
> You prefer that (or ('F'<<24)| ('O'<<16)|('R'<<8)|('M')) 
> to 'FORM'?

No, I use what I mean.  Are you trying to imply that the three
things you mention have the same value?  (I grant that on some
machines with some compilers they might.  I'm sure on other machines
and compilers they are different.)

Multi-character constants are "implementation defined" (K&R2 page 193)
thus are for portable code UNDEFINED.

-- 
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%ais1@ecla.usc.edu
			oberon!ais1!info-prime-request

daniels@teklds.TEK.COM (Scott Daniels) (06/23/88)

In article <1719@ogcvax.ogc.edu> plumbc@admin.ogc.edu (Colin Plumb) writes:
>I'd like to endorse this proposal.  It has practical value, it breaks
>nothing, and is not terribly hard to implement...
>I would prefer '\ ' to '\c', since a non-alphanumeric character better
>indicates the magicness of the escape, and ' ' is as close to the
>"nothing" meaning of the escape as printable ASCII can get.
This is a dangerous substitute, since it is impossible to look at a source
file and see whether the whitespace there is a tab or a space.  If you
must avoid a letter (I really don't see why, every \* thing is "magic"),
why not use '\_', which has the emptiness property, but is at least visually
distinguishable.  Also, some printing systems provide so little area for
a space, it is not immediately clear whether '\n' and '\ n' are different.

-Scott Daniels		daniels@teklds.TEK.COM or daniels@teklds.UUCP

karl@haddock.ISC.COM (Karl Heuer) (06/25/88)

In article <313@sdrc.UUCP> scjones@sdrc.UUCP (Larry Jones) writes:
>The problem is that if we make any substantive changes (i.e. anything but
>editorial corrections), we are REQUIRED by ANSI rules to have another 2
>month public review which would delay the final standard by about 6 months.

This is why I suspect that my \c only has a chance if there's some *other*
substantive change in the third review.  But (since it failed the second
review, despite the absence of such a delta-cost) even if this happens, it
still needs further support.  Most of what I've heard so far is "Yes! That's a
good idea!"; what I need is something that will convince X3J11 that the lack
of this functionality is a serious technical flaw.

And quit arguing about what it should be called.  The Committee can spell it
any way they want for all I care.  I chose \c because the suppress-terminator
feature is similar to \c in USG echo.  If the two uses are to have separate
spellings (which is how I originally conceived it), I'd go with \c and \z.  Or
\c and \x(NNN).

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

aegl@root.co.uk (Tony Luck) (06/29/88)

In article <963@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>
>(2) What's wrong with a readable and maintainable way of initialising a
>    32-bit unsigned integer to 0x464F524DL? You prefer that (or ('F'<<24)|
>    ('O'<<16)|('R'<<8)|('M')) to 'FORM'?

But (as many people have already tried to point out) multi-character constants
aren't portable .... 'FORM' isn't necessarily 0x464F524DL e.g. on my machine
here (68030 based, i.e. big-endian) the following program:

	#include <stdio.h>
	int main(void) { printf("0x%x\n", (unsigned)'FORM'); return 0; }

produces as output:

	0x4d524f46

Which, in these days of portable data interchange and networked systems, is
likely to mess up your whole day.

Tony Luck