[comp.lang.c] newlines in string constants

kyle@xanth.UUCP (Kyle Jones) (10/07/87)

I would like to be able to have multi-line string constants without
having to put \n\ at the end of each line.  For example

char *s = "This is\n\
legal.";

whereas

char *s = "This is
not legal.";

I would like to see the second form become legal.  What does the
current ANSI draft have to say about this?

kyle jones  <kyle@odu.edu>  old dominion university, norfolk, va  usa

meissner@dg-rtp.UUCP (Michael Meissner) (10/07/87)

In article <2669@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
| I would like to be able to have multi-line string constants without
| having to put \n\ at the end of each line.  For example
| 
| char *s = "This is\n\
| legal.";
| 
| whereas
| 
| char *s = "This is
| not legal.";
| 
| I would like to see the second form become legal.  What does the
| current ANSI draft have to say about this?

The current ANSI draft still says this is illegal.  However, it does add a
feature where adjacent strings are pasted together.  For example, you could
write:

	char *s = "This is\n"
		  "legal.";
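A minimal sketch, assuming an ANSI-conforming compiler: the two spellings below should yield identical arrays, because adjacent string literals are joined into one during translation (phase 6 in the standard). The array and function names are hypothetical.

```c
#include <string.h>

/* Adjacent literals are joined at compile time; no run-time copying. */
static const char pasted[] = "This is\n"
                             "legal.";

/* The same text written as a single literal. */
static const char single[] = "This is\nlegal.";

/* Returns 1 if both arrays hold the same bytes, including the '\0'. */
int literals_match(void)
{
    return sizeof pasted == sizeof single
        && memcmp(pasted, single, sizeof single) == 0;
}
```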
-- 
Michael Meissner, Data General.		Uucp: ...!mcnc!rti!xyzzy!meissner
					Arpa/Csnet:  meissner@dg-rtp.DG.COM

minow@decvax.UUCP (Martin Minow) (10/08/87)

In article <2669@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) would like
to have multi-line string constants without the annoying \n\ at the
end of each line.
The problem with allowing multi-line string constants without \<nl> tags
is that a missing terminating quote is discovered only when the compiler
falls off the end of the file (or trips over another string).  If you
have more strings, you are in the amusing situation of compiling strings
and "stringing" code.  If you want to include a massive amount of text,
you can write a simple pre-processor such as the following (untested).
I used something similar to build a 10Kbyte string for an application.
This caused interesting hiccoughs in the compiler.

#include <stdio.h>
#define FALSE	0
#define TRUE	1

int main(void) {
	int		nl_pending = FALSE;
	int		c;

	putchar('"');
	while ((c = getchar()) != EOF) {
	    if (nl_pending) {
		putchar('\\');		/* \n			*/
		putchar('n');
		putchar('\\');		/* \ to continue string	*/
		putchar('\n');		/* end of line		*/
		nl_pending = FALSE;
	    }
	    if (c == '\n')
		nl_pending = TRUE;
	    else {
		if (c == '"' || c == '\\')
		    putchar('\\');	/* escape " and \ in the text	*/
		putchar(c);
	    }
	}
	putchar('"');
	putchar('\n');
	return 0;
}

Note, by the way, that Ansi C lets you write long strings as
	char foo[] =	"abc"
			"def"
			"ghi";
You still have to add the \n explicitly.
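Combining the two ideas, a filter like the one above could emit adjacent literals instead of "\n\" continuations. What follows is a hypothetical sketch (the names `emit` and `text_to_literals` are made up); it writes into a buffer rather than stdout and escapes `"` and `\` so the output is always a valid literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store ch if there is room, and count it either way. */
static void emit(char *out, size_t outsz, size_t *n, char ch)
{
    if (*n + 1 < outsz)
        out[*n] = ch;
    (*n)++;
}

/*
 * Hypothetical variant of the filter above: turn plain text into ANSI-style
 * adjacent string literals, one per source line, e.g.
 *     ab<newline>cd   ->   "ab\n"<newline>"cd"<newline>
 * Double quotes and backslashes in the text are escaped.  Returns the number
 * of bytes the complete output needs (not counting the '\0'); at most
 * outsz - 1 of them are stored, and out is always terminated if outsz > 0.
 */
size_t text_to_literals(const char *text, char *out, size_t outsz)
{
    size_t n = 0;
    int line_open = 0;              /* has the current literal been opened? */

    for (; *text != '\0'; text++) {
        if (!line_open) {
            emit(out, outsz, &n, '"');
            line_open = 1;
        }
        if (*text == '\n') {
            emit(out, outsz, &n, '\\');   /* newline kept as the escape \n */
            emit(out, outsz, &n, 'n');
            emit(out, outsz, &n, '"');    /* close this literal ...        */
            emit(out, outsz, &n, '\n');   /* ... and end the source line   */
            line_open = 0;
        } else {
            if (strchr("\"\\", *text) != NULL)  /* " and \ need a backslash */
                emit(out, outsz, &n, '\\');
            emit(out, outsz, &n, *text);
        }
    }
    if (line_open || n == 0) {      /* close the last line, or emit "" */
        if (n == 0)
            emit(out, outsz, &n, '"');
        emit(out, outsz, &n, '"');
        emit(out, outsz, &n, '\n');
    }
    if (outsz > 0)
        out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}
```

The explicit \n is still there, as Minow says; the only gain is that each generated line is an ordinary, self-contained literal.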

Martin Minow
decvax!minow

karl@haddock.ISC.COM (Karl Heuer) (10/08/87)

In article <2669@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>char *s = "This is
>not legal.";

I doubt that ANSI will bless this.  It makes it harder for the compiler to
resynchronize after a syntax error, and it duplicates existing functionality.
(Of course, a compiler is free to implement this as an extension.  I think the
Gnu cc might allow this notation.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

gwyn@brl-smoke.ARPA (Doug Gwyn ) (10/08/87)

In article <2669@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>char *s = "This is\n\
>legal.";

Still legal under ANSI C; the escaped new-line is removed from the
source character stream very early in the translation process.
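A small sketch of what "removed very early" means in practice, assuming a conforming compiler: the backslash-newline pair is deleted before tokens are formed, so Kyle's first form is one literal, and the same splice even works in the middle of an identifier. The names `spliced`, `counter` and `splice_matches` are hypothetical.

```c
#include <string.h>

/* Kyle's form: the backslash-newline is deleted before tokenizing,
   so these two physical lines form a single string literal. */
static const char spliced[] = "This is\n\
legal.";

/* Splicing happens before tokens exist, so it even works inside an
   identifier (legal, though hardly good style). */
static const int coun\
ter = 42;

int splice_matches(void)
{
    return strcmp(spliced, "This is\nlegal.") == 0 && counter == 42;
}
```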

>char *s = "This is
>not legal.";

Still not legal under ANSI C; the stuff after "= " on the first line
does not constitute a valid preprocessing-token.

There is yet another way to do what you want under ANSI C, using the
new feature of concatenation of adjacent string literals:
	char *s = "This is\n"
		"legal too.";
This has the advantage that you don't have to start the word "legal"
on the left margin.
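A sketch of that difference, assuming an ANSI compiler: indentation on a backslash-continued line becomes part of the string, while white space between adjacent literals is only token separation. The array names and the helper `blanks_after_newline` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Backslash-newline: the eight blanks of indentation end up in the string. */
static const char indented_splice[] = "This is\n\
        legal.";

/* Adjacent literals: the indentation between them is discarded. */
static const char indented_concat[] = "This is\n"
        "legal too.";

/* Counts the blanks that immediately follow the first newline in s. */
static size_t blanks_after_newline(const char *s)
{
    const char *p = strchr(s, '\n');
    size_t n = 0;

    if (p != NULL)
        while (p[1 + n] == ' ')
            n++;
    return n;
}
```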

rwhite@nusdhub.UUCP (Robert C. White Jr.) (10/09/87)

In article <6527@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article <2669@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
> >char *s = "This is\n\
> >legal.";
> 
> 	char *s = "This is\n"
> 		"legal too.";

	It is much more effective [if the strings are short]
to simply write:

	char *s = "This is\nQuite legal.\nIf a little ugly.";

Robert ("Who Asked You Anyway?") White.

feg@clyde.ATT.COM (Forrest Gehrke) (10/14/87)

In article <2669@xanth.UUCP>, kyle@xanth.UUCP (Kyle Jones) writes:
> I would like to be able to have multi-line string constants without
> having to put \n\ at the end of each line.  For example
> 
> char *s = "This is\n\
> legal.";
> 
> whereas
> 
> char *s = "This is
> not legal.";
> 
> I would like to see the second form become legal.  What does the
> current ANSI draft have to say about this?
> 

How is the compiler going to divine the number (if any) of spaces you
will want?  The intervening spaces might have been intentional.

Forrest Gehrke

kyle@xanth.UUCP (Kyle Jones) (10/17/87)

In article <2669@xanth.UUCP>, I state that I would like to see
multi-line strings like:

char *s = "This is
not legal.";

to become legal under ANSI C.

In article <15262@clyde.ATT.COM>, feg@clyde.ATT.COM (Forrest Gehrke) writes:
> How is the compiler going to divine the number (if any) of spaces you
> will want?  The intervening spaces might have been intentional.

Exactly.  I want the compiler to take everything that appears between
the double quotes literally, with the exception of the usual backslash
escapes.  My text editor knows how many spaces there are after "is", so
the compiler certainly shouldn't have any problem grasping this.

All the responses I received said that multi-line strings are still
illegal under ANSI C but pointed out that strings separated only by
whitespace will be concatenated.

Furthermore, the responses I received indicated that literal newlines
in strings are disallowed so the compiler can discover unterminated
strings early and not see the rest of the program "inside-out".  There
has GOT to be a better explanation than that.  Yes, the compiler would
lose its mind if you forgot a " but not any more so than when you forget
a brace or put a semicolon after a function definition.

I can write a preprocessor that allows me to have my multi-line
strings, but I would like to believe there's a better reason than the
one above for not allowing them in the language.

kyle jones  <kyle@odu.edu>  old dominion university, norfolk, va  usa

chris@mimsy.UUCP (Chris Torek) (10/17/87)

In article <2810@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones), wanting
something like

>char *s = "This is
>not legal.";

to be declared legal, answers Forrest Gehrke:

>In article <15262@clyde.ATT.COM> feg@clyde.ATT.COM (Forrest Gehrke) writes:
>>How is the compiler going to divine the number (if any) of spaces you
>>will want?  The intervening spaces might have been intentional.

>Exactly.  I want the compiler to take everything that appears between
>the double quotes literally, with the exception of the usual backslash
>escapes.  My text editor knows how many spaces there are after "is", so
>the compiler certainly shouldn't have any problem grasping this.

Yep.  There are exactly 62 spaces there, of course.  What?  Did
you say you typed only one?  But surely you can see all the others
on that card.  Card?  Sorry, I should have said `card image'.

Oh, you are not using an IBM machine?

Remember that there *are* machines out there that work with fixed
length records, even for source code.  Some of them even have C compilers.
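To make the point concrete, here is a hypothetical sketch (the 80-column `RECLEN` and `record_to_line` are assumptions, not any real system's interface) of how a fixed-length-record system might hand a card image to a compiler as a line. A blank the programmer typed at the end of the line and the pad blanks after it are the same bytes, so there is nothing for the compiler to recover.

```c
#include <stddef.h>
#include <string.h>

#define RECLEN 80   /* hypothetical card-image record length */

/* Hypothetical: convert one blank-padded record to a text line by
   dropping all trailing blanks, since padding and typed blanks cannot
   be told apart.  line must hold RECLEN + 1 bytes.  Returns its length. */
size_t record_to_line(const char rec[RECLEN], char line[RECLEN + 1])
{
    size_t len = RECLEN;

    while (len > 0 && rec[len - 1] == ' ')
        len--;
    memcpy(line, rec, len);
    line[len] = '\0';
    return len;
}
```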
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

henry@utzoo.UUCP (Henry Spencer) (10/19/87)

> ... the responses I received indicated that literal newlines
> in strings are disallowed so the compiler can discover unterminated
> strings early and not see the rest of the program "inside-out".  There
> has GOT to be a better explanation than that.  Yes, the compiler would
> lose its mind if you forgot a " but not any more so than when you forget
> a brace or put a semicolon after a function definition.

No, the string problem is worse, because it's at the lexical level where
intelligent error recovery is harder.  Your program may even look more or
less legal inside-out!
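A rough sketch of why the newline rule helps at the lexical level: a string scanner that treats an unescaped newline as an error can report the unterminated literal on the line where it began and resume lexing at the next line. This is a hypothetical fragment (`scan_string` is not from any real compiler), and it assumes line numbers start at 1 so that 0 can mean success.

```c
#include <stddef.h>

/*
 * Hypothetical lexer fragment.  Scans the string literal whose opening
 * quote is at src[*pos], honoring backslash escapes (including
 * backslash-newline).  On success returns 0 and leaves *pos just past
 * the closing quote.  If an unescaped newline or the end of input comes
 * first, leaves *pos at that character and returns the line (numbered
 * from 1) where the literal began, so the error is reported there.
 */
int scan_string(const char *src, size_t *pos, int *line)
{
    int start_line = *line;
    size_t i = *pos + 1;                 /* skip the opening quote */

    for (;;) {
        char c = src[i];

        if (c == '"') {
            *pos = i + 1;
            return 0;
        }
        if (c == '\0' || c == '\n') {
            *pos = i;                    /* resume lexing at this point */
            return start_line;
        }
        if (c == '\\' && src[i + 1] != '\0') {
            if (src[i + 1] == '\n')
                (*line)++;               /* spliced line still counts */
            i += 2;
            continue;
        }
        i++;
    }
}
```

With literal newlines allowed, the scanner would have no such stopping point and would swallow code until the next stray quote.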

> I can write a preprocessor that allows me to have my multi-line
> strings, but I would like to believe there's a better reason than the
> one above for not allowing them in the language.

How about "it's not C"?  Or, more specifically, to use the sort of wording
that X3J11 would use when rejecting such a pointless triviality:  "new
feature; no operational experience with it; need not convincingly shown;
same effect possible with existing features".
-- 
"Mir" means "peace", as in           |  Henry Spencer @ U of Toronto Zoology
"the war is over; we've won".        | {allegra,ihnp4,decvax,utai}!utzoo!henry

finegan@uccba.UUCP (Mike Finegan) (10/21/87)

Does anyone have a macro, or other way, to ignore a newline while in
a string (particularly an argument to [s,f]printf)? Something like:
#define swallow(!A!)

~
~
much code
~
~
				printf("I only want this swallow(!
					and not this !) to be in the string");

Any ideas short of my own pre-processor ? If it's insultingly simple, great!
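One possibly "insultingly simple" answer, sketched under the assumption that the unwanted text can simply be deleted from the source: end the line inside the string with a backslash, or, under ANSI C, close the literal and open another on the next line. `format_both` is a hypothetical demo function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical demo: two ways to continue a printf-style string on the
 * next source line without putting a newline into the result.
 * buf1 and buf2 must each hold at least 37 bytes.
 */
int format_both(char *buf1, char *buf2)
{
    /* K&R and ANSI C: backslash-newline is deleted, but indentation on
       the next line would land in the string, so start it in column 1. */
    int n1 = sprintf(buf1, "I only want this \
to be in the string");

    /* ANSI C only: close the literal and open another; the white space
       between adjacent literals is discarded. */
    int n2 = sprintf(buf2, "I only want this "
                           "to be in the string");

    return n1 == n2 && strcmp(buf1, buf2) == 0;
}
```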

						Mike Finegan
		      ...!(hal,decuec,mit-eddie,pyramid)!uccba[!ucece1]!finegan

franka@mmintl.UUCP (Frank Adams) (10/21/87)

In article <2810@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>I can write a preprocessor that allows me to have my multi-line
>strings, but I would like to believe there's a better reason than the
>one above for not allowing them in the language.

How about the fact that they look ugly?

I am quite happy that C does not allow this.  I don't expect everybody to
agree with me, but I don't expect to change my opinion, either.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

chip@ateng.UUCP (Chip Salzenberg) (10/22/87)

In article <2810@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>In article <2669@xanth.UUCP>, I state that I would like to see
>multi-line strings like:
>
>char *s = "This is
>not legal.";
>
>[...] I want the compiler to take everything that appears between
>the double quotes literally, with the exception of the usual backslash
>escapes.  My text editor knows how many spaces there are after "is", so
>the compiler certainly shouldn't have any problem grasping this.

But this is not true of all text editors.  The construct you propose is
just as troublesome as embedding real tab characters in quoted strings.
Some text editors translate tabs to spaces, strip trailing spaces, let you
edit, and translate leading spaces to tabs.

Even if you say that this kind of editor should be banished, there are
still all those mailers out there that strip trailing white space.  Don't
you want your source file to be mailable after you shar it?
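The usual defense against such editors and mailers is to spell the white space with escapes, so the source file contains only ordinary printable characters inside the quotes. A minimal sketch; `header` and `count_tabs` are hypothetical names.

```c
#include <string.h>

/* Each TAB is written as \t, so no literal tab sits in the source for an
   editor to expand or a mailer to mangle. */
static const char header[] = "Name\tSize\tOwner\n";

/* Counts the TAB characters that actually end up in the string. */
static int count_tabs(const char *s)
{
    int n = 0;

    for (; (s = strchr(s, '\t')) != NULL; s++)
        n++;
    return n;
}
```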

-- 
Chip Salzenberg         "chip@ateng.UUCP"  or  "{uunet,usfvax2}!ateng!chip"
A.T. Engineering        My employer's opinions are not mine, but these are.
   "Gentlemen, your work today has been outstanding.  I intend to recommend
   you all for promotion -- in whatever fleet we end up serving."   - JTK

karl@haddock.ISC.COM (Karl Heuer) (10/23/87)

In article <2502@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <2810@xanth.UUCP> kyle@xanth.UUCP (Kyle Jones) writes:
>>I can write a preprocessor that allows me to have my multi-line
>>strings [without "\n\" continuation], but I would like to believe there's a
>>better reason than the one above for not allowing them in the language.
>
>How about the fact that they look ugly?

(That's not a fact, that's an opinion.)

I think a paragraph enclosed in quotes is *less* ugly than the same thing with
"\n\" at the end of each line, which is the least ugly way to write it now.

Trailing blanks are less of a problem than embedded tabs in a source
file (which *is* legal).  The careful programmer would have to
assume that trailing blanks may be stripped, and explicitly use "\n\" if they
need to be retained.
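A sketch of that precaution, assuming today's rules: blanks written just before "\n\" are not at the end of the physical line, so a tool that strips trailing white space leaves them alone. `padded` and `blanks_survive` are hypothetical names.

```c
#include <string.h>

/* The three blanks sit in front of "\n\", not at the end of the physical
   line, so stripping trailing white space cannot remove them. */
static const char padded[] = "col1   \n\
col2";

/* Returns 1 if the blanks really are part of the string. */
int blanks_survive(void)
{
    return strcmp(padded, "col1   \ncol2") == 0;
}
```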

To answer the original question: I would give you good odds that the reason
given by X3J11 would be "lack of prior art; can be done with existing
features; need not convincingly demonstrated".  Basically, ANSI C doesn't
allow it because K&R doesn't, and gratuitous improvements are outside their
charter.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint