[comp.lang.c] "%#s"?

jk3k+@andrew.cmu.edu (Joe Keane) (05/28/88)

It would be nice for printf et al.  to have a %#s format specifier,
to convert unprintable characters into backslash escapes.  Thus
	printf ("%#s", "\b\fhi\n\207");
would be equivalent to
	printf ("\\b\\fhi\\n\\207");
Has anyone else thought of this?  Is it a good idea?  Can it be put
in the standard?

--Joe

chris@mimsy.UUCP (Chris Torek) (05/29/88)

In article <AWbStey00Uk4E0S14d@andrew.cmu.edu> jk3k+@andrew.cmu.edu
(Joe Keane) writes:
>It would be nice for printf et al.  to have a %#s format specifier,
>to convert unprintable characters into backslash escapes.  Thus
>	printf ("%#s", "\b\fhi\n\207");
>would be equivalent to
>	printf ("\\b\\fhi\\n\\207");

There are many things that might be nice: Roman numerals (both upper
and lower case: %r and %R are both free), unsigned decimal conversions,
chopped vs. rounded (and up or down?) floating numbers, and so forth.
But some of us feel that printf is already too complex.  (Keith Bostic
and I spent a week arguing over exactly what printf is supposed to do
in odd cases, poring over ANSI drafts, and coming up with perverse test
cases like `%500.400g' or `%#0500x'.  The 4.3BSD _doprnt fails the
huge-fill-count tests, incidentally.)

>Has anyone else thought of this?

Yes.

>Is it a good idea?

Possibly.  You can always write your own code for it.  (Probably the best way
to augment printf, if it is to be augmented at all, is to add a `%'
conversion that calls a function, passing it the various flags and
specifiers and the FILE and va_list arguments.  The function would
return the number of characters transferred.  At least, that would fit
well with our new _doprnt.)
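
The function-calling conversion described above might look roughly like
the following.  This is only a sketch under stated assumptions, not the
actual _doprnt: the conversion letter `@', the names conv_fn, conv_binary
and xfprintf, and the reduced argument list (stream, width, pointer to
the va_list) are all invented for illustration.

```c
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical callback type: receives the stream, the parsed field
 * width, and a pointer to the caller's va_list so that it can consume
 * its own arguments.  Returns the number of characters written. */
typedef int (*conv_fn)(FILE *fp, int width, va_list *ap);

/* Example callback: writes an unsigned int in binary, right-justified. */
int conv_binary(FILE *fp, int width, va_list *ap)
{
    unsigned v = va_arg(*ap, unsigned);
    char digits[sizeof(unsigned) * CHAR_BIT];
    int n = 0, count = 0;

    do {                                  /* collect digits, low bit first */
        digits[n++] = (char)('0' + (v & 1u));
        v >>= 1;
    } while (v != 0);
    for (; width > n; width--, count++)   /* pad on the left */
        putc(' ', fp);
    while (n > 0) {                       /* emit high bit first */
        putc(digits[--n], fp);
        count++;
    }
    return count;
}

/* Toy formatter: understands only %s, %% and the assumed %@ extension. */
int xfprintf(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    int count = 0;

    va_start(ap, fmt);
    while (*fmt != '\0') {
        int width = 0;

        if (*fmt != '%') {
            putc(*fmt++, fp);
            count++;
            continue;
        }
        fmt++;                            /* skip the '%' */
        while (*fmt >= '0' && *fmt <= '9')
            width = width * 10 + (*fmt++ - '0');
        if (*fmt == '\0')
            break;                        /* dangling '%' at end of format */
        if (*fmt == '@') {
            conv_fn fn = va_arg(ap, conv_fn);
            count += fn(fp, width, &ap);
        } else if (*fmt == 's') {
            count += fprintf(fp, "%*s", width, va_arg(ap, const char *));
        } else {
            putc(*fmt, fp);               /* "%%" and anything unknown */
            count++;
        }
        fmt++;
    }
    va_end(ap);
    return count;
}
```

A call such as xfprintf(stdout, "[%8@] %s", conv_binary, 5u, "ok") passes
the callback first and then the callback's own argument.  The point of
the design is that printf itself stays fixed while new conversions live
in user code.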

>Can it be put in the standard?

Anything that is not an `editorial change' is probably not going
to make it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

henry@utzoo.uucp (Henry Spencer) (05/29/88)

> It would be nice for printf et al.  to have a %#s format specifier,
> to convert unprintable characters into backslash escapes...
> Has anyone else thought of this?  Is it a good idea?  Can it be put
> in the standard?

Yes.  Probably.  No, it's much too late now.

I came up with this a couple of years ago, although I was more interested
in the input side, i.e. scanf.  I did not end up making a formal proposal
to X3J11 because I am a firm believer (firmer than X3J11!) in the "no
standard without implementation experience" rule, and couldn't point to
any real experience.  Possibly I should have tried; it WOULD be useful.

It is too late to make significant additions to the first C standard now;
the committee very badly wants to get the damn thing out the door, and
will be very unreceptive to any substantive (what's in the language, as
opposed to how it's described) changes unless the situation is clearly a
major emergency.
-- 
"For perfect safety... sit on a fence|  Henry Spencer @ U of Toronto Zoology
and watch the birds." --Wilbur Wright| {ihnp4,decvax,uunet!mnetor}!utzoo!henry

rbutterworth@watmath.waterloo.edu (Ray Butterworth) (05/30/88)

In article <1988May28.222450.2680@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> > It would be nice for printf et al.  to have a %#s format specifier,
> > to convert unprintable characters into backslash escapes...
> > Has anyone else thought of this?  Is it a good idea?  Can it be put
> > in the standard?
> 
> Yes.  Probably.  No, it's much too late now.
> 
> I came up with this a couple of years ago, although I was more interested
> in the input side, i.e. scanf.  I did not end up making a formal proposal
> to X3J11 because I am a firm believer (firmer than X3J11!) in the "no
> standard without implementation experience" rule, and couldn't point to
> any real experience.  Possibly I should have tried; it WOULD be useful.

We added %#s and %#c to our version of printf a while back,
and it was very useful, especially for error messages, e.g.:
1) warning("Illegal character %#x ignored", c);
2) warning("Illegal character '%c' ignored", c);
3) if (isascii(c) && isprint(c))
       warning("Illegal character '%c' ignored", c);
   else
       warning("Illegal character %#x ignored", c);
4) warning("Illegal character '%#c' ignored", c);
The first two are commonly found in many programs; the first
is annoying and the second is wrong.
The third is correct but no one bothers to do it.
The fourth is correct and no trouble to use.

Unfortunately we took out the change for fear that people would use it
and then have their programs break on other systems.

Perhaps %#s and %#c can be added in future updates to the language.

My big complaint with X3J11 regarding this is that they changed the
definition of \x to take an arbitrary number of hex digits.  That
means that the output of any future %#s could be ambiguous.  Adding
a \z do-nothing escape would solve the problem though.

ok@quintus.UUCP (Richard A. O'Keefe) (05/31/88)

In article <19166@watmath.waterloo.edu>, rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
> Perhaps %#s and %#c can be added in future updates to the language.

In the mean-time, why not post sources for a
	spr_str(char *buffer, int width, int places, char *source)
function equivalent to
	sprintf(buffer, "%#*.*s", width, places, source)
so that we can try the idea out?  I suggest width < 0 means the equivalent
of "%#.*s" and places < 0 means the equivalent of "%#*s".
{spr_str(buf, width, 1, &c) can serve for sprintf(buf, "%#*c", width, c).}

When we have some prior art going, we can ask for it to be added to printf.

karl@haddock.ISC.COM (Karl Heuer) (06/01/88)

In article <1039@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <19166@watmath.waterloo.edu>, rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>> Perhaps %#s and %#c can be added in future updates to the language.
>
>In the mean-time, why not post sources for a
>	spr_str(char *buffer, int width, int places, char *source)

Since the result is probably going to be handed to printf anyway, I'd get rid
of the width/places arguments and let printf handle them.  This keeps the
function simpler.  (It would, however, probably be a good idea to add a size_t
argument representing the buffer size, to protect against overflow.  Why
repeat sprintf's mistake?)

>spr_str(buf, width, 1, &c) can serve for sprintf(buf, "%#*c", width, c).

Not if c is declared register and/or int (both of which are common when
dealing with text).  I think this case is sufficiently useful and simple that
it deserves to be a separate function: char *spr_chr(int c); the buffer itself
can be static.  (One might also include EOF in the domain.)

This still leaves some questions.  What should be the specifications for the
output of these functions?  In particular, on an ASCII system, should
spr_chr('\1') return "\\1", "\\001", "\\x01", "\\x1\\c", "\\^A", or "^A"?
Should spr_chr('\10') return "\\b"?  What about spr_chr('\\'), does it behave
like other printing chars and return "\\" (that's a string of length 1), or do
we want the output to be completely unambiguous, and return "\\\\" (that's two
backslashes)?

I find that in practice, the answers to the above depend on the application,
which is why I haven't yet installed this function in my private library.
(Also, I'm waiting to see if X3J11 resolves the problem of terminating hex
constants, as it may influence the design decision.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

ok@quintus.UUCP (Richard A. O'Keefe) (06/01/88)

In article <4311@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
> In article <1039@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
> >In article <19166@watmath.waterloo.edu>, rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
> >> Perhaps %#s and %#c can be added in future updates to the language.
> >
> >In the mean-time, why not post sources for a
> >	spr_str(char *buffer, int width, int places, char *source)
> 
> Since the result is probably going to be handed to printf anyway, I'd get rid
> of the width/places arguments and let printf handle them.  This keeps the
> function simpler.

It also makes it very little use.  Forget the width argument, which is
admittedly not so important.  The important one is the .places argument,
which serves to terminate the string.  If I want to print the first 20
characters of an array which might not have any NULs nearby, I can do
	printf("%.20s", buffer);
or	printf("%.*s", 20, buffer);
I generally use the second form, as then I can have an expression for the
amount that I want written.  This is especially useful for debugging, when
the bug may involve a clobbered NUL.  And this is precisely where I would
use %#s if it existed.  So it is important that a trial implementation of
this operation should let me bound the source this way.
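
To make the bounding behaviour concrete, here is a small sketch.  The
helper name show_prefix is made up, and snprintf is assumed to be
available (it is a later addition to the library than the printf family
discussed here).  With a precision given, the %s conversion reads at most
that many bytes, so the source array need not be NUL-terminated.

```c
#include <stdio.h>

/* Formats at most n bytes of a possibly unterminated array into out,
 * surrounded by brackets.  Returns what snprintf returns. */
int show_prefix(char *out, size_t outsize, const char *data, int n)
{
    return snprintf(out, outsize, "[%.*s]", n, data);
}
```

For a four-byte array holding 'a', 'b', 'c', 'd' with no terminator,
show_prefix(out, sizeof out, raw, 3) leaves "[abc]" in out.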

I think we can describe the proposed effect of %#c quite succinctly:
it writes the shortest sequence <S> of isprint() characters such that
both '<S>' and "<S>" would be legal C constants for which the '<S>'
version would have the same value as its argument, preferring symbolic
forms such as \a to octal forms such as \7.  The effect of %#s would
be the effect of the appropriate sequence of %#c instances.
The effect of %#c on arguments like 'abc' would be implementation-defined.
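
One possible reading of that description, as a sketch only: the name
view_char is hypothetical, the C locale and 8-bit chars are assumed, and
multi-character constants such as 'abc' are not handled.  Symbolic
escapes are tried first, then the character itself if printable, then
the shortest octal form.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Writes the escaped form of byte c into out (room for 5 bytes is
 * assumed) and returns its length. */
int view_char(char *out, int c)
{
    static const char codes[] = "\a\b\f\n\r\t\v\\\'\"";
    static const char names[] = "abfnrtv\\'\"";
    unsigned char uc = (unsigned char)c;
    const char *p;

    /* Symbolic forms first; NUL is excluded because strchr would
     * otherwise match the terminator of codes. */
    if (uc != '\0' && (p = strchr(codes, uc)) != NULL)
        return sprintf(out, "\\%c", names[p - codes]);
    if (isprint(uc))
        return sprintf(out, "%c", uc);
    return sprintf(out, "\\%o", (unsigned)uc);   /* shortest octal form */
}
```

Under these assumptions '\n' becomes \n, '\1' becomes \1, '\207' becomes
\207, and both quote characters come out escaped so that the result is
valid inside either kind of constant.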

peter@ficc.UUCP (Peter da Silva) (06/02/88)

Once upon a time I implemented a printf in Fortran to make it easier to
convert some 'C' stuff to ratfor. I included a format that is rather
handy, based on Fortran 'T' format:

	%t

What %t does is move around in the output. %10t means move to location
10. %-10t means move back 10 spaces. %+10t means move forward 10 spaces.
%*t means move to the location specified by the next argument.

For generating table output with complex elements it was wonderful:

	printf("%s.%s%32t%-10d%s\n", filename, ext, size, owner);
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: uunet!nuchat!sugar!peter.

karl@haddock.ISC.COM (Karl Heuer) (06/03/88)

In article <1043@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <4311@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
>>Since the result is probably going to be handed to printf anyway, I'd get
>>rid of the width/places arguments and let printf handle them.  This keeps
>>the function simpler.
>
>It also makes it very little use [because the argument may not be terminated
>by a null character]

True.  I think I'd handle it with separate functions, perhaps called
strview(), strnview(), and memview().

>I think we can describe the proposed effect of %#c quite succinctly:
>... both '<S>' and "<S>" would be legal C constants ...

So '\'' and '"' would be escaped, in addition to '\\'?  I hadn't considered
that, but I suppose it makes sense (especially if the application is a
code generator, rather than a debugging routine).

>The effect of %#s would be the effect of the appropriate sequence of %#c
>instances.

Assuming that the C constant "<S>" should represent the argument to %#s, this
may not always be possible: as of the Jan88 dpANS, the only way to write the
two-character string "\x400""3" without using string-pasting is to write the
character '3' in octal or hex, but %#c would want to emit a '3'.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

mouse@mcgill-vision.UUCP (der Mouse) (06/12/88)

In article <4311@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
> In article <1039@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>> In article <19166@watmath.waterloo.edu>, rbutterworth@watmath.waterloo.edu (Ray Butterworth) writes:
>>> Perhaps %#s and %#c can be added in future updates to the language.
>> spr_str(buf, width, 1, &c) can serve for sprintf(buf, "%#*c", width, c).
> Not if c is declared register and/or int (both of which are common
> when dealing with text).  I think this case is sufficiently useful
> and simple that it deserves to be a separate function: char
> *spr_chr(int c); the buffer itself can be static.

If you are going to use a static buffer, folks, please use several of
them, or otherwise arrange that it doesn't lose big if I say

printf("  in_chr = %s, out_chr = %s\n",
	spr_chr(in_chr), spr_chr(out_chr));

With a domain as small as a char, it would even work to

static char *strings[] = { "EOF", "^@", "^A", ...., "\\377" };
/* or whatever strings you want */
/* this example assumes EOF is -1 */

char *spr_chr(int chr)
{
 return(((chr==EOF)||(chr==(int)(unsigned char)chr))?strings[chr+1]:"OOPS");
}

(by the way, is the chr==(int)(unsigned char)chr test a safe way of
testing whether the value is one an unsigned char could take on?)
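
For a domain too large for a lookup table, the "several static buffers"
request could be met with a small rotating pool.  This is a sketch, not
anyone's posted implementation: NBUF, the buffer size, and the choice of
three-digit octal output are assumptions, and backslash is deliberately
left unescaped to keep the example short.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

#define NBUF 4          /* assumed limit on simultaneously live results */

const char *spr_chr(int c)
{
    static char bufs[NBUF][16];
    static int next = 0;
    char *out = bufs[next];

    next = (next + 1) % NBUF;             /* rotate to the next slot */
    if (c == EOF)
        sprintf(out, "EOF");
    else if ((unsigned)c > UCHAR_MAX)
        sprintf(out, "OOPS");             /* outside the domain */
    else if (isprint(c))
        sprintf(out, "%c", c);
    else
        sprintf(out, "\\%03o", (unsigned)c);
    return out;
}
```

Two calls in one printf argument list now return distinct pointers; the
fifth call reuses the first slot, so results must not be kept for long.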

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

friedl@vsi.UUCP (Stephen J. Friedl) (06/13/88)

In article <1156@mcgill-vision.UUCP>, mouse@mcgill-vision.UUCP (der Mouse) writes:
> > [let's have spr_chr(int c) to return a stringized version of a char]
> 
> If you are going to use a static buffer, folks, please use several of
> them, or otherwise arrange that it doesn't lose big if I say
> 
> printf("  in_chr = %s, out_chr = %s\n",
> 	spr_chr(in_chr), spr_chr(out_chr));

For exactly this kind of thing we use a routine circbuf().  It has a
large static buffer and returns chunks to you upon request:

/*----------------------- circbuf.c ------------------------*/

#define		ASIZE		1024

char *
circbuf(size)
int	size;
{
static char	circarray[ASIZE],
		*nextfree = circarray;

	if ((nextfree + size) > &circarray[ASIZE])	/* enough room?	*/
		nextfree = circarray;			/* recycle	*/

	return((nextfree += size) - size);
}

/*----------------------- circbuf.c ------------------------*/

This is a handy malloc-like function that you don't have to free
up.  It strikes me as a little dangerous that you have to pay attention
to the lifetime of one of these strings (it will get overwritten
later), but we've not seen any problems with it.

-- 
Steve Friedl    V-Systems, Inc. (714) 545-6442      3B2-kind-of-guy
friedl@vsi.com     {backbones}!vsi.com!friedl    attmail!vsi!friedl

Nancy Reagan on ptr args with a prototype in scope: "Just say NULL"

karl@haddock.ISC.COM (Karl Heuer) (06/15/88)

In article <1156@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>In article <4311@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
>>[The routine to unctrl a single character can use a static buffer]
>
>If you are going to use a static buffer, folks, please use several of
>them, or otherwise arrange that it doesn't lose big if I [call the function
>twice in one expression]

If I could be certain that the domain would be small, I'd go with the lookup
table you suggest.  But we need to decide what properties are guaranteed to be
true, even on (say) an implementation with 16-bit chars.  (Remember, this
started out as something that was proposed for the ANSI standard.)

If we say that the return pointer is always valid, we are effectively
requiring the implementation to reserve 2**CHAR_BIT of these strings in
memory.  This could be quite a bit of space; I don't think it's practical.

We could use the heap, but then the user would have to be responsible for
freeing it (which means he must save the pointer, which is an inconvenience of
roughly the same magnitude as providing his own buffer).  Also, this would
necessitate an error return, in case malloc fails.

Maybe the best way is to provide two char-viewing routines.  One would expect
the user to pass a buffer, and would be analogous to the string-viewing
routines.  The other would use a single static buffer, with the usual caveat.

What a mess.  Implementing it as a printf format would avoid all of this.

>(by the way, is the chr==(int)(unsigned char)chr test a safe way of
>testing whether the value is one an unsigned char could take on?)

Yes, but "(unsigned)chr <= UCHAR_MAX" might be better.
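
The two tests side by side, as a sketch (the function names are made up
for illustration).  Both reject EOF and anything above UCHAR_MAX on an
ordinary implementation where int is wider than char.

```c
#include <limits.h>

/* Round-trip test: does the value survive conversion to unsigned char? */
int fits_by_cast(int chr)
{
    return chr == (int)(unsigned char)chr;
}

/* Range test: a single unsigned comparison; negative values wrap to
 * large unsigned numbers and so fail it. */
int fits_by_range(int chr)
{
    return (unsigned)chr <= UCHAR_MAX;
}
```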

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

bill@proxftl.UUCP (T. William Wells) (06/16/88)

In article <1043@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> I think we can describe the proposed effect of %#c quite succinctly:
> it writes the shortest sequence <S> of isprint() characters such that
> both '<S>' and "<S>" would be legal C constants for which the '<S>'
> version would have the same value as its argument, preferring symbolic
> forms such as \a to octal forms such as \7.

You do not want to use short octal escape sequences because the
string "\a0" could be output as "\70" which is ambiguous.  For
similar reasons, if X3J11 leaves in the unlimited length for
"\x", those can't be used.

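The ambiguity is easy to reproduce.  In the sketch below (escape_octal is
a hypothetical name; 8-bit chars and the C locale are assumed) the same
input is escaped with shortest-form octal and with fixed three-digit
octal.  The buffer-size argument follows the earlier suggestion in this
thread to guard against overflow.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Escapes the NUL-terminated src into out, never writing more than
 * outsize bytes.  pad_octal != 0 selects "\007" style over "\7".
 * Returns the length of the result. */
size_t escape_octal(char *out, size_t outsize, const char *src, int pad_octal)
{
    size_t used = 0;

    if (outsize == 0)
        return 0;
    out[0] = '\0';
    for (; *src != '\0'; src++) {
        unsigned char uc = (unsigned char)*src;
        char piece[8];
        int n;

        if (uc == '\\')
            n = sprintf(piece, "\\\\");
        else if (isprint(uc))
            n = sprintf(piece, "%c", uc);
        else if (pad_octal)
            n = sprintf(piece, "\\%03o", (unsigned)uc);
        else
            n = sprintf(piece, "\\%o", (unsigned)uc);
        if (used + (size_t)n >= outsize)
            break;                        /* stop rather than overflow */
        memcpy(out + used, piece, (size_t)n + 1);
        used += (size_t)n;
    }
    return used;
}
```

With the short form, the two-character string "\a0" is written as \70,
which a C compiler reads back as the single character '8'.  With padding
it is written as \0070, which reads back correctly because an octal
escape stops after three digits.
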
karl@haddock.ISC.COM (Karl Heuer) (06/17/88)

In article <324@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>You do not want to use short octal escape sequences [for %#c or %#s encoding]
>because the string "\a0" could be output as "\70" which is ambiguous.  For
>similar reasons, if X3J11 leaves in the unlimited length for "\x", those
>can't be used.

Which means that there are some strings% that can't be output at all, except
by writing every single character -- including the printable ones -- in hex.
(The string-literal-pasting kludge doesn't help here.)

I'll make one more effort in the third public review to convince X3J11 that my
\c terminator is useful.  See comp.std.c for a discussion thereof.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
% Consider a machine with 12-bit bytes, printable chars being 0x400-0x4ff.