[comp.unix.questions] Guessing buffer size needed for sprintf beforehand

jad@insyte.uucp (Jill Diewald) (04/28/88)

Hi-

I'm having a problem with sprintf because it requires my knowing
how big a buffer to supply.  The IO module of our program needs
to sprintf into its internal output buffer - which is dynamically
allocated.  By the time the call to the IO module happens,
all the IO function knows is what it is passed as arguments. 
(It is a very well structured modular system).  The IO call looks 
like a vprintf and has the same arguments.  The IO routine does 
not know how many characters there are to print until sprintf 
returns that number; by then it is too late!

The original approach was to set a fudge factor of 256.  Unfortunately
this is not good enough.  We ran into this problem when a user did
a particularly strange thing ...but you know how end users are...
Seriously though, the system is used to produce very large reports so 
we need a foolproof solution.

I can see some solutions but none of them are particularly nice.
Two constraints: the solution has to be FAST and portable, since
the system is run on many computers including VMS.

One solution is to fprintf into a file opened to be /dev/null.
This requires two calls to the C print functions, one to get the
size and the second to actually print.  Since the first call
goes to /dev/null it should be faster since it doesn't really
write anything? (Is this true?  For all UNIXs?).  This won't
work for VMS though.

Another solution is to parse the printf format and arguments and
create my own count.  This would be non-trivial, since there are
so many different formats, and rather like re-inventing the wheel,
since printf already does this.

Is there some function out there that does the parsing and returns
a count?

Is there another approach?  Two calls to a print function for every
line is probably too slow.

Any ideas would be appreciated.

	Jill Diewald
	Innovative Systems
	Newton, MA 
	(617) 965-8450
	...harvard!axiom!insyte!jad

chris@mimsy.UUCP (Chris Torek) (05/03/88)

In article <136@insyte.uucp> jad@insyte.uucp (Jill Diewald) writes:
>... sprintf ... requires my knowing how big a buffer to supply.
>The IO module of our program needs to sprintf into its internal
>output buffer - which is dynamically allocated. ... The IO routine
>does not know how many characters there are to print until sprintf 
>returns that number; by then it is too late!

Welcome to the `we hate sprintf' club :-) .  Alas, there is no portable
way to do this.  There *is* one reasonably portable solution, but it
may be slow.  If you have a `function oriented stdio', there is a
convenient and fast solution, but it is not portable.

>One solution is to fprintf into a file opened to be /dev/null.
>This requires two calls to the C print functions, one to get the
>size and the second to actually print.

By the time you reach a routine that takes a `va_list' argument,
it is too late to scan the argument list twice: there is no way
to `back up' the va_list parameter.  So, while this would get
you a suitable count, it would then be too late to use that count.

>Is there another approach?

Here is one:

static FILE *sfile;		/* dynamic sprintf string file */

/*
 * open a temp file in "w+" mode, or perhaps even
 * in "w+b" mode; abort if it cannot be opened.
 */
void
dynopen()
{
	/* you supply the details */
}

/*
 * dynamically create a string holding the result from a printf.
 */
char *
dynsprintf(fmt, ap)
	char *fmt;
	va_list ap;
{
	int count, c;
	char *ret, *cp;

	if (sfile == NULL)
		dynopen();

	/* write the arguments to the temp file, and count characters */
	rewind(sfile);
	count = vfprintf(sfile, fmt, ap);
	if (count < 0) ... handle error

	/* make room */
	ret = malloc((unsigned)count + 1);
	if (ret == NULL) ... do something

	/* read back the result */
	rewind(sfile);
	cp = ret;
	while (--count >= 0) {
		if ((c = getc(sfile)) == EOF) ... ??? why?
		*cp++ = c;
	}
	*cp = 0;
	return (ret);
}

void
dynclose()
{

	if (sfile != NULL) {
		(void) fclose(sfile);
		... remove temp file
	}
}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

andrew@comp.vuw.ac.nz (Andrew Vignaux) (05/09/88)

In article <11331@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>By the time you reach a routine that takes a `va_list' argument,
>it is too late to scan the argument list twice: there is no way
>to `back up' the va_list parameter.  So, while this would get
>you a suitable count, it would then be too late to use that count.

The `varargs' manual page says "Multiple traversals, each bracketed by
va_start ...  va_end, are possible." so it is possible to rescan the
argument list.  (or have I misinterpreted what you said?)

I can't see any nice/portable/efficient solution to the original problem.
- Push the fudge factor up -- the classic unix solution.
- Chris Torek's solution.
- Write a function to interpret the printf format.  Release it to the
  world.  Convince everyone who writes "yet another stdio" to include
  your function.  Convince the various standards committees to accept
  the function.  Then in 5-10 years you will have a standard solution :-)
- send the output off to a pipe, and wait until it comes back.  Then
  if the pipe buffer is too small you can blame the operating system :-)
- open a FILE* like Chris.  setbuf() the buffer to point out of your
  address space.  Set up a signal handler to trap SIGSEGV.  Then count
  the number of segmentation violations :-) :-)

I suggest you push up the fudge factor and turn it into a SEP.

BTW, I think there should be a stropen()-like function so mere-mortals
can open string based stdio streams.  Is this in the standard?  Comments?

Andrew
------------------------------------------------------------------------------
Domain address: andrew@comp.vuw.ac.nz   Path address: ...!uunet!vuwcomp!andrew

chris@mimsy.UUCP (Chris Torek) (05/11/88)

>In article <11331@mimsy.UUCP> I wrote:
>>By the time you reach a routine that takes a `va_list' argument,
>>it is too late to scan the argument list twice....

In article <13597@comp.vuw.ac.nz> andrew@comp.vuw.ac.nz (Andrew Vignaux) writes:
>The `varargs' manual page says "Multiple traversals, each bracketed by
>va_start ...  va_end, are possible." so it is possible to rescan the
>argument list.  (or have I misinterpreted what you said?)

You have.  Watch:

	int prf(const char *fmt, ...) {
		va_list ap;
		int rv;
		va_start(ap, fmt);
		rv = __printf(stdout, fmt, ap);
		va_end(ap);
		return (rv);
	}

	int __printf(FILE *fp, const char *fmt, va_list ap) {
		...

Which function takes a va_list argument?  Which one has the
va_start/va_end pair?  You *could* do this:

	int prf(const char *fmt, ...) {
		...
		va_start(ap, fmt);
		_first_fn(fmt, ap);
		va_end(ap);
		va_start(ap, fmt);
		_second_fn(fmt, ap);
		va_end(ap);
		...
	}

except that the original problem requires that the solution be
contained within the __printf function.
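
A minimal sketch of the two-traversal version, for the case where the
variadic function itself can do the work.  It assumes a C99 library in
which vsnprintf(NULL, 0, ...) returns the length that would have been
written; that call is not part of the stdio implementations discussed in
this thread, and the name two_pass_format is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into freshly malloc'd storage; the caller frees the result.
 * Returns NULL on a formatting or allocation error.
 */
char *
two_pass_format(const char *fmt, ...)
{
	va_list ap;
	int count;
	char *ret;

	/* first traversal: measure only, nothing is stored */
	va_start(ap, fmt);
	count = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (count < 0)
		return NULL;

	ret = malloc((size_t)count + 1);
	if (ret == NULL)
		return NULL;

	/* second traversal: a fresh va_start, then the real formatting */
	va_start(ap, fmt);
	(void) vsnprintf(ret, (size_t)count + 1, fmt, ap);
	va_end(ap);
	return ret;
}
```

Each traversal is bracketed by its own va_start/va_end pair, which is
only possible here because this function owns the `...' parameter list.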

>BTW, I think there should be a stropen()-like function so mere-mortals
>can open string based stdio streams.  Is this in the standard?

It is not in the standard, but I have one; mine is called fmemopen().
There is a more general interface called funopen():

	/* declaration (prototype version) */
	FILE *funopen(const void *cookie,
		int (*readfn)(void *cookie, char *buf, int n),
		int (*writefn)(void *cookie, const char *buf, int n),
		long (*seekfn)(void *cookie, long off, int whence),
		int (*closefn)(void *cookie));
	#define	fropen(cookie, fn) funopen(cookie, fn, 0, 0, 0)
	#define fwopen(cookie, fn) funopen(cookie, 0, fn, 0, 0)

Unix stdio needs only these four operations.  By restructuring the
stdio internals slightly, `normal' file I/O is done with read, write,
seek, and close functions that just call read, write, seek, and close,
and anyone can easily write `special' I/O routines.  fmemopen() is just
a special instance of this same general case---in principle, it uses
read and write functions that transfer to user memory, although in
practice it just aims the internal stdio buffers directly at that memory;
its read and write functions return an error, and get called only when
the declared region is full (write) or empty (read).  (Being an
internal stdio function, it is allowed to cheat like this.)
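
As a rough illustration of a string-based stream, here is a sketch that
uses the fmemopen() later standardized by POSIX.  This is an assumption:
the POSIX call may differ in signature and behaviour from the fmemopen()
described above, and the helper name fmt_into_region is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for fmemopen(); must precede the includes */
#include <stdio.h>

/*
 * Format two values into a caller-supplied region through an ordinary
 * FILE *.  Returns 0 on success, -1 if the stream cannot be opened or
 * stdio reports an error while writing or closing.
 */
int
fmt_into_region(char *buf, size_t size, int n, const char *s)
{
	FILE *fp = fmemopen(buf, size, "w");
	int bad;

	if (fp == NULL)
		return -1;
	bad = fprintf(fp, "%d-%s", n, s) < 0;
	/* closing flushes the stdio buffer into the region */
	if (fclose(fp) != 0)
		bad = 1;
	return bad ? -1 : 0;
}
```

Any stdio output routine can be pointed at the region this way, which
is the attraction of a stream interface over sprintf.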
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

allbery@ncoast.UUCP (Brandon Allbery) (05/12/88)

As quoted from <136@insyte.uucp> by jad@insyte.uucp (Jill Diewald):
+---------------
| One solution is to fprintf into a file opened to be /dev/null.
| This requires two calls to the C print functions, one to get the
| size and the second to actually print.  Since the first call
| goes to /dev/null it should be faster since it doesn't really
| write anything? (Is this true?  For all UNIXs?).  This won't
| work for VMS though.
+---------------

Close.  fprintf() to a newly-created temporary file, then rewind() it (this
is a stdio function) and getc() out of it and send it wherever.  If you need
the size, get the file size using the OS's current-file-position call before
the rewind().  This is perhaps a bit slower than an in-core buffer, but it's
clean and is limited only by free space on the disk!
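
A minimal sketch of the temporary-file approach, assuming ISO C
tmpfile() and the v*printf routines are available (they are not on every
system mentioned in this thread).  The name tmp_format is hypothetical,
and ftell() stands in for "the OS's current-file-position call".

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format via a scratch file and return the text in malloc'd storage.
 * Returns NULL on any I/O or allocation error; the caller frees.
 */
char *
tmp_format(const char *fmt, ...)
{
	static FILE *tf;	/* scratch file, opened once and reused */
	va_list ap;
	long size;
	char *ret;
	int rv;

	if (tf == NULL && (tf = tmpfile()) == NULL)
		return NULL;
	rewind(tf);

	va_start(ap, fmt);
	rv = vfprintf(tf, fmt, ap);
	va_end(ap);
	if (rv < 0)
		return NULL;

	/* the current position is the number of bytes just written */
	size = ftell(tf);
	if (size < 0)
		return NULL;
	ret = malloc((size_t)size + 1);
	if (ret == NULL)
		return NULL;

	/* rewind, then read the text back into the new buffer */
	rewind(tf);
	if (fread(ret, 1, (size_t)size, tf) != (size_t)size) {
		free(ret);
		return NULL;
	}
	ret[size] = '\0';
	return ret;
}
```

The size comes from the file position rather than from the return
value of vfprintf, so the sketch does not depend on which convention the
local printf family follows.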

When I want to do this, I declare a static 32K buffer; on the other hand, I
rarely print a string of over 200 characters anyway.  If I'm doing it to
a file or the terminal as the ultimate destination and I don't have to munge
the output, I just v?printf() to the destination directly and skip the buffer
entirely.

++Brandon
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
	{well!hoptoad,uunet!marque,cbosgd,sun!mandrill}!ncoast!allbery
Delphi: ALLBERY						     MCI Mail: BALLBERY

papowell@attila.uucp (Patrick Powell) (05/14/88)

In article <7768@ncoast.UUCP> allbery@ncoast.UUCP (Brandon Allbery) writes:
>As quoted from <136@insyte.uucp> by jad@insyte.uucp (Jill Diewald):
>+---------------
>| One solution is to fprintf into a file opened to be /dev/null.
>| This requires two calls to the C print functions, one to get the
>| size and the second to actually print.  Since the first call
>| goes to /dev/null it should be faster since it doesn't really
>| write anything? (Is this true?  For all UNIXs?).  This won't
>| work for VMS though.
>+---------------

(Brandon talks about v?printf).

I suggest using the handy dandy little function,
snprintf( int count, char *buffer, <varargs stuff about format, etc.> )

This was proposed as a part of the "Standard C Library".  It was rejected
for various reasons that I am not privy to.
I have reached the stage where I have re-implemented a portable version of
this that I use wherever I must do SPRINTF.

The lack of range- and bounds-checking versions of the 'standard'
library routines has been an obstacle to producing
portable and bombproof code.
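
For illustration, a sketch of bounded formatting using the snprintf()
that C99 eventually adopted.  Note the assumption: the C99 argument
order is (buffer, size, format, ...), which differs from the signature
shown above, and the helper name fmt_bounded is hypothetical.

```c
#include <stdio.h>

/*
 * Format a labelled number into buf without overrunning it.
 * Returns 1 if the whole result fit, 0 if it was truncated or failed.
 */
int
fmt_bounded(char *buf, size_t size, const char *label, int value)
{
	int need = snprintf(buf, size, "%s=%d", label, value);

	/* need is the untruncated length, excluding the terminating NUL */
	return need >= 0 && (size_t)need < size;
}
```

When the result does not fit, the buffer still holds a NUL-terminated
prefix and the return value says how much room a retry would need.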

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

hitz@mips.COM (David Hitz) (05/16/88)

In article <7768@ncoast.UUCP> allbery@ncoast.UUCP (Brandon Allbery) writes:
>As quoted from <136@insyte.uucp> by jad@insyte.uucp (Jill Diewald):
>+---------------
>| One solution is to fprintf into a file opened to be /dev/null.
[...]
>Close.  fprintf() to a newly-created temporary file, then rewind() it (this
>is a stdio function) and getc() out of it and send it wherever.  If you need
>the size, get the file size using the OS's current-file-position call before
>the rewind().  This is perhaps a bit slower than an in-core buffer, but it's
>clean and is limited only by free space on the disk!

Oops: SYSV vs. BSD.

In SYSV printf() returns the number of characters printed, so Jill's
solution works fine without Brandon's gyrations.

In BSD printf() returns the first argument.
-- 
Dave Hitz					home: 408-739-7116
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!hitz 	play: 408-991-0345

chris@mimsy.UUCP (Chris Torek) (05/17/88)

In article <2190@quacky.mips.COM> hitz@mips.COM (David Hitz) writes:
>In SYSV printf() returns the number of characters printed, so Jill's
>solution works fine without Brandon's gyrations.
>In BSD printf() returns the first argument.

Close.  In SysV, printf, sprintf, fprintf, vprintf, vsprintf, and
vfprintf all return the number of characters transferred.  In 4.2 and
4.3BSD, printf and fprintf return 0 for success, -1 for error, sprintf
returns its first argument, and there are no v*printf routines.  The
4BSD lint library pretends that printf and fprintf have no return
value.  In 4.3-tahoe, printf, sprintf, and fprintf return the number
of characters transferred, and unless I convince someone quickly,
v*printf are still missing.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

andrew@comp.vuw.ac.nz (Andrew Vignaux) (05/18/88)

Sorry to (a) get off the subject, and (b) turn it into a tutorial for people
who can't keep up with the standards, but:

In article <11439@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>In article <11331@mimsy.UUCP> I wrote:
>>>By the time you reach a routine that takes a `va_list' argument,
>>>it is too late to scan the argument list twice....
[...]
>Watch:
>	int prf(const char *fmt, ...) {
[...]
>		va_start(ap, fmt);
>		rv = __printf(stdout, fmt, ap);
>		va_end(ap);
[...]
>	int __printf(FILE *fp, const char *fmt, va_list ap) {
>
>Which function takes a va_list argument?  Which one has the
>va_start/va_end pair?

Of course.  How silly of me.  Because I very seldom send a va_list off to
another routine (working on a Pyramid one is not quite sure about what one can
get away with, until one tries :-) I misread the va_list as a va_alist.
However, I have a few questions about varargs:
   ("CAN" means with respect to the standards/manual page)

[] CAN you pass va_list's around like this?  I realise v[fs]?printf
   get away with it, but their implementations are not required to be
   portable.
   (this does work on vax/sun/pyramid)

[] Given that you are allowed to pass va_list's around why CAN't you
   write:
		va_start (ap);
		count = count_printf (fmt, ap);
		rv = vsprintf (new_space, fmt, ap);
		va_end (ap);
   (this does not work on pyramids)

   or equivalently:
		save_ap = ap;
		i = va_arg (ap, int);
		j = va_arg (save_ap, int);
   (this does not work on pyramids)

   or even:
		save_ap = ap;
		rv = vprintf (fmt, ap);
		/* now get the parameter after the printf args */
		i = va_arg (ap, int);
   (this ONLY works on pyramids-re comment)

[] CAN you strip a few arguments off the va_list and then pass it to other
   routines?
		i = va_arg (ap, int);
		s = va_arg (ap, char *);
		switch (i) {
		    case VPRINT_IT:
			fmt = va_arg (ap, char *);
			(void) vprintf (fmt, ap);
			break;
		}
   (this works on vax/sun/pyramid)

I understand why these work/don't work in the current vararg implementations
for each machine (the pyramid stores extra va_arg state information, the
others just move a pointer).  However, I can't see anything in varargs(3) that
allows/forbids doing these things, except perhaps the "multiple traversals"
comment.  Presumably there is a more precise definition of the varargs stuff
somewhere.

Is there any need for a va_rewind ()?

[Aside: does va_start have 2 parameters in the new standard? -- boy am I out
of date :-(]

Anyway, to get back to what I am really interested in.

In article <11439@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) continues:
>In article <13597@comp.vuw.ac.nz> I wrote:
>>BTW, I think there should be a stropen()-like function so mere-mortals
>>can open string based stdio streams.  Is this in the standard?
>
>It is not in the standard, but I have one; mine is called fmemopen().
>There is a more general interface called funopen():
[I have non-prototyped the declaration to save space-AJV]
>	FILE *funopen(cookie, readfn, writefn, seekfn, closefn)
>	void * cookie;

Looks really nice and object oriented--just what I wanted.  I guessed you
meant something like this when you referred to a `function oriented stdio'.
How {,un}standard is it?  How can I get it?

I can't quite pick up the semantics of funopen() from the declaration.  My
guess is that the f{whatever}open function performs the appropriate open,
packages whatever info the virtual functions will need into a cookie record,
and then returns the result of funopen()--or am I completely wrong again :-(.
Otherwise, what is the initial cookie parameter?  Where/how do you describe
the `open' call {open()--re:fopen, no-op()--re:fmemopen, fork();pipe()--
re:popen, ...}?  Is there a funreopen() (for those cases where you want to
change functions in mid-stream :-)? etc...

Should I be looking in a standards document for this information? (a little
difficult when you're on the edge of the world :-(

Andrew
------------------------------------------------------------------------------
Domain address: andrew@comp.vuw.ac.nz   Path address: ...!uunet!vuwcomp!andrew

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/19/88)

In article <13621@comp.vuw.ac.nz> andrew@comp.vuw.ac.nz (Andrew Vignaux) writes:
>[] CAN you pass va_list's around like this?

Sure.  The only thing you have to be aware of is that va_list may be
an array or some other data type (void * or struct most likely), so you
can't be sure whether it will be passed by reference or not.  Therefore,
after the called function returns, further use of the va_list before
va_end is non-portable.  You could avoid this problem by passing the
ADDRESS of the va_list instead, but older compilers don't understand
& applied to an array name.
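
A small sketch of the pass-the-address technique, assuming a <stdarg.h>
implementation and a compiler that accepts & on a va_list.  The names
next_int and sum_ints are hypothetical.

```c
#include <stdarg.h>

/* Consume one int through a pointer, so the caller's list advances too. */
static int
next_int(va_list *app)
{
	return va_arg(*app, int);
}

/* Sum n int arguments, pulling each one through the helper. */
int
sum_ints(int n, ...)
{
	va_list ap;
	int total = 0;

	va_start(ap, n);
	while (n-- > 0)
		total += next_int(&ap);	/* ap stays usable after each call */
	va_end(ap);
	return total;
}
```

Because the helper works on the caller's own va_list, it makes no
difference whether va_list is an array, a pointer, or a struct.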

>[] Given that you are allowed to pass va_list's around why CAN't you
>   write:
>		va_start (ap);
>		count = count_printf (fmt, ap);
>		rv = vsprintf (new_space, fmt, ap);
>		va_end (ap);
>   (this does not work on pyramids)

Non-portable; see my initial comment.

>		save_ap = ap;
>		i = va_arg (ap, int);
>		j = va_arg (save_ap, int);
>   (this does not work on pyramids)

Non-portable; use memcpy to make the copy.  You should actually use
separate va_start/va_end on each copy of the va_list, to be safe.
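
As a hedged aside, C99 later added va_copy() for exactly this purpose.
The sketch below assumes a C99 library (va_copy, plus a vsnprintf that
reports the needed length when given a null buffer); neither existed on
the systems discussed here, and the names vcopy_format and copy_format
are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Takes a va_list, as the original IO routine does, and still manages
 * two passes: one over a copy to count, one over the original to print.
 * Returns malloc'd storage, or NULL on error; the caller frees.
 */
char *
vcopy_format(const char *fmt, va_list ap)
{
	va_list ap2;
	int count;
	char *ret;

	va_copy(ap2, ap);		/* the copy gets its own va_end */
	count = vsnprintf(NULL, 0, fmt, ap2);
	va_end(ap2);
	if (count < 0)
		return NULL;

	ret = malloc((size_t)count + 1);
	if (ret == NULL)
		return NULL;
	(void) vsnprintf(ret, (size_t)count + 1, fmt, ap);
	return ret;
}

/* variadic front end, in the style of the prf() examples in this thread */
char *
copy_format(const char *fmt, ...)
{
	va_list ap;
	char *ret;

	va_start(ap, fmt);
	ret = vcopy_format(fmt, ap);
	va_end(ap);
	return ret;
}
```

The counting pass runs on the copy, so the original va_list is still
untouched when the formatting pass uses it.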

>		save_ap = ap;
>		rv = vprintf (fmt, ap);
>		/* now get the parameter after the printf args */
>		i = va_arg (ap, int);
>   (this ONLY works on pyramids-re comment)

Non-portable; see my initial comment.

>[] CAN you strip a few arguments off the va_list and then pass it to other
>   routines?

Sure.
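
A minimal sketch of stripping leading arguments and handing the rest
on, assuming <stdarg.h> and a C99 vsnprintf.  The action codes and the
name dispatch are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

enum { ACT_IGNORE, ACT_FORMAT };	/* hypothetical action codes */

/*
 * The first variable argument is an action code.  For ACT_FORMAT the
 * next one is a format string, and whatever follows goes to vsnprintf.
 * Returns the vsnprintf result, or 0 when nothing was formatted.
 */
int
dispatch(char *buf, size_t size, ...)
{
	va_list ap;
	const char *fmt;
	int action, rv = 0;

	va_start(ap, size);
	action = va_arg(ap, int);		/* strip the action code */
	if (action == ACT_FORMAT) {
		fmt = va_arg(ap, char *);	/* strip the format */
		rv = vsnprintf(buf, size, fmt, ap);
	}
	va_end(ap);
	return rv;
}
```

After the vsnprintf call the list is not touched again until va_end,
in line with the earlier caution about reusing a va_list that has been
passed to another function.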

>Presumably there is a more precise definition of the varargs stuff
>somewhere.

There sure is: the forthcoming ANSI/ISO C standard.  We trashed <varargs.h>
in favor of <stdarg.h> as part of allowing a different linkage method to
be used for variadic functions than for normal functions.  Old varargs was
never precisely enough defined, as you have discovered.

>Is there any need for a va_rewind ()?

No.

>[Aside: does va_start have 2 parameters in the new standard? -- boy am I out
>of date :-(]

Yes, one of the parameters provides an "anchor point" in the parameter
list, as required for some implementation methods.  Therefore, variadic
functions must have at least one fixed (named) argument.