[comp.lang.c] parsing the format string at compile time...

jobrien@nixbur.UUCP (John O'Brien) (10/11/89)

I looked at a description of Modula-2 yesterday, and one of the things
that struck me was its lack of a general purpose I/O function.  Instead,
you have a collection of procedures which output one value of each type.
Thus, where you would say in C:

printf("This is an integer: %d\n", 6);

you would need three procedure calls in Mod-2:

writechar("This is an integer: ");
writeint(6);
writeln;

which is a real pain in the neck for writing report programs, but which
can be very fast, because the Modula-2 programmer is parsing the format
string manually, where the format string is parsed at run-time in the C
program.  For a more complicated example, the savings in time might be
pretty significant.  It seems to me that if the format string in the 
printf call is a constant (which it is most of the time), the compiler
should be able to parse the string at compile time, and turn the printf
call into something like the series of Modula-2 calls, with a correspond-
ing increase in efficiency.  Do C compilers do this?  Are there any prob-
lems with doing this?
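
For concreteness, here is a rough sketch in C of the transformation being asked about. The helper names write_str, write_int, and write_ln are hypothetical stand-ins for the Modula-2-style procedures, and nothing here is claimed to be what any particular compiler emits.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-type output routines in the Modula-2 style.
   Each returns the number of characters written, or -1 on error. */
static int write_str(FILE *out, const char *s)
{
    return fputs(s, out) == EOF ? -1 : (int)strlen(s);
}

static int write_int(FILE *out, int n)
{
    char digits[24];
    int i = 0, count = 0;
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    if (n < 0) {
        if (putc('-', out) == EOF) return -1;
        count++;
    }
    do {                              /* digits come out least significant first */
        digits[i++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    while (i > 0) {                   /* so write them back in reverse */
        if (putc(digits[--i], out) == EOF) return -1;
        count++;
    }
    return count;
}

static int write_ln(FILE *out)
{
    return putc('\n', out) == EOF ? -1 : 1;
}

/* What the programmer writes: the format is interpreted at run time. */
int report_with_printf(FILE *out, int n)
{
    return fprintf(out, "This is an integer: %d\n", n);
}

/* What a compiler that parsed the constant format could emit instead. */
int report_lowered(FILE *out, int n)
{
    int a = write_str(out, "This is an integer: ");
    int b = write_int(out, n);
    int c = write_ln(out);
    return (a < 0 || b < 0 || c < 0) ? -1 : a + b + c;
}
```

Both functions produce the same characters; the second never looks at a '%' while the program is running.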

				Enquiring Minds Want to Know!

#include "std_disclaimer.h"

"I Saw Elvis in a 386 Circuit Diagram!" -- coming soon to a newsstand near you!

drcook@hubcap.clemson.edu (david richard cook) (10/11/89)

From article <705@nixbur.UUCP>, by jobrien@nixbur.UUCP (John O'Brien):
> printf call is a constant (which it is most of the time), the compiler
> should be able to parse the string at compile time, and turn the printf
> call into something like the series of Modula-2 calls, with a correspond-
> ing increase in efficiency.  Do C compilers do this?  Are there any prob-
> lems with doing this?
> 
> 				Enquiring Minds Want to Know!
> 

	If C did allow this, it would not be C.  The compiler knows
nothing about any functions, including I/O.  Some new compilers may be
able to do this, though I do not know of any, by using the #pragma
preprocessor directive to declare certain functions as builtin.  The
use of #pragma is left up to the compiler implementation and not C.
If an ANSI C implementation does not understand what the #pragma
directive is trying to accomplish, it will simply ignore it.

lmiller@venera.isi.edu (Larry Miller) (10/11/89)

In article <705@nixbur.UUCP> jobrien@nixbur.UUCP (John O'Brien) writes:
>I looked at a description of Modula-2 yesterday, and one of the things
>that struck me was its lack of a general purpose I/O function.  Instead,
>you have a collection of procedures which output one value of each type.
>Thus, where you would say in C:
>
>printf("This is an integer: %d\n", 6);
>
>you would need three procedure calls in Mod-2:
>
>writechar("This is an integer: ");
>writeint(6);
>writeln;
>
	Parsing the format string is so incredibly trivial
	that there can be no advantage of having to make
	separate function calls for each data type like this.
	Many implementations just code it in assembly
	language, giving even faster execution, but even coded
	in C directly, it's no more than a loop to look at
	each character in the format string, then switching on %.
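
The loop described above might look roughly like this. It is a minimal sketch that assumes only %d, %s, and %% need handling and writes into a caller-supplied buffer; mini_format and put_char are made-up names, and a real printf also deals with flags, widths, precisions, and many more conversions.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if room remains (one byte is kept for the NUL),
   and count it either way so the caller can detect truncation. */
static void put_char(char *buf, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        buf[*len] = c;
    (*len)++;
}

/* Returns the length the full result would have, like snprintf. */
size_t mini_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {                 /* ordinary character: copy it */
            put_char(buf, size, &len, *fmt);
            continue;
        }
        switch (*++fmt) {                  /* the character after the '%' */
        case 'd': {
            int n = va_arg(ap, int);
            unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
            char digits[24];
            int i = 0;

            if (n < 0)
                put_char(buf, size, &len, '-');
            do {
                digits[i++] = (char)('0' + u % 10);
                u /= 10;
            } while (u != 0);
            while (i > 0)
                put_char(buf, size, &len, digits[--i]);
            break;
        }
        case 's': {
            const char *s = va_arg(ap, char *);
            while (*s != '\0')
                put_char(buf, size, &len, *s++);
            break;
        }
        case '%':
            put_char(buf, size, &len, '%');
            break;
        case '\0':                         /* lone '%' at the end: stop */
            fmt--;
            break;
        default:                           /* unsupported: copy verbatim */
            put_char(buf, size, &len, '%');
            put_char(buf, size, &len, *fmt);
            break;
        }
    }
    va_end(ap);
    if (size > 0)
        buf[len < size ? len : size - 1] = '\0';
    return len;
}
```

A call such as mini_format(buf, sizeof buf, "%d %s %d\n", 123, "What?", 456) walks the format once, character by character, on every call.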

Larry Miller				lmiller@venera.isi.edu (no uucp)
USC/ISI					213-822-1511
4676 Admiralty Way
Marina del Rey, CA. 90292

henry@utzoo.uucp (Henry Spencer) (10/12/89)

In article <6737@hubcap.clemson.edu> drcook@hubcap.clemson.edu (david richard cook) writes:
>> printf call is a constant (which it is most of the time), the compiler
>> should be able to parse the string at compile time, and turn the printf
>> call into something like the series of Modula-2 calls...
>
>	If C did allow this, it would not be C.  The compiler knows
>nothing about any functions, including I/O...

This may be true of your compiler, but it is far from universal.  The
C library is part of the language.  It's been that way informally for
a long time (as witness the flak Whitesmiths took over their attempts
to rationalize the library) and is now official (ANSI C).  Such an
optimization is perfectly legitimate.  And no, it doesn't require a
#pragma to enable it.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/12/89)

In article <705@nixbur.UUCP> jobrien@nixbur.UUCP (John O'Brien) writes:
-It seems to me that if the format string in the 
-printf call is a constant (which it is most of the time), the compiler
-should be able to parse the string at compile time, and turn the printf
-call into something like the series of Modula-2 calls, with a correspond-
-ing increase in efficiency.  Do C compilers do this?  Are there any prob-
-lems with doing this?

The proposed C Standard permits this in a conforming hosted implementation.
I don't know of any existing implementations that do this, although some do
optimize other standard library functions into in-line code.

chris@mimsy.UUCP (Chris Torek) (10/14/89)

In article <10082@venera.isi.edu> lmiller@venera.isi.edu (Larry Miller) writes:
>Parsing the format string is so incredibly trivial
>that there can be no advantage of having to make
>separate function calls for each data type like this.

There is certainly no advantage to the programmer.  The execution time,
however, can be quite significant.  One of the big speedups to the 4BSD
VAX PCC-based C compiler was to change most of the printf() calls to
fputs() calls (with fputs() being in VAX assembly, using special string
instructions).  The old _doprnt was in assembly as well, but was still
significantly slower.
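
The sort of hand edit described here can be sketched as follows, assuming the calls in question had formats with no conversions in them. The function names and the sample output line are invented for illustration and are not taken from the PCC sources.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if fmt contains no '%', so printf would copy it unchanged
   and the call is a candidate for replacement by fputs(). */
int is_plain_text(const char *fmt)
{
    return strchr(fmt, '%') == NULL;
}

/* Before: the format is scanned at run time although nothing is converted. */
int emit_before(FILE *out)
{
    return fprintf(out, "\t.text\n");
}

/* After: the same bytes, written with no format interpretation. */
int emit_after(FILE *out)
{
    return fputs("\t.text\n", out);
}
```

Note that the two return values differ (a character count versus a nonnegative value), so the substitution is only safe where the result is ignored or merely checked for error.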
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

Tim_CDC_Roberts@cup.portal.com (10/14/89)

Regarding Modula-2 requiring the programmer to parse format strings:

C++ is very much like this, if you think about it.

   printf ("%d %s %d\n", 123, "What?", 456);

becomes:

   cout << 123 << ' ' << "What?" << ' ' << 456;

There are separate, overloaded definitions of the '<<' operator, each of
which knows how to format a particular type.

Ada kind of mixes the two.  You make a separate function call for each
item (like Modula-2), but they all have the same function name (thanks
to overloading).

  put (123);
  put (" ");
  put ("What?");  -- ...

Tim_CDC_Roberts@cup.portal.com                | Control Data...
...!sun!portal!cup.portal.com!tim_cdc_roberts |   ...or it will control you.

jkl@csli.Stanford.EDU (John Kallen) (10/15/89)

In article <23054@cup.portal.com> Tim_CDC_Roberts@cup.portal.com writes:
!Regarding Modula-2 requiring the programmer to parse format strings:
!
!C++ is very much like this, if you think about it.
!
!   printf ("%d %s %d\n", 123, "What?", 456);
!
!becomes:
!
!   cout << 123 << ' ' << "What?" << ' ' << 456;
!
!There are separate, overloaded definitions of the '<<' operator, each of
!which knows how to format a particular type.

I don't want to be picky, but at least as far as Cfront 2.0, G++ 1.36.0 and
Zortech C++ go, your C++ line would output:

12332What?32456

You probably want:

cout << 123 << " " << "What?" << " " << 456;

John.
_______________________________________________________________________________
 | |   |   |    |\ | |   /|\ | John K{llen       "If she weighs the same as a
 | |\ \|/ \|  * |/ | |/|  |  | PoBox 11215         a duck...she's made of wood"
 | |\ /|\  |\ * |\ |   |  |  | Stanford CA 94309   "And therefore?" "A WITCH!"
_|_|___|___|____|_\|___|__|__|_jkl@csli.stanford.edu___________________________

meissner@tiktok.dg.com (Michael Meissner) (10/16/89)

In article <6737@hubcap.clemson.edu> drcook@hubcap.clemson.edu (david richard cook) writes:
| From article <705@nixbur.UUCP>, by jobrien@nixbur.UUCP (John O'Brien):
| > printf call is a constant (which it is most of the time), the compiler
| > should be able to parse the string at compile time, and turn the printf
| > call into something like the series of Modula-2 calls, with a correspond-
| > ing increase in efficiency.  Do C compilers do this?  Are there any prob-
| > lems with doing this?
|
| 	If C did allow this, it would not be C.  The compiler knows
| nothing about any functions, including I/O.

Wrong.  The ANSI standard gives explicit license for implementations
to 'know' about any of the standard functions in section 4.  Nothing
requires that a user call to 'printf' call an actual routine printf.
All that is required, besides getting the correct result, is that an
implementation not evaluate arguments more than once (except for
grandfathering putc/getc/putchar/getchar, and possibly a few others
that have historically been macros), and that taking the address of
the function work.  Granted I don't know about any compilers that
currently do this with printf, but there are compilers which will do
special things for certain library routines (typically math or
memory/string copies and compares).  For example, the Data General MV
C compiler has quite a few routines that are builtin, such as memcpy,
strcpy, memcmp, strcmp, abs, fabs, sin, etc.  Also, the GNU C compiler
has support for builtin functions, though at present there aren't that
many (abs, labs, fabs, alloca, etc.).
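
A small sketch of the "taking the address must work" requirement, using abs as the example. Whether the direct call is actually expanded inline depends on the compiler and its options; the wrapper names are hypothetical.

```c
#include <stdlib.h>

/* Calls whatever function it is handed. The callee is not known here,
   so this is the case where a real, addressable abs() has to exist. */
int call_indirect(int (*fn)(int), int x)
{
    return fn(x);
}

/* A direct call: an implementation that "knows" abs may expand it inline. */
int direct(int x)
{
    return abs(x);
}

/* Taking the address of abs must still yield a working function. */
int indirect(int x)
{
    return call_indirect(abs, x);
}
```

Both paths have to give the same answer, whichever way the implementation chooses to compile the direct call.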

|					       Some new compilers may be
| able to do this, though I do not know of any, by using the #pragma
| preprocessor directive to declare certain functions as builtin.  The
| use of #pragma is left up to the compiler implementation and not C.
| If an ANSI C implementation does not understand what the #pragma
| directive is trying to accomplish, it will simply ignore it.

Even though I originally voted for pragma in the ANSI committee, I
feel that the current semantics are a botch....

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?

bill@twwells.com (T. William Wells) (10/16/89)

In article <10082@venera.isi.edu>, lmiller@venera.isi.edu (Larry Miller) writes:
:         Parsing the format string is so incredibly trivial
:         that there can be no advantage of having to make
:         separate function calls for each data type like this.

How about decreasing the running time of a program to 40% of its
original time? This was the improvement I got for one program, on a
VAX under IS/3, when I replaced the printf with my own routine. How
about to 30%? This was the result of yesterday's recoding on my
Microport SysV/386 3.0e. Both cases used nothing more difficult than
%d and %s.

Printf does two things: interprets the format string, a not entirely
trivial task. And: formats the results USING GENERAL PURPOSE ROUTINES.
That latter is critical. Formatting with %d can take about ten times
as long as doing it yourself.

A smart printf could special case these and use a separate routine
for them. This could make an amazing difference in the performance of
printf.
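
One way such special casing could look, as a sketch only: a formatter that recognizes a bare "%s" and handles it with a plain copy, leaving everything else to the general-purpose routine. The name smart_format is made up, and only the "%s" case is shown; a bare "%d" could be given its own routine in the same way.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the fast path handled the call, 0 if it fell back to
   the general-purpose vsnprintf(). Output is truncated to fit buf. */
int smart_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int fast = 0;

    va_start(ap, fmt);
    if (strcmp(fmt, "%s") == 0) {
        /* Special case: a bare %s is just a bounded string copy. */
        const char *s = va_arg(ap, char *);
        size_t n = strlen(s);

        if (size > 0) {
            if (n >= size)
                n = size - 1;
            memcpy(buf, s, n);
            buf[n] = '\0';
        }
        fast = 1;
    } else {
        vsnprintf(buf, size, fmt, ap);     /* general path */
    }
    va_end(ap);
    return fast;
}
```

The test on the format string is itself run-time work, so whether this pays off would have to be measured.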

Whether doing this at compile time instead of at run time is a good
idea I don't know. Before I'd say anything on that, I'd want to
experiment.

However, there is one good reason for parsing the string at compile
time: it becomes possible to type check the arguments against the
string. Even if no special code were generated, this would be very
valuable, eliminating many bugs and portability problems. Doing this
is legal under ANSI C.
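
As an illustration of this kind of checking, GCC and Clang provide a format attribute that makes the compiler compare the arguments against a constant format string. This is a compiler extension, not part of ANSI C, and it is not something discussed in this thread; log_line and PRINTF_LIKE are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

#if defined(__GNUC__)
/* fmt_index: position of the format parameter; first_arg: first checked argument */
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)  /* no checking on other compilers */
#endif

/* Hypothetical wrapper: parameter 3 is the format, checking starts at 4. */
int log_line(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int log_line(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* With -Wformat (part of -Wall), a call such as
       log_line(buf, sizeof buf, "%d items\n", "three");
   is diagnosed at compile time: %d was given a char *. */
```

No special code is generated here; the format is still interpreted at run time, and only the mismatch is caught early.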

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/16/89)

In article <1856@xyzzy.UUCP> meissner@tiktok.UUCP (Michael Meissner) writes:
>The ANSI standard gives explicit license for implementations
>to 'know' about any of the standard functions in section 4.

Just to make sure nobody misunderstands, that's true for HOSTED implementations
only.  Standalone implementations do NOT know about the Section 4 functions.

Hosted implementations are the ones that most programmers will care about.
Standalone implementations are of interest for embedded system and operating
system programmers (to take the most obvious examples).

dhesi@sun505.UUCP (Rahul Dhesi) (10/18/89)

>|      If C did allow this, it would not be C.  The compiler knows
>| nothing about any functions, including I/O.
>
>Wrong.  The ANSI standard...

Another typical Usenet example of crosstalk.  Person A, thinking of
traditional C, says something that is true in that context.  Person B
(posting from a place where nothing happens), assuming C means ANSI C,
immediately contradicts.

Rephrasing:

     If traditional C did allow this, it would not be traditional C.
     The traditional C compiler knows nothing about any functions,
     including I/O.

     Right.  However, the ANSI standard...

Now wasn't that better?

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
Do not use From: address for reply if it contains "sun".

henry@utzoo.uucp (Henry Spencer) (10/21/89)

In article <990@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>Rephrasing:
>
>     If traditional C did allow this, it would not be traditional C.
>     The traditional C compiler knows nothing about any functions,
>     including I/O.

More accurate rephrasing:

	If traditional C did allow this, it would not be what I think
	of as traditional C, therefore I'm sure it wasn't allowed, even 
	though the documentation actually was silent on the matter.
	The traditional C compiler I use knows nothing about any functions,
	therefore no such compiler does.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

karl@haddock.ima.isc.com (Karl Heuer) (10/21/89)

In article <990@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>[Rephrasing the previous conversation in an attempt to make it correct:]
>     If traditional C did allow this, it would not be traditional C.
>     The traditional C compiler knows nothing about any functions,
>     including I/O.
>
>     Right.  However, the ANSI standard...

No, the correct rephrase is:

     Pcc on a VAX happens not to do this.  Therefore, if your compiler does
     do this, it isn't pcc on a VAX.

     Right.  But it's still a valid C implementation, using either the ANSI
     Standard or pre-ANSI common-law as a touchstone.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

cpcahil@virtech.UUCP (Conor P. Cahill) (10/21/89)

In article <990@cirrusl.UUCP>, dhesi@sun505.UUCP (Rahul Dhesi) writes:
> Another typical Usenet example of crosstalk.  Person A, thinking of
> traditional C, says something that is true in that context.  Person B
> (posting from a place where nothing happens), assuming C means ANSI C,
> immediately contradicts.
> 
> Rephrasing:
> 
>      If traditional C did allow this, it would not be traditional C.
>      The traditional C compiler knows nothing about any functions,
>      including I/O.

People may have assumed that this is true in "traditional" C compilers, but
I have run across several pre-ANSI C compilers (like 4 years ago) that
performed inline substitution of functions like strcpy under some
specific circumstances.  At the time these compilers were considered very
good (performance & code wise).

BTW - The only reason I found out about it at the time was that they did not 
correctly emulate the strcpy (in that they did not return a pointer to 
string 1) and in debugging the results we found the inline substitution.

There was no flag in the compiler to turn this off, so we ended up making
lots of changes to work around the bug.
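
A minimal example of code that depends on the return value in question. The function name is made up; the point is only that strcpy's result is used directly, so an inline expansion that fails to yield the destination pointer breaks it.

```c
#include <string.h>

/* Copies src into dst and returns the length of the copy. This relies
   on strcpy() returning its first argument, which any inline expansion
   has to preserve just as the library function does. */
size_t copy_and_measure(char *dst, const char *src)
{
    return strlen(strcpy(dst, src));
}
```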


-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	!
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+