[comp.lang.misc] The simplest way I know to PrettyPrint

bpendlet@esunix.UUCP (02/19/87)

[]

The  trouble  with  most  pretty printers is that they try to get too
fancy. There is a fairly simple way to write a "programmable" pretty
printer. The trick is to use the keywords and punctuation symbols of
the language to trigger a set of actions in the pretty printer.

First you write a token scanner for the language you want to pretty
print. The scanner should return the actual string that was scanned
and  a  classification  code  every  time  it is called. You need the
string  because  you  will  need to write it out again, and you might
want to do something like convert it to upper or lower case. You need
the code to control an N-way branch. I would use a CASE statement,
but you might prefer a vector of pointers to procedures, or even a
computed goto, what the heck, any old N-way branch will do.

Next you decide on a set of operations that you want the pretty
printer  to  be  able  to  perform. Things like convert the string to
upper case, advance the output file to the next line, indent to the
current indenting level, increment the current indenting level,
decrement the current indenting level, push the current indenting
level on a stack, restore the indenting level from a stack, and write
the current token to the output file. As you play around with your
pretty printer you will think of several other operations.

So   once  you've  picked  the  operations  write  a  procedure  that
implements each operation. Now for each entry in your N-way branch,
that is, for each keyword, punctuation symbol, and whatever in the
language, write a sequence of calls to your operation procedures. For
instance the entry for "IF" might be something like.

	NextLine;
	Indent;
	WriteToken;
	IncrementIndent( 2 );

The entry for "END" ( I'm using Modula-2 for these examples ) might be

	NextLine;
	DecrementIndent( 2 );
	Indent;
	WriteToken;

The pretty printer is simple, easy to change, "programmable" ( change
an entry, recompile, and link ), and can be written over the average
weekend. Not to mention not needing to write a parser for the input
language,  not  needing to read, parse, ... a configuration file, and
it runs pretty fast.

I've used this technique and I've used pretty printers written by
other people that were based on this technique, it works better than
you might think.

		Bob Pendleton
 
P.S.

I don't know the origin of this technique. I learned it from an old
programmer at UNIVAC in 1979 or '80, he said he learned it from someone else,
several years earlier. 
-- 
---
               Bob Pendleton
               Evans & Sutherland Computer Corporation

UUCP Address:  {decvax,ucbvax,ihnp4,allegra}!decwrl!esunix!bpendlet
Alternate:     {ihnp4,seismo}!utah-cs!utah-gr!uplherc!esunix!bpendlet

hacking code, hacking code.
sometimes happy, sometimes bored, 
almost lost in a pile of spode.
tell the people "I'm only hacking code."

---
I am solely responsible for what I say.
---

msb@sq.UUCP (02/24/87)

Along about now it should be pointed out, again, that there's no way for any
prettyprinter to distinguish in general between:

	if (cond1)			if (cond1)
		normalaction1			normalaction
	else if (cond2)			else
		normalaction2			if (cond2)
	else						erroraction1
		normalaction3			else
							erroraction2
Mark Brader

mouse@mcgill-vision.UUCP (02/28/87)

In article <366@esunix.UUCP>, bpendlet@esunix.UUCP (Bob Pendleton) writes:
> The  trouble  with  most  pretty printers is that they try to get too
> fancy. There is a fairly simple way to write a "programmable" pretty
> printer. The trick is to use the keywords and punctuation symbols of
> the language to trigger a set of actions in the pretty printer.

> [basically, scan tokens and do things as you get them]

> For instance the entry for "IF" might be something like.
> 	NextLine;
> 	Indent;
> 	WriteToken;
> 	IncrementIndent( 2 );

Except that this technique loses comments and formatting in the input.
Doing the right thing with comments can be difficult.  Doing the right
thing when breaking expressions over multiple lines can be very
difficult and in some cases impossible (eg, "len1+len2 + extra+1" versus
"len1 + len2+extra + 1" - how do you tell which + signs should have
spaces around them?  It depends on how they group conceptually.).

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!musocs!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!musocs!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu