[comp.software-eng] Flexible Prettyprinters, an example...

winter@hpldola.HP.COM (Kirt Winter) (05/24/88)

In an earlier discussion about style rules for C shops, I posted a response
in which I talked about a flexible, intelligent prettyprinter which I designed
in pursuit of my MS.

Judging from the responses I've gotten, there is sufficient interest to warrent
a full posting to the net.

At the current time, the fate of my work on further versions is in limbo.  Some
entities have expressed interest in the prettyprinter, but nothing has really
piqued my interest enough to get me excited about parting with it.  I may yet
give the source code (or just the executable) to individuals for personal use.

The challenge was to design a prettyprinter that could be easily adapted to a
wide variety of programming styles.  To demonstrate the feasability of the
concept, I created the prototype for Turbo Pascal 3.0 source code.

First, I'll talk about what prettyprinters actually do.

A prettyprinter is intended to take syntactically legal (usually) source code
and, in some manner, transform the "style" in which it is written.  This could
range from highlighting keywords to rearranging even the order of procedures
and functions.  A happy medium is usually chosen by the designer of the pretty-
printer, usually altering the spacing of the program.

The usually stated goal of a prettyprinter is to "improve readability" in some
manner.  Traditional prettyprinter designers have attempted to chose a style
that they believe will accomplish that goal.

Unfortunately, the response to a traditional prettyprinter is... "I don't like
where it puts the <token1> relative to <token2>."  Or something along that 
line.

In my prototype, I focused on the relative placement of language tokens.  That
doesn't mean that other style facets (capitalization of keywords, identifier
underscore vs. capitals, etc.) couldn't be handled, but rather that my proto-
type doesn't handle them in an effort to concentrate on "indentation and
placement" issues.

A prettyprinter, whether flexible or not, recognizes "white-spaces" and then
replaces the original white-spaces in the source code with ones favored by
the prettypriner's designer (traditionally).  The trick to adding flexibility
is to allow a user to specify the white-spaces that will be used to replace
the ones in the source.

So, one could envision a prettyprinter in which a seperate file of white-
spaces would be saved.  A user could edit that file, and therefore change
the way the prettyprinter worked.

This is certainly possible, but in looking at the wide range of styles, and
all the places that white-space could be recognized, I felt that while it
would still be useful to do it that way, it would also be quite cumbersome to
the user.

As I stated earlier, a prettyprinter must recognize white-spaces in order to
replace them (even if recognizing means skipping).  Why not use the same
method, but instead of replacing the white-spaces, analyze and save them?
This is the way IPP (my prototype Intelligent PrettyPrinter) is configured.

Specifying a new style to IPP is accomplished by typing...

ipp L <style.pas

The file "style.pas" can either be a program previously written by the user
(subject to some possible problems, more on that later) or a sample program
that is provided.  The results are written to a configuration file.  All
subsequent programs passed through IPP will be formatted in the style that
was reflected by "style.pas".

A user written program used for "learning" has a couple of possible problems
associated with it.  First, it might not include all language constructions.
If a CASE statement isn't included, IPP won't know how to format it.  There
are other possible problems with comments, particularly multiple comments
between keywords or constructions, such as...

...
        begin            {here is a comment}  {another}
{a third}   statement;

In cases like this it is difficult for a 1-pass prettyprinter to determine
which instance of white-space is the correct one to either learn or subst-
itute for.  IPP uses some simple heuristics in this case, but it is not yet
foolproof.

IPP's method of formatting can be illustrated with BNF formalism like this...
(well, pseudo BNF)

if statement :: IF <WS1> condition <WS2> THEN <WS3> compound
              | IF <WS1> condition <WS2> THEN <WS4> statement ;

One simply choses the areas in the language that one wants to "learn", and
inserts <WSx> tokens accordingly.

IPP currently uses 48 of these tokens, and these are sufficient to capture a
very wide range of syntactic style.  IPP does not format anything below the
statement level currently, although this could be added fairly easily.  The
only problem with this (from a learning standpoint, this is another problem
with using a "user-defined" style sample) is that this is the area of 
syntactic style that programmers are least likely to be consistent in (at
least based on my work).

Flexible prettyprinters are avalable for some languages.  One exists for
LISP, but uses "deformat" statements for each user-defined feature, not
anything approaching the "learn by example" method.  One exists for the
Logitech Modula II compiler (and is included with the package), which takes
one sample file, modified by the user, and "learns" from that.  Ones for
C are probably on the horizon.

So, sorry for the long-winded discussion, but again, there seemed to be 
enough interest for a posting.  I'd appreciate hearing any comments, sugg-
estions, offers :-), etc.

Kirt

-------------------------------------------------------------------------------
 Kirt Alan Winter                                        winter@hpldola.hp.com
 Hewlett Packard - EDD                                   (719) 590-5974
 Colorado Springs, Colorado
-------------------------------------------------------------------------------
 I had these ideas and opinions before I came to HP.  HP has to find its own. 
-------------------------------------------------------------------------------

seibel@cgl.ucsf.edu (George Seibel) (05/26/88)

In article <1420008@hpldola.HP.COM> winter@hpldola.HP.COM (Kirt Winter) writes:
[...much good stuff deleted]
>
>Flexible prettyprinters are avalable for some languages.  One exists for
>LISP, but uses "deformat" statements for each user-defined feature, not
>anything approaching the "learn by example" method.  One exists for the
>Logitech Modula II compiler (and is included with the package), which takes
>one sample file, modified by the user, and "learns" from that.  Ones for
>C are probably on the horizon.

Here's one for FORTRAN. (yeah, I know, why would a FoRtRaN programmer
be reading comp.software-eng?!)  Pretty-printing is one of the functions
of TOOLPACK, along with a macro processor, global inter-procedural analysis,
coverage analysis, and other nice things.  It doesn't use the "learn by
example" model, rather it uses a "configuration file" scheme - which
seems a lot simpler to me.  Each user might customize their own config
file to produce whatever whacko style they liked.  

TOOLPACK is public domain software developed at the U of Colorado,
Argonne National Labs, JPL, Bell Labs, Purdue, Numerical Algorithms Group,
and other sites.   

For distribution, contact Integrated Systems Technologies at (312)869-
7820 or the Numerical Algorithms Group (usa 312-971-2337 or uk oxford
(0865)511245 international +44865 511245).

also see refs IEEE Transactions on Software Engineering SE-9
                 #6 Nov 83 P 673.  L.J. Osterweil
              ACM Transactions on Mathematical Software vol 12
                 p324-353 Cowell,Thompson
              Unix Shell scripts to invoke a set of Toolpack/1 tools
                 Technical Memorandum ANL/MCS-TM-77 ANL,86.

George Seibel, UCSF  seibel@cgl.ucsf.edu

winter@hpldola.HP.COM (Kirt Winter) (05/31/88)

seibel@cgl.ucsf.edu (George Seibel) writes:

>Here's one for FORTRAN. (yeah, I know, why would a FoRtRaN programmer
>be reading comp.software-eng?!)  Pretty-printing is one of the functions
>of TOOLPACK, along with a macro processor, global inter-procedural analysis,
>coverage analysis, and other nice things.  It doesn't use the "learn by
>example" model, rather it uses a "configuration file" scheme - which
>seems a lot simpler to me.  Each user might customize their own config
>file to produce whatever whacko style they liked.  

When you say "seems a lot simpler" I wonder if I made myself clear in my
original posting.  If you mean simpler to implement the prettyprinter, then
you are certainly correct.  However, if you mean simpler to the user, I must
disagree.

I have included a "dummy" Pascal source file that will capture all style
features that the prettyprinter understands.  The user can change that to
reflect his/her own style.  This requires no knowledge of how the pretty-
printer works, unlike a configuration file (if I understand what you mean by
a configuration file).  I feel that the "learn by example" method is superior
in virtually all respects.

As far as a FORTRAN programmer reading comp.software-eng, I think congrat-
ulations are in order.  Perhaps your insight will catch on.  :-) :-) :-)

What was the quote I heard?   Something like... "I don't know what language
people will be programming in at the turn of the century, but it will be called
FORTRAN."

Kirt

-------------------------------------------------------------------------------
 Kirt Alan Winter                                        winter@hpldola.hp.com
 Hewlett Packard - EDD                                   (719) 590-5974
 Colorado Springs, Colorado
-------------------------------------------------------------------------------
 If anyone but me wants to claim these ideas, they are more than welcome to.
-------------------------------------------------------------------------------