[comp.lang.c] ANSI C to K&R converter written in Icon

alanf@bruce.cs.monash.OZ.AU (Alan Grant Finlay) (01/13/91)

After wasting my time trying to fix up a converter to work for my
C sources I decided to write my own.  Icon seems to be the ideal language
for this job (provided you have a compiler/interpreter).  I originally
thought I would do the job properly (i.e. using a C grammar) but after some
reflection I was soon put off (C's grammar is truly awful).  The result is
yet another converter that works for the author's programs.  However, one
advantage of this converter is that the algorithm is quite easy to follow
and could be easily adapted by Icon programmers to handle a greater subset of
C.  

The program has the following limitations:
   1) function prototypes are recognised by the sequence ");" with no 
intervening spaces, newlines or comments.
   2) function prototypes may not contain comments within the parameter list.
   3) function definitions may have comments, spaces and newlines within the
parameter list; however, the output will win no awards for legibility.
   4) a parameter that is itself a function (or pointer to function) in a
function definition will mess up the conversion of that parameter list
(usually it just removes the parameters).
There is a workaround as demonstrated by the following example:
      #ifdef ANSI
      void addts(void (*ts)())
      {
      #else
      void addts(ts)
         void (*ts)();
      {
      #endif
      <the function body>
      #ifdef ANSI
      }
      #else
      }
      #endif
This workaround requires that the non-ANSI compiler ignores text which is
excluded by "#ifdef"s.
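
For orientation, here is a small sketch of the kind of rewrite the converter
aims at for an ordinary function.  The names scale, value and factor are made
up for illustration, and the guard uses the standard __STDC__ macro rather
than the ANSI macro from the example above, so that the snippet compiles
unchanged on a current compiler.  The K&R branch shows roughly what the
program writes; its exact spacing and line breaks will differ.

```c
/* Illustrative only: scale(), value and factor are hypothetical names. */
#ifdef __STDC__
/* ANSI form: the kind of input the converter reads. */
int scale(int value, int factor);      /* prototype */

int scale(int value, int factor)       /* definition */
{
    return value * factor;
}
#else
/* K&R form: roughly what the converter is meant to produce. */
int scale();                           /* parameter list removed */

int scale(value, factor)               /* only the names stay in the list */
   int value;                          /* declarations move below it */
   int factor;
{
    return value * factor;
}
#endif
```

Only the prototype and the header of the definition change; the function body
is passed through untouched.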

The algorithm is to divide the source text into a stream of substrings labelled
as either "considered"  or "ignored".  A pipeline is set up to process the
stream as follows:
   control lines ->      {all lines beginning with # are ignored}
   comments ->           {comments are ignored}
   brackets ->           {everything within curly brackets is ignored}
   declarations ->       {ignore all except function declarations (top level)}
   compress ->           {joins together consecutive segments of same label}
   prototypes ->         {prototype declarations have the parameters removed}
   compress ->           {joins together consecutive segments of same label} 
   parameter lists ->    {function definition parameters are converted to K&R}
   compress ->           {joins together consecutive segments of same label}
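
To make the idea of a labelled stream concrete, here is a minimal sketch in C
of a single stage, the one that separates comments from everything else.  It
is not a translation of the Icon program below: the names label, emit_fn,
split_comments, sink and keep_considered are hypothetical, and a callback
stands in for Icon's generators.

```c
#include <string.h>

/* Label attached to each piece of source text. */
enum label { CONSIDERED, IGNORED };

/* Callback that receives one labelled segment (not NUL-terminated). */
typedef void (*emit_fn)(enum label lab, const char *text, size_t len,
                        void *ctx);

/* Split src into segments: comment text is IGNORED, the rest CONSIDERED.
   Returns the number of segments emitted. */
size_t split_comments(const char *src, emit_fn emit, void *ctx)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0') {
        const char *open = strstr(p, "/*");
        if (open == NULL)
            open = p + strlen(p);          /* no more comments */
        if (open > p) {
            emit(CONSIDERED, p, (size_t)(open - p), ctx);
            count++;
        }
        if (*open == '\0')
            break;

        /* Comment runs to the closing marker, or to end of text. */
        const char *close = strstr(open + 2, "*/");
        const char *end = close ? close + 2 : open + strlen(open);
        emit(IGNORED, open, (size_t)(end - open), ctx);
        count++;
        p = end;
    }
    return count;
}

/* Example consumer: collect only the CONSIDERED text in a buffer. */
struct sink { char *buf; size_t cap; size_t used; };

void keep_considered(enum label lab, const char *text, size_t len, void *ctx)
{
    struct sink *s = ctx;
    if (lab != CONSIDERED || s->used + len >= s->cap)
        return;                            /* skip ignored or oversized */
    memcpy(s->buf + s->used, text, len);
    s->used += len;
    s->buf[s->used] = '\0';
}
```

A later stage would receive these segments, pass the IGNORED ones through
unchanged and subdivide only the CONSIDERED ones, which is why a compress
step is needed to glue neighbouring segments of the same label back together.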

The workaround in (4) above can now be seen to depend upon the brackets step.
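
The brackets step amounts to tracking the nesting depth of curly brackets and
keeping only the text at depth zero.  The following is a rough sketch of that
idea in C, under the assumption that comments and control lines have already
been dealt with; strip_bodies is a hypothetical name and the function is not
part of the Icon program.

```c
#include <stddef.h>

/* Copy src to dst, keeping only the text outside curly brackets plus the
   outermost brackets themselves.  dst must have room for strlen(src) + 1
   bytes.  Returns the final nesting depth; 0 means the brackets balanced. */
int strip_bodies(const char *src, char *dst)
{
    int depth = 0;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        if (*src == '{') {
            depth++;
            if (depth == 1)
                dst[n++] = '{';            /* outermost opening bracket */
        } else if (*src == '}') {
            depth--;
            if (depth == 0)
                dst[n++] = '}';            /* outermost closing bracket */
        } else if (depth == 0) {
            dst[n++] = *src;               /* top-level text is kept */
        }
    }
    dst[n] = '\0';
    return depth;
}
```

In the workaround, once the "#" lines are out of the way, the second function
header sits after the first "{" and the two "}" close both brackets, so the
K&R header and the body all lie at depth one or deeper and are never examined.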

The source follows next:

---<cut here>----------------------//----------------------------------------
# Program to convert C programs with ansi style function prototypes to the
# equivalent K&R form.  Only top level declarations are converted.
# Written 7/1/91 by Alan Finlay, Computer Science, Monash University. 
# 
record ignored(body)        # A program is processed as a sequence of 
record considered(body)     # ignored and considered parts.
global idset                # identifier characters
global nidset               # skip these to find next identifier
global spcset               # white space characters

procedure main()
   idset:= &lcase ++ &ucase ++ '0123456789_'
   nidset:= ~idset 
   spcset:= ' \n\t\r'
   every text:= compress(parms) do writes(text.body)
end

procedure parms() 
   # rearrange ansi style parameters to suit K&R syntax.
   par:= "false"   # not doing parameters now
   parlist:= []    # parameter list is empty
   currpar:= ""    # Current parameter is bare
   every x:= compress(protos) do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
                       if par=="false" then 
                          if text:= tab(find("()"))||move(2) then 
                             suspend considered(text) 
                          else {
                             if text:= tab(find("("))||move(1) then par:= "true"
                             else text:= tab(0)
                             suspend considered(text)
                             }
                       else { 
                          if text:= tab(upto(',)')) then {
                             currpar||:=text; 
                             # check for (void)
                             void:= "false"
                             currpar ? if (tab(many(spcset))|0) & ="void" &
                                          (tab(many(spcset))|0) & pos(0) then
                                          void:= "true"
                             if void=="true" & *parlist=0 & 
                                &subject[&pos]==")" then {
                                currpar:= ""
                                par:= "false"
                                }
                             else {
                                # end of a parameter, extract the identifier
                                currpar ? {
                                   if any(nidset) then tab(i:= many(nidset))
                                   while tab(many(idset)) & (k:= i) &
                                      tab(i:= many(nidset)) 
                                   }
                                # update parlist and output the identifier
###<should not be needed>       /i:= 1; /k:= 1  # for strange parameters only
                                if i=*currpar+1 then i:= k 
                                j:= (currpar ? many(spcset)) | 1
                                put(parlist,"   "||currpar[j:0]||";\n")
                                currpar||:= move(1)
                                suspend considered(currpar[i:0])
                                currpar:=""
                                if &subject[&pos-1]==")" then {
                                   # can release the saved parameters
                                   suspend considered("\n")
                                   suspend considered(!parlist)
                                   parlist:= []
                                   par:= "false"
                                   }
                                }
                             }
                          else {
                             text:= tab(0) # the parameter continues
                             currpar||:= text
                             }
                          }
end


procedure protos() 
   # remove parameter types from prototypes.
   # only recognises prototypes which end with ");" as prototypes.
   # only works for prototypes which are not interrupted by comments etc.
   # must be compressed afterwards for parms to work.
   every x:= compress(decs) do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
                       if text:= tab(upto('('))||move(1) then {
                          if not tab(find(");")) then text||:= tab(0)
                          suspend considered(text)
                          }
                       else {
                          text:= tab(0)
                          suspend considered(text)
                          }
end

procedure compress(seq) 
   # joins together adjacent text.
   textc:= ""; texti:= ""
   every x:= seq() do
      if type(x)=="ignored" then { # save ignored and expel considered
         texti||:= x.body
         if *textc~=0 then suspend considered(textc)
         textc:= ""
         }
      else {                      # save considered and expel ignored
         textc||:= x.body
         if *texti~=0 then suspend ignored(texti)
         texti:= ""
         }
   if textc~=="" then return considered(textc)    # only one of these
   if texti~=="" then return ignored(texti)       # can apply.
end

procedure decs() 
   # remove top level data declarations.
   dec:= "false"   # not in a declaration now
   every x:= brackets() do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
                       if dec=="false" then {
                          if text:= tab(find("typedef" | "auto" | "static" |
                             "extern" | "register" )) then dec:= "true"
                          else text:= tab(0)
                          suspend considered(text)
                          }
                       else { 
                          if text:= tab(find(";"))||move(1) then dec:= "false"
                          else text:= tab(0)
                          suspend ignored(text)
                          }
end

procedure brackets() 
   # remove any text between { and } after comments removed.
   bal:= 0   # start with balanced brackets
   every x:= comments() do 
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
                       if text:= tab(upto('{}')) then 
                          if &subject[&pos]=="{" then {
                             bal+:= 1; text||:= move(1)
                             if bal=1 then suspend considered(text)
                             else suspend ignored(text)
                             }
                          else {
                             bal-:= 1; move(1)
                             suspend ignored(text)
                             if bal=0 then suspend considered("}")
                             else suspend ignored("}") 
                             }
                       else {
                          text:= tab(0)
                          if bal=0 then suspend considered(text) 
                          else suspend ignored(text)
                          }
end

procedure comments() 
   # Read std input and remove comments.
   #  For now compiler control lines are removed here also.
   com:= "false"   # not in a comment now
   while line:= read()||"\n" do
      if line[1]=="#" then suspend ignored(line)
      else line ? while not pos(0) do 
                     if com=="false" then {
                        if text:= tab(find("/*")) then com:= "true"
                        else text:= tab(0)
                        suspend considered(text)
                        }
                     else { 
                        if text:= tab(find("*/"))||move(2) then com:= "false"
                        else text:= tab(0)
                        suspend ignored(text)
                        }
end

rfg@NCD.COM (Ron Guilmette) (01/15/91)

In article <3579@bruce.cs.monash.OZ.AU> alanf@bruce.cs.monash.OZ.AU (Alan Grant Finlay) writes:
>After wasting my time trying to fix up a converter to work for my
>C sources I decided to write my own.  Icon seems to be the ideal language
>for this job (provided you have a compiler/interpreter).  I originally
>thought I would do the job properly (i.e. using a C grammar) but after some
>reflection I was soon put off (C's grammar is truly awful).  The result is
>yet another converter that works for the author's programs.  However, one
>advantage of this converter is that the algorithm is quite easy to follow
>and could be easily adapted by Icon programmers to handle a greater subset of
>C.  

There is no reason to settle for a tool which can only cope with a subset
of the C language.  My protoize and unprotoize tools (which have been
available for quite some time now) are both able to deal with the entire
ANSI C language as well as many `traditional C' features.

Additionally, these tools are written in C which makes them highly portable
and very fast.

More information is provided below.

----------------------------------------------------------------------------


                             Protoize/Unprotoize


     This is a brief  announcement  concerning  two free software  tools
     called  protoize  and  unprotoize.  Protoize is a tool to assist in
     the conversion of old-style (K&R) C code to new-style ANSI  C  code
     or  C++  code  (with function prototypes).  Unprotoize is a tool to
     assist in the conversion of new-style  ANSI  C  code  to  old-style
     (K&R) C code without function prototypes.

     Neither of these tools claims to do a complete conversion (there
     are too many niggling little incompatibilities); however, the bulk
     of the work (usually more than 90%) in such conversions involves
     function prototypes.  This is the part of the job that protoize
     and unprotoize can perform automatically (leaving you to contend
     only with the remaining niggling details).

     The protoize and unprotoize tools have been built specifically  for
     doing  mass  conversions  on large systems of C source code.  Thus,
     both protoize and unprotoize are able to deal effectively  with  an
     entire group of source files during each individual run.

     Most importantly, protoize can use  information  gleaned  from  one
     source  file to help with the conversion of other base source files
     and/or include files in the same group.  This capability is partic-
     ularly  useful when one wants one's include files to contain ANSI C
     (and/or C++) function prototypes.  Protoize is able to automatical-
     ly insert such prototypes into include files based upon information
     it gets from your base source (i.e. .c) files.  Likewise,  external
     function declarations appearing in one .c file will be converted to
     prototype form based upon information gathered from the correspond-
     ing function definitions in the same .c source file, or in other .c
     files.

     Protoize can also be used with your system's own  native  lint  li-
     braries to generate a complete set of fully prototyped "system" in-
     clude files.  Such a set can be useful for catching  more  function
     calling errors at compile time.

     Protoize and unprotoize work in conjunction with the GNU C compiler
     (GCC)  which is used as a front-end information gathering tool.  In
     order to build or use protoize or unprotoize you  must  also  build
     and use GCC.

     Version 1.07 of protoize/unprotoize  is  dramatically  better  than
     previous  versions.   Substantial  improvements  have  been made in
     robustness and ease-of-use.  If you tried  protoize/unprotoize  be-
     fore  and  didn't  like  them,  please  try them again.  You may be
     pleasantly surprised.  The 1.07 version of  protoize/unprotoize has
     been pre-tested by several people on a number of different machines
     and is believed to be quite portable and reasonably bug free.   (My
     special thanks to all the pre-testers!)

     As with prior versions,  the  distribution  file  is  a  compressed
     *patch* file (not a tar file) which should be applied to a pristine
     set of GCC Version 1.36 source files.  (The file protoize-1.07.93.Z
     is  also  available  for  those  users now pre-testing GCC 1.36.93.
     Size is 89029 bytes.  That version should also be used for GCC 1.37
     until  I  have  a chance to create another patch file just for that
     version of GCC.)

     The application of the protoize/unprotoize patches will  result  in
     the creation of several new files.  Among these "additions" are the
     file README_PROTOIZE and a common pre-man-page file  called  proto-
     unproto.1.   The latter file will be preprocessed into two man-page
     files  (called  protoize.1  and  unprotoize.1)  by  the  (modified)
     Makefile during a normal build of the (modified) GCC.

     Note that when using protoize 1.07 you may  occasionally  get  mes-
     sages like:

             please add `extern foobar()' to SYSCALLS.c

     These messages are an indication that your native "system"  include
     files  are  not  yet in fully prototyped form.  For now, you should
     just ignore these messages.  I am now  developing  a  plan  whereby
     protoize will be able to automatically create protoized versions of
     system include files for a variety of systems.   This  scheme  will
     probably make its debut in v1.08.  After that, we can all (finally)
     get totally protoized.  (This will also be a major benefit for  C++
     users.)

     Because so many things have changed in this version, it is strongly
     advised  that  you  read the README_PROTOIZE file and the man pages
     again, even if you  have  already  been  using  prior  versions  of
     protoize/unprotoize.

     As before, I welcome comments, suggestions, bug reports and  (espe-
     cially)  compliments.   User suggestions have been the major source
     of ideas for new features up till now, and I'll try to be receptive
     if you have a new idea for an additional feature.  Also, please let
     me know if you use these tools to do  a  conversion  on  any  large
     (i.e. >= 100k lines of code) system.

     Protoize, Unprotoize, and GCC are owned and operated  by  the  Free
     Software Foundation.  They are available to all under the terms and
     conditions of the GNU General Public License, a copy  of  which  is
     provided with the source code for GCC.


                             U. S. Availability

     Protoize/unprotoize version 1.07 is  available  via  anonymous  FTP
     from ics.uci.edu.  Size of the protoize-1.07.Z file is 89135 bytes.

     Protoize/unprotoize version 1.07 can also be obtained via anonymous
     UUCP  from  osu-cis.  (Contact for UUCP transfer is Karl Kleinpaste
     <karl@cis.ohio-state.edu>).

     On ics.uci.edu, protoize/unprotoize 1.07 can be found as:

             ~ftp/gnu/protoize-1.07.Z
             ~ftp/gnu/protoize-1.07.93.Z


                            European Availability

     Two sites are distributing protoize/unprotoize version 1.07 in  Eu-
     rope.

     Protoize/unprotoize version 1.07 can be obtained via anonymous  FTP
     from   mizar.docs.uu.se  (130.238.4.1).   Contact  is  Ove  Ewerlid
     <ewerlid@mizar.docs.uu.se>.  The files are located in:

                     ~ftp/pub/gnu/protoize-1.07.Z
                     ~ftp/pub/gnu/protoize-1.07.93.Z

     (Thanks Ove!)

     Rijks Universiteit Utrecht (Utrecht University, Department of  Com-
     puter  Science)  is  also  making  protoize/unprotoize version 1.07
     available in Europe.  Protoize/unprotoize version 1.07 may  be  ob-
     tained from Utrecht University either via anonymous FTP or by mail.
     Instructions for each of these follow.

     Anonymous FTP:

             System: sol.cs.ruu.nl [131.211.80.5]
             Files:  ~ftp/pub/GNU/protoize-1.07.Z
                     ~ftp/pub/GNU/protoize-1.07.93.Z

     Mail Server:

          European  sites   not   having   FTP   access   may   retrieve
          protoize/unprotoize  version  1.07 from the Rijks Universiteit
          Utrecht by sending an email message to <mail-server@cs.ruu.nl>
          with the following contents:

          path <your_valid_return_address>
          btoa
          send GNU/protoize-1.07.Z
          end

          Leave out the line  with  "btoa"  if  you  prefer  uuencoding.
          Please  use  a  domain-based  return address, or you may  lose
          out.

     My  thanks  go  to  Edwin  Kremer  <edwin@cs.ruu.nl>   for   making
     Protoize/Unprotoize 1.07 available in the Netherlands.

-- 

// Ron Guilmette  -  C++ Entomologist
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
// Motto:  If it sticks, force it.  If it breaks, it needed replacing anyway.