alanf@bruce.cs.monash.OZ.AU (Alan Grant Finlay) (01/13/91)
After wasting my time trying to fix up a converter to work for my C sources I decided to write my own. Icon seems to be the ideal language for this job (provided you have a compiler/interpreter). I originally thought I would do the job properly (i.e. using a C grammar) but after some reflection I was soon put off (C's grammar is truly awful). The result is yet another converter that works for the author's programs. However one advantage of this converter is that the algorithm is quite easy to follow and could be easily adapted by Icon programmers to handle a greater subset of C.

The program has the following limitations:

1) function prototypes are recognised by the sequence ");" with no intervening spaces, newlines or comments.
2) function prototypes may not contain comments within the parameter list.
3) function definitions may have comments, spaces and newlines within the parameter list; however the output will win no awards for legibility.
4) the presence of function parameters (parameters that are themselves functions, as in the example below) in a function definition will mess up the conversion of that parameter list (usually it just removes the parameters). There is a workaround as demonstrated by the following example:

#ifdef ANSI
void addts(void (*ts)())
{
#else
void addts(ts)
void (*ts)();
{
#endif
   <the function body>
#ifdef ANSI
}
#else
}
#endif

This workaround requires that the non-ANSI compiler ignores text which is excluded by "#ifdef"s.

The algorithm is to divide the source text into a stream of substrings labelled as either "considered" or "ignored".
A pipeline is set up to process the stream as follows:

control lines   -> {all lines beginning with # are ignored}
comments        -> {comments are ignored}
brackets        -> {everything within curly brackets is ignored}
declarations    -> {ignore all except function declarations (top level)}
compress        -> {joins together consecutive segments of same label}
prototypes      -> {prototype declarations have the parameters removed}
compress        -> {joins together consecutive segments of same label}
parameter lists -> {function definition parameters are converted to K&R}
compress        -> {joins together consecutive segments of same label}

The workaround in (4) above can now be seen to depend upon the brackets step.

The source follows next:
---<cut here>----------------------//----------------------------------------
# Program to convert C programs with ansi style function prototypes to the
# equivalent K&R form.  Only top level declarations are converted.
# Written 7/1/91 by Alan Finlay, Computer Science, Monash University.
#
record ignored(body)     # A program is processed as a sequence of
record considered(body)  # ignored and considered parts.

global idset   # identifier characters
global nidset  # skip these to find next identifier
global spcset  # white space characters

procedure main()
   idset:= &lcase ++ &ucase ++ '0123456789_'
   nidset:= ~idset
   spcset:= ' \n\t\r'
   every text:= compress(parms) do writes(text.body)
end

procedure parms()
# rearrange ansi style parameters to suit K&R syntax.
   par:= "false"  # not doing parameters now
   parlist:= []   # parameter list is empty
   currpar:= ""   # Current parameter is bare
   every x:= compress(protos) do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
         if par=="false" then
            if text:= tab(find("()"))||move(2) then suspend considered(text)
            else {
               if text:= tab(find("("))||move(1) then par:= "true"
               else text:= tab(0)
               suspend considered(text)
               }
         else {
            if text:= tab(upto(',)')) then {
               currpar||:=text;
               # check for (void)
               void:= "false"
               currpar ?
                  if (tab(many(spcset))|0) & ="void" &
                     (tab(many(spcset))|0) & pos(0) then void:= "true"
               if void=="true" & *parlist=0 & &subject[&pos]==")" then {
                  currpar:= ""
                  par:= "false"
                  }
               else { # end of a parameter, extract the identifier
                  currpar ? {
                     if any(nidset) then tab(i:= many(nidset))
                     while tab(many(idset)) & (k:= i) & tab(i:= many(nidset))
                     }
                  # update parlist and output the identifier
                  ###<should not be needed>
                  /i:= 1; /k:= 1  # for strange parameters only
                  if i=*currpar+1 then i:= k
                  j:= (currpar ? many(spcset)) | 1
                  put(parlist," "||currpar[j:0]||";\n")
                  currpar||:= move(1)
                  suspend considered(currpar[i:0])
                  currpar:=""
                  if &subject[&pos-1]==")" then {
                     # can release the saved parameters
                     suspend considered("\n")
                     suspend considered(!parlist)
                     parlist:= []
                     par:= "false"
                     }
                  }
               }
            else {
               text:= tab(0)  # the parameter continues
               currpar||:= text
               }
            }
end

procedure protos()
# remove parameter types from prototypes.
# only recognises prototypes which end with ");" as prototypes.
# only works for prototypes which are not interrupted by comments etc.
# must be compressed afterwards for parms to work.
   every x:= compress(decs) do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
         if text:= tab(upto('('))||move(1) then {
            if not tab(find(");")) then text||:= tab(0)
            suspend considered(text)
            }
         else {
            text:= tab(0)
            suspend considered(text)
            }
end

procedure compress(seq)
# joins together adjacent text.
   textc:= ""; texti:= ""
   every x:= seq() do
      if type(x)=="ignored" then { # save ignored and expel considered
         texti||:= x.body
         if *textc~=0 then suspend considered(textc)
         textc:= ""
         }
      else { # save considered and expel ignored
         textc||:= x.body
         if *texti~=0 then suspend ignored(texti)
         texti:= ""
         }
   if textc~=="" then return considered(textc)  # only one of these
   if texti~=="" then return ignored(texti)     # can apply.
end

procedure decs()
# remove top level data declarations.
   dec:= "false"  # not in a declaration now
   every x:= brackets() do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
         if dec=="false" then {
            if text:= tab(find("typedef" | "auto" | "static" | "extern" |
                               "register" )) then dec:= "true"
            else text:= tab(0)
            suspend considered(text)
            }
         else {
            if text:= tab(find(";"))||move(1) then dec:= "false"
            else text:= tab(0)
            suspend ignored(text)
            }
end

procedure brackets()
# remove any text between { and } after comments removed.
   bal:= 0  # start with balanced brackets
   every x:= comments() do
      if type(x)=="ignored" then suspend x
      else x.body ? while not pos(0) do
         if text:= tab(upto('{}')) then
            if &subject[&pos]=="{" then {
               bal+:= 1; text||:= move(1)
               if bal=1 then suspend considered(text)
               else suspend ignored(text)
               }
            else {
               bal-:= 1; move(1)
               suspend ignored(text)
               if bal=0 then suspend considered("}")
               else suspend ignored("}")
               }
         else {
            text:= tab(0)
            if bal=0 then suspend considered(text)
            else suspend ignored(text)
            }
end

procedure comments()
# Read std input and remove comments.
# For now compiler control lines are removed here also.
   com:= "false"  # not in a comment now
   while line:= read()||"\n" do
      if line[1]=="#" then suspend ignored(line)
      else line ? while not pos(0) do
         if com=="false" then {
            if text:= tab(find("/*")) then com:= "true"
            else text:= tab(0)
            suspend considered(text)
            }
         else {
            if text:= tab(find("*/"))||move(2) then com:= "false"
            else text:= tab(0)
            suspend ignored(text)
            }
end
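
For readers who do not know Icon, the "considered"/"ignored" labelling used by the program above can be sketched roughly in C. This is only an illustration of the idea, not a translation of the Icon source: it covers just the comment stage and folds the compress step into it, and every name in it (struct segment, push_segment, split_comments, copy_considered) is made up for this sketch. Like the Icon version, it takes no account of "/*" appearing inside string literals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; none come from the Icon program. */
enum label { CONSIDERED, IGNORED };

struct segment {
    enum label label;
    const char *start;   /* points into the source text, not a copy */
    size_t length;
};

/* Append a segment. A segment that directly follows one with the same
   label is merged into it, which is the job of the "compress" stage.
   Empty segments are skipped; segments beyond cap are silently dropped.
   Returns the new segment count. */
static size_t push_segment(struct segment *segs, size_t n, size_t cap,
                           enum label label, const char *start, size_t length)
{
    assert(segs != NULL);
    if (length == 0)
        return n;
    if (n > 0 && segs[n - 1].label == label &&
        segs[n - 1].start + segs[n - 1].length == start) {
        segs[n - 1].length += length;
        return n;
    }
    if (n < cap) {
        segs[n].label = label;
        segs[n].start = start;
        segs[n].length = length;
        n++;
    }
    return n;
}

/* Label each comment as IGNORED and all other text as CONSIDERED.
   An unterminated comment runs to the end of the text. */
static size_t split_comments(const char *src, struct segment *segs, size_t cap)
{
    size_t n = 0;
    const char *p = src;

    while (*p != '\0') {
        const char *open = strstr(p, "/*");
        if (open == NULL) {
            n = push_segment(segs, n, cap, CONSIDERED, p, strlen(p));
            break;
        }
        n = push_segment(segs, n, cap, CONSIDERED, p, (size_t)(open - p));

        const char *close = strstr(open + 2, "*/");
        const char *end = (close != NULL) ? close + 2 : open + strlen(open);
        n = push_segment(segs, n, cap, IGNORED, open, (size_t)(end - open));
        p = end;
    }
    return n;
}

/* Concatenate the CONSIDERED text into out (always NUL-terminated,
   truncated if it does not fit). Returns the number of bytes copied. */
static size_t copy_considered(const struct segment *segs, size_t n,
                              char *out, size_t outcap)
{
    size_t used = 0;

    if (outcap == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].label != CONSIDERED)
            continue;
        size_t room = outcap - 1 - used;
        size_t len = segs[i].length < room ? segs[i].length : room;
        memcpy(out + used, segs[i].start, len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

A caller would pass the whole source text to split_comments with an array of segments, then hand that array to the next stage. Later stages only rewrite the CONSIDERED segments and pass the IGNORED ones through untouched, which is why the text inside comments and curly brackets survives the conversion unchanged.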
rfg@NCD.COM (Ron Guilmette) (01/15/91)
In article <3579@bruce.cs.monash.OZ.AU> alanf@bruce.cs.monash.OZ.AU (Alan Grant Finlay) writes:
>After wasting my time trying to fix up a converter to work for my
>C sources I decided to write my own. Icon seems to be the ideal language
>for this job (provided you have a compiler/interpreter). I originally
>thought I would do the job properly (i.e. using a C grammar) but after some
>reflection I was soon put off (C's grammar is truly awful). The result is
>yet another converter that works for the author's programs. However one
>advantage of this converter is that the algorithm is quite easy to follow
>and could be easily adapted by Icon programmers to handle a greater subset of
>C.

There is no reason to settle for a tool which can only cope with a subset of the C language. My protoize and unprotoize tools (which have been available for quite some time now) are both able to deal with the entire ANSI C language as well as many `traditional C' features. Additionally, these tools are written in C which makes them highly portable and very fast.

More information is provided below.

----------------------------------------------------------------------------

Protoize/Unprotoize

This is a brief announcement concerning two free software tools called protoize and unprotoize.

Protoize is a tool to assist in the conversion of old-style (K&R) C code to new-style ANSI C code or C++ code (with function prototypes). Unprotoize is a tool to assist in the conversion of new-style ANSI C code to old-style (K&R) C code without function prototypes.

Neither of these tools claims to do a complete conversion (there are too many niggling little incompatibilities); however the bulk of the work (usually more than 90%) in such conversions usually involves function prototypes. This is the part of the job that protoize and unprotoize can perform automatically (leaving you to contend only with the remaining niggling details).

The protoize and unprotoize tools have been built specifically for doing mass conversions on large systems of C source code. Thus, both protoize and unprotoize are able to deal effectively with an entire group of source files during each individual run.

Most importantly, protoize can use information gleaned from one source file to help with the conversion of other base source files and/or include files in the same group. This capability is particularly useful when one wants one's include files to contain ANSI C (and/or C++) function prototypes. Protoize is able to automatically insert such prototypes into include files based upon information it gets from your base source (i.e. .c) files. Likewise, external function declarations appearing in one .c file will be converted to prototype form based upon information gathered from the corresponding function definitions in the same .c source file, or in other .c files.

Protoize can also be used with your system's own native lint libraries to generate a complete set of fully prototyped "system" include files. Such a set can be useful for catching more function calling errors at compile time.

Protoize and unprotoize work in conjunction with the GNU C compiler (GCC) which is used as a front-end information gathering tool. In order to build or use protoize or unprotoize you must also build and use GCC.

Version 1.07 of protoize/unprotoize is dramatically better than previous versions. Substantial improvements have been made in robustness and ease-of-use. If you tried protoize/unprotoize before and didn't like them, please try them again. You may be pleasantly surprised.

The 1.07 version of protoize/unprotoize has been pre-tested by several people on a number of different machines and is believed to be quite portable and reasonably bug free. (My special thanks to all the pre-testers!)

As with prior versions, the distribution file is a compressed *patch* file (not a tar file) which should be applied to a pristine set of GCC Version 1.36 source files. (The file protoize-1.07.93.Z is also available for those users now pre-testing GCC 1.36.93. Size is 89029 bytes. That version should also be used for GCC 1.37 until I have a chance to create another patch file just for that version of GCC.)

The application of the protoize/unprotoize patches will result in the creation of several new files. Among these "additions" are the file README_PROTOIZE and a common pre-man-page file called proto-unproto.1. The latter file will be preprocessed into two man-page files (called protoize.1 and unprotoize.1) by the (modified) Makefile during a normal build of the (modified) GCC.

Note that when using protoize 1.07 you may occasionally get messages like:

        please add `extern foobar()' to SYSCALLS.c

These messages are an indication that your native "system" include files are not yet in fully prototyped form. For now, you should just ignore these messages. I am now developing a plan whereby protoize will be able to automatically create protoized versions of system include files for a variety of systems. This scheme will probably make its debut in v1.08. After that, we can all (finally) get totally protoized. (This will also be a major benefit for C++ users.)

Because so many things have changed in this version, it is strongly advised that you read the README_PROTOIZE file and the man pages again, even if you have already been using prior versions of protoize/unprotoize.

As before, I welcome comments, suggestions, bug reports and (especially) compliments. User suggestions have been the major source of ideas for new features up till now, and I'll try to be receptive if you have a new idea for an additional feature.

Also, please let me know if you use these tools to do a conversion on any large (i.e. >= 100k lines of code) system.

Protoize, Unprotoize, and GCC are owned and operated by the Free Software Foundation. They are available to all under the terms and conditions of the GNU General Public License, a copy of which is provided with the source code for GCC.

U. S. Availability

Protoize/unprotoize version 1.07 is available via anonymous FTP from ics.uci.edu. Size of the protoize-1.07.Z file is 89135 bytes.

Protoize/unprotoize version 1.07 can also be obtained via anonymous UUCP from osu-cis. (Contact for UUCP transfer is Karl Kleinpaste <karl@cis.ohio-state.edu>).

On ics.uci.edu, protoize/unprotoize 1.07 can be found as:

        ~ftp/gnu/protoize-1.07.Z
        ~ftp/gnu/protoize-1.07.93.Z

European Availability

Two sites are distributing protoize/unprotoize version 1.07 in Europe.

Protoize/unprotoize version 1.07 can be obtained via anonymous FTP from mizar.docs.uu.se (130.238.4.1). Contact is Ove Ewerlid <ewerlid@mizar.docs.uu.se>. The files are located in:

        ~ftp/pub/gnu/protoize-1.07.Z
        ~ftp/pub/gnu/protoize-1.07.93.Z

(Thanks Ove!)

Rijks Universiteit Utrecht (Utrecht University, Department of Computer Science) is also making protoize/unprotoize version 1.07 available in Europe. Protoize/unprotoize version 1.07 may be obtained from Utrecht University either via anonymous FTP or by mail. Instructions for each of these follow.

Anonymous FTP:

        System: sol.cs.ruu.nl [131.211.80.5]
        Files:  ~ftp/pub/GNU/protoize-1.07.Z
                ~ftp/pub/GNU/protoize-1.07.93.Z

Mail Server:

European sites not having FTP access may retrieve protoize/unprotoize version 1.07 from the Rijks Universiteit Utrecht by sending an email message to <mail-server@cs.ruu.nl> with the following contents:

        path <your_valid_return_address>
        btoa
        send GNU/protoize-1.07.Z
        end

Leave out the line with "btoa" if you prefer uuencoding. Please use a domain-based return address, or you may lose out.

My thanks go to Edwin Kremer <edwin@cs.ruu.nl> for making Protoize/Unprotoize 1.07 available in the Netherlands.

--
// Ron Guilmette  -  C++ Entomologist
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
// Motto:  If it sticks, force it.  If it breaks, it needed replacing anyway.