[comp.sources.unix] v10i069: Pascal to C translator, Part05/12

rs@uunet.UU.NET (Rich Salz) (07/28/87)

Submitted-by: Per Bergsten <mcvax!enea!chalmers!holtec!perb>
Posting-number: Volume 10, Issue 69
Archive-name: ptoc/Part05

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 5 (of 12)."
# Contents:  ptoc.doc
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'ptoc.doc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'ptoc.doc'\"
else
echo shar: Extracting \"'ptoc.doc'\" \(35411 characters\)
sed "s/^X//" >'ptoc.doc' <<'END_OF_FILE'
X.\"			@(#)doc.ms	1.6 Date 87/06/29
X.if t .rm CM
X.ds LH Holistic Technology AB
X.ds CH PTC
X.ds RH HD870410-1 Rev: 1.6
X.ds LF
X.ds CF
X.ds RF
X.nr LL 6.5i
X.nr PO 1i
X.nr HM 0.75i
X.nr FM 1.5i
X.ie t .pl 842p
X.el   .pl 72v
X.ch BT -\n(FMu/2u
X.\" 3 tabs/inch at 12 cpi corresponds to 4 chars/tab
X.nr t 1i/3u
X.am DS
X.ta \ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu +\ntu
X..
X.LP
X.rs
X.sp |8c
X.nf
X.ce 10
X.if t .ps +6
X.B "PTC implementation note"
X.if t .ps -6
X
Xby
X
XPer Bergsten
X
XHolistic Technology AB
XGrona Gatan 59
X414 54 Gothenburg
XSweden
X.ce 0
X.sp 12
X.LP
XThis note describes the implementation of ptc, a Pascal to C translator.
XThe program was developed by Per Bergsten of Holistic Technology AB,
XGothenburg, Sweden.
XThe paper is intended to provide a guide for those who need to transport
Xptc to a new environment,
Xit describes how Pascal constructs are mapped onto C constructs.
X.bp
X.NH S 1
XBackground
X.LP
X.ds CF - % -
X.nr % 1
XThe aim was originally to provide a simple tool for transporting finished
Xapplications to systems lacking Pascal compilers.
XThe first versions,
Xconstructed in about 1984,
Xwere capable of translating simple Pascal programs.
XIt was never intended to become a released product,
Xhowever,
Xas time went by, programs and ambitions grew to the point where nearly
Xthe full Pascal language was supported.
XThus the program as it stands today has a long ancestry
Xbut it has never been redesigned (which it should have been).
X.NH 1
XPascal vs C
X.LP
XTo anyone familiar with the two languages it is obvious that
Xthey are very similar in structure.
XThe major features may be summarised as follows:
X.TS
Xc c
Xl l .
XPascal	C
X.sp
XBlock-structured	Block-structured
X- multiple declaration levels	- single declaration level
XStatically scoped	Statically scoped
XStrongly typed	Weakly typed
X	- allows unbounded pointers
XCall by value	Mostly call by value
X- and call by reference	
XHighly independent	Highly integrated
X- of host system	- with system
XSelf contained	Allows external definitions.
X.TE
X.LP
XOn the syntactic level the languages differ only in minor ways,
Xmostly in the order in which keywords and other symbols occur,
Xand of course in that the languages uses different symbols for the same
Xpurposes.
XThe only complication is that C prohibits nested subroutine declarations.
X.LP
XOn the semantic level the situation is worse.
XC has the peculiarity that array variables are treated differently from other
Xvariables,
Xthis forces us to adopt some general way to handle array variables.
XFurthermore, since Pascal offers nested subroutine declarations
Xit becomes necessary to simulate part of the activation record mechanism
Xin the translated code,
Xin one case it is extremely difficult to completely do this.
XIt is also clear that the C typedef mechanism has some shortcomings that
Xpreclude an easy translation of Pascal types.
X.bp
X.NH 1
XMapping Pascal to C
X.LP
XIn this part of the paper we briefly illustrate how to translate
XPascal code into equivalent C code.
X.NH 2
XPrograms
X.LP
XA minimal Pascal program:
X.DS
Xprogram p;
Xbegin
Xend.
X.DE
Xtranslates to the C equivalent:
X.DS
Xextern void exit(\^);
Xmain(\^)
X{
X        exit(0);
X}
X.DE
X.LP
XIt should be noted here that
Xthe translator does not actually require a complete Pascal program,
Xthe implementation is such that any consistent set of declarations can
Xbe translated.
X.NH 2
XLexical issues
X.LP
XThe C language uses ASCII as a carrier,
Xalmost all of the availible characters are used.
XThe Pascal definition uses a smaller set of characters.
XSince few features of the languages depend on the underlying character set
Xthis does not introduce much difficulties.
X.LP
XOne serious problem does occur.
XBoth language definitions states that comments have no meaning and no
Xclear place in the language syntax.
XFurthermore, the Pascal definition states that a comment is equivalent
Xto a blank character.
XThis implies that if comments are handled accurately
Xthe translator should also be able to collect and classify each
Xblank character in a Pascal program and to generate a C program with the
Xsame number of blank characters in the corresponding positions.
XThis implication conflicts with the fact that the languages have different
Xsyntax rules, it is not obvious what the "corresponding positions" would be.
X.LP
XSince comments have no defined meaning a user is free to interpret them
Xin any way and, in particular, to associate them with the surrounding code
Xin any way s/he chooses.
XAlthough humans usually are able to deduce what bearing a comment has on the
Xsurrounding program code there are no formal rules for how to do this.
XTherefore there is no
X.I "a priori"
Xcorrect way to translate comments
Xand the translator described here ignores comments altogether.
XIf/when a reasonable
X.I "ad hoc"
Xsolution is found that feature may be incorporated.
X.NH 2
XDeclarations
X.LP
XThe program may introduce local declarations which are handled as follows.
X.bp
X.NH 3
XLabels
X.LP
X.DS
Xprogram p;
X
Xlabel	9;
X
Xbegin
X9:
X        goto 9
Xend.
X.DE
Xwhich we simply translate into:
X.DS
Xextern void exit(\^);
Xmain(\^)
X{
XL9:
X        goto L9;
X        exit(0);
X}
X.DE
X'ne 12
X.LP
XIf the label is reached from an inner procedure:
X.DS
Xprogram p;
X
Xlabel	100;
X
X        procedure q;
X
X        begin
X                goto 100
X        end;
X
Xbegin
X100:
Xend.
X.DE
Xa more complicated translation must be used:
X.DS
X# define Line __LINE__
Xvoid Caseerror(\^);
X# include <setjmp.h>
Xstatic struct Jb { jmp_buf jb; } J[1];
X
X void
Xq(\^)
X{
X        longjmp(J[0].jb, 100);
X}
X
Xmain(\^)
X{
X        if (setjmp(J[0].jb))
X                goto L100;
XL100:
X        exit(0);
X}
X.DE
X.LP
XWe assume the existence of the standard
X.I setjmp(\^)
Xand
X.I longjmp(\^)
Xlibrary functions.
XJumpbuffers are statically allocated as needed depending on the number
Xof declarationlevels in the program.
X.NH 3
XConstants
X.LP
XConstant declarations are treated in two different ways.
XNumbers aliases etc are simply
X.I "# define" 'd
Xbut string constants are converted to static character arrays in order
Xto avoid unnecessary duplication of string-constants in the object code,
Xthus:
X.DS
Xconst
X	p	= 3.14;
X	pie	= '3.14';
X.DE
Xtranslates to:
X.DS
X# define pi 3.14
Xstatic char pie[\^] = "3.14";
X.DE
X.NH 3
XTypes and variables
X.LP
XTypes and variables are mostly easy to translate:
X.DS
Xprogram p;
X
Xconst length	= 15;
X
Xtype
X	struct		= 0 .. length;
X	vect		= array [ struct ] of struct;
X	color		= (red, blue, ada, yellow);
X	pointer	= ^ recd;
X	recd		= record
X				r	: pointer;
X				case b : boolean of
X				  false:	(c : color);
X				  true:	(v : vect);
X			  end;
X
Xvar	SP		: pointer;
X
Xbegin
X	new(SP, true);
Xend.
X.DE
Xbecomes
X.DS
Xtypedef char    boolean;
X# define false (boolean)0
X# define true (boolean)1
Xextern char *Bools[\^];
X# define Unionoffs(p, m) (((long)(&(p)->m))-((long)(p)))
Xextern char *malloc(\^);
X# define length 15
Xtypedef unsigned char C47_struct;
Xtypedef struct { C47_struct A[length + 1]; } vect;
Xtypedef enum { red, blue, ada, yellow } color;
Xtypedef struct S57 *pointer;
Xtypedef struct S57 {
X	pointer	r;
X	boolean	b;
X	union {
X		struct {
X			color    c;
X		} V1;
X		struct  {
X			vect	v;
X		} V2;
X	} U;
X}	recd;
Xpointer	sp;
X
Xmain(\^)
X{
X	sp = (struct S57 *)malloc((unsigned)
X		(Unionoffs(sp, U.V2.v) + sizeof(sp->U.V2)));
X	exit(0);
X}
X.DE
XThe rationale is as follows:
X.LP
XIdentifiers in the Pascal program which conflicts with reserved words in C are
Xrenamed by adding a unique prefix Cnnn where nnn is a number.
X.LP
XWe also note here that uppercase letters in identifiers and keywords
Xare translated to lowercase.
XPascal specifies that upper/lower case is insignificant whereas C
X(for the present) separates the two.
XThis fact is used to advantage by the translator as all
Xsubroutines and macros defined by the translator have an initial uppercase
Xletter which prevents confusion.
X.IP \-
XThe type
X.I boolean
Xis a predefined Pascal type,
Xwhen it is used the translator emits code which defines boolean to
Xbe the smallest convenient type:
X.I char .
XThe constants
X.I false
Xand
X.I true
Xare defined and the vector
X.I Bools
Xwill contain textstrings for output if needed.
X.IP \-
XThe predefined types
X.I integer
Xand
X.I real
Xare likewise mapped directly onto the standard C types
X.I int
Xand
X.I double
Xthrough a typedef declaration.
X.sp
XInteger subranges are mapped onto standard C arithmetic types according to
Xa short table in the translator.
XThe table is scanned from top to bottom until an enclosing range is found
Xand the corresponding type-name is emitted.
X.IP \-
XC-arrays have peculiar semantix.
XTo unify the treatment of arrays and other datatypes we always encapsulate
Xan array in a struct,
Xthus an array always becomes a
X.I struct
Xwith a single member named A.
X.IP \-
XRecords and their variants are translated into C
X.I struct
Xand
X.I union
Xdefinitions.
XSince C requires all union members to have a name we must supply artificial
Xnames for all record variants.
XA record with variants will therefore always contain one field with the name U
Xwhich have sub-fields named Vnnn where nnn is a positive number.
X.sp
X'ne 12
XWhen allocating dynamic storage for a record with variants naming
Xthe desired variant
X.DS
Xnew(sp, true)
X.DE
Xwe face the problem of determining the amount of memory needed.
X.QP
X.B
XC does not provide a safe way
Xto compute the size of a particular struct variant.
X.IP
XThe strategy adopted to cope with this problem is to attempt to compute the
Xoffset of a fieldmember in the variant matching the constant and then
Xto add the size of the variant.
XThe offset computation is expressed as a macro,
X.I Unionoffs ,
Xwhich uses rather foul typecasting to achive the result.
XThe only availible alternative would be to use the same size of all variants.
XThis method,
Xwhile being safe,
Xwastes memory when many small and few large
Xrecords of the same type are created dynamically.
X.IP \-
XPascal enumeration types are converted directly to C
X.I enum
Xtypes.
X.IP \-
XPascal pointer types are translated into C pointer types.
XPascal allows the programmer to construct recursive types as pointer
Xtypes need not be fully defined until the end of the declaration-section
Xwhere the pointer type is used.
XIn practice this is only used to introduce record types which
Xcontain pointers to themselves.
XThis problem is partially solved by introducing a name for the record type.
X.ne 10
XHence
X.DS
Xtype
X	ptr	= ^ node;
X	node	= record
X			next	: ptr;
X			info	: integer
X		  end;
X.DE
Xbecomes
X.DS
Xtypedef struct S56 * ptr;
Xtypedef struct S56 {
X	ptr		next;
X	integer		info;
X} node;
X.DE
Xwe note in passing that the problem cannot be completely solved since
X.DS
Xtype	pureptr	= ^ pureptr;
X.DE
Xwhich is valid Pascal, cannot be expressed in C.
X.ne 10v
X.IP \-
XA pascal set-type does not have any direct counterpart in C.
XThe C language does however have a adequate set of operators for
Xbit manipulation.
XWe use these to implement a Pascal set as an array of
X.I setword .
XSo:
X.DS
Xtype
X	s	= set of 1 .. 100;
X
Xvar
X	ss	: s;
X.DE
Xis translated into:
X.DS
Xtypedef unsigned short setword;
Xtypedef struct { setword S[8]; } s;
X
Xs	ss;
X.DE
XThe situation is slightly complicated by the fact that Pascal has
Xa set constructor which permits the construction of arbitrary large sets,
Xfor example:
X.DS
Xs := [ 50 .. maxint ] * [ 1 .. 80 ]
X.DE
Xfor that reason the first member in the array of words gives
Xsize of the set (in words).
XIn the example above s.S[0] would have the value 7,
Xand s.S[1] through s.S[7] would hold the bits.
XThe number 7 is computed on the assumption that the type
X.I "unsigned short"
Xon the target host is sufficiently large to holds 16 bits.
XThe set operators of Pascal are implemented as C functions returning
Xpointers to arrays of setwords,
Xthe intermediary results are held in a static area of fixed size.
X.IP \-
XPascal files are implemented using the standard i/o package availible
Xin most C implementations.
XSince Pascal has the requirement that the next element of a file is visible
Xin the filebuffer before it is read,
Xand the requirement that linemarkers in textfiles are given special treatement
Xwe are forced to extend the
X.I FILE
Xtype provided in
X.I <stdio.h> .
X.ne 20
XHence:
X.DS
Xvar	f	: text;
X.DE
Xbecomes
X.DS
Xtypedef struct {
X	FILE	*fp;
X	unsigned short
X			eoln:1,
X			eof:1,
X			init:1,
X			out:1,
X			:12;
X	char	buf;
X}	text;
Xtext	f;
X.DE
Xwhere
X.I buf
Xis our filebuffer and
X.I eoln ,
X.I eof
Xand
X.I init
Xare flags giving the state of the file.
XAll Pascal i/o operations are implemented using macros that maintain the flags
Xand the buffer variable.
XThe actual reading and writing of data is deferred to the standard i/o package.
X.NH 3
XProcedures and functions
X.LP
XPascal procedures and functions are somewhat difficult to translate to C.
XThe main problems lie in nested declarations and procedural parameters.
XNested declarations are handled in the following manner:
X.RS
X.IP \-
XIf procedure B is declared in procedure A,
Xthen procedure B is emitted before A and A is forward declared before B.
X.IP \-
XAny constants and types declared in A is moved to the global scope,
Xthis may force renaming of those objects.
X.IP \-
XAny variable declared in A
X.I
Xand used in B
X.R
Xis converted to a pointer and moved to the global scope.
XThe pointer value is saved and re-initialized upon each entry of A
Xand restored upon exit from A.
X.RE
X.br
X'ne 20
X.LP
XHence:
X.DS
Xprocedure A;
X
Xconst	limit	= 20;
X
Xtype	loctyp	= 0 .. limit;
X
Xvar	i, j	: loctyp;
X
X	procedure B(k : loctyp);
X
X	begin
X		j := k + 2
X	end;
X
Xbegin
X	B(i)
Xend;
X.DE
Xbecomes
X.DS
Xtypedef unsigned char	loctyp;
Xloctyp	*G56_j;
X
Xvoid a(\^);
X
X void
Xb(k)
X	loctyp  k;
X{
X	(*G56_j) = k + 2;
X}
X
X void
Xa(\^)
X{
X# define limit 20
X	loctyp  i, j;
X	loctyp  *F57;
X
X	F57 = G56_j;
X	G56_j = &j;
X	b(i);
X	G56_j = F57;
X}
X.DE
Xwe see that references to
X.I j
Xinside procedure
X.I b
Xare replaced by the pointer
X.I G56_j
Xwhich points to the local variable
X.I j
Xin procedure
X.I a.
XIn order to preserve the proper semantix in the face of recursion
Xthe value of the pointer need also be saved in the local variable
X.I F57
Xduring the invocation of
X.I a .
X.IP \-
XProcedure parameters offer little problems.
XWe translate Pascal value-parameters into ordinary C parameters
Xand Pascal var-parameters are treated as pointers.
X.br
X'ne 20
X.IP \-
XProcedural parameters appear at first to be easy to handle.
XThe ordinary program:
X.DS
Xprogram p;
X
Xprocedure pp(procedure q(i : integer));
X
Xbegin
X	q(2)
Xend;
X
Xprocedure qq(i : integer);
Xbegin
Xend;
X
Xbegin
X	pp(qq)
Xend.
X.DE
Xbecomes
X.DS
Xextern void	exit(\^);
Xtypedef int	integer;
X
X void
Xpp(q)
X	void	(*q)(\^);
X{
X	(*q)(2);
X}
X
X void
Xqq(i)
X	integer i;
X{
X}
X
Xmain(\^)
X{
X	pp(qq);
X	exit(0);
X}
X.DE
Xwhich looks simple enough.
X.br
XHowever,
XPascal requires that the scope of a procedure
X.I "remains unchanged"
Xthroughout its lifetime.
X.ne 35
XConsider:
X.DS
Xprogram demo(output);
X
Xvar	i	: integer;
X
X	procedure p(procedure q);
X
X	var	j	: integer;
X
X		procedure qq;
X
X		begin
X			writeln(j)
X		end;
X
X	begin
X		j := i;
X		q;
X		if i < 1 then
X		  begin
X			i := i + 1;
X			p(qq)
X		  end
X	end;
X
X	procedure dummy;
X	begin
X	end;
X
Xbegin
X	i := 0;
X	p(dummy)
Xend.
X.DE
XWhen
X.I p
Xis first invoked it assigns the local variable
X.I j
Xthe value 0.
XThis variable is accessible from
X.I qq
Xwhich is passed as a parameter in
Xthe recursive call of
X.I p .
XThe second invocation of
X.I p
Xthen sets
X.I its 
Xvariable
X.I j
Xto 1,
Xand calls
X.I q
Xwhich is bound to the first instance of
X.I qq ,
Xand should therefore print the number
X.I 0 .
X.B
XSadly,
Xthe code produced by the translator fails to do this.
X.R
XIt is obvious that the program above calls for a complete simulation
Xof the activation record mechanism of Pascal to work correctly.
X.RS
X.LP
XA workable but unpractical solution would be:
X.IP 1)
XWhen qq is used as parameter we call a function q1 which saves the environment
Xfor qq (i.e. the address of j) in a well known place
Xand returns a pointer to q2.
X.IP 2)
XWhen qq is later called (under the name q) the actual target will be q2
Xwhich sets up the proper environment calls qq.
X.RE
X.IP
XThe problem is that this requires a save-area for
X.I each
Xprocedural parameter which can hold the intresting
Xparts of its environment for
X.I each
Xof its invocations.
XIn the example above we need one are which holds the address of i
Xfor each instance of qq
X(say N instances, where N depends on the run-time behaviour of p).
XIt also requires a set of different procedures to perform the work of q2
X(N different procedures which sets up the environment for the N instances).
X.B
XThis requires much to much work to implement so the problem is left unsolved,
X.R
Xthis is hardly a problem in practice since humans rarely write such code
Xbut
X.B
Xit could introduce problems
X.R
Xin machine generated Pascal code.
X.IP
XIt should be noted that
Xthe translator accepts the keyword
X.B external
Xin place of the Pascal
X.B forward
Xdirective and assumes that the so declared procedure is defined elsewhere.
X.bp
X.NH 2
XStatements.
X.LP
XPascal statements are comparatively easy to translate to C.
XThe only parts that require special care are 
Xnon-local
X.I goto ,
X.I with
Xand
X.I for
Xstatements.
XThe code
X.DS
Xprogram p(output);
X
Xtype
X	msgtyp	= packed array [ 1 .. 12 ] of char;
X
Xvar
X	a	: array [ 1 .. 10 ] of
X			record
X				r	: real
X			end;
X	i	: integer;
X	ok	: boolean;
X
X	procedure fatal(m : msgtyp);
X
X	begin
X		writeln('Fatal error: ', m)
X	end;
X
Xbegin
X	while true do
X	  repeat
X		ok := false;
X		i := 1;
X		for i := i + 1 to i + 10 do
X			if i > 10 then
X				fatal(' 10 exceeded')
X			else
X			  with a[i] do
X				if r > 9.99e+38 then
X					ok := true
X				else
X					writeln(r)
X	  until ok
Xend.
X.DE
X'ne 20
Xbecomes
X.DS
Xtypedef char boolean;
X# define false (boolean)0
X# define true (boolean)1
Xtypedef int integer;
Xtypedef double real;
X
Xtypedef struct { char A[12 - 1 + 1]; } msgtyp;
Xtypedef struct { struct	S57 {
X	real	r;
X}	A[10 - 1 + 1]; }  T56;
XT56		a;
Xinteger	i;
Xboolean	done;
X
X void
Xfatal(m)
X	msgtyp	m;
X{
X	(void)fprintf(output.fp, "Fatal error: %.12s", m.A),
X					Putchr('\en', output);
X}
X
Xmain(\^)
X{
X	while (true)
X	  do {
X		done = false;
X		i = 1;
X		{
X		  integer		B1 = i + 1, B2 = i + 10;
X
X		  if (B1 <= B2)
X			for (i = B1; ; i++) {
X				if (i > 10)
X					fatal(*((msgtyp *)" 10 exceeded"));
X				else {
X					register struct	S57 *W3 = &a.A[i - 1];
X
X					if (W3->r > 9.99e+38)
X						done = true;
X					else
X						(void)fprintf(output.fp, "% 20e", W3->r),
X							Putchr('\en', output);
X				}
X				if (i == B2) break;
X			}
X		}
X	  } while (!(done));
X	exit(0);
X}
X.DE
Xfor the following reasons:
X.NH 3
XBegin
X.LP
XThe compound statements
Xare very similar in the two languages and need no further explanation.
X.NH 3
XIf
X.LP
XPascal if-statements
Xhave the same structure and semantix as C if-statments.
X.NH 3
XCase
X.LP
XPascal case-statements
Xhave the same structure and semantix as C switch-statements provided that a
X.I break
Xis always added to each entry.
X.LP
XThe translator supports a common Pascal extension
Xin that it recognizes the keyword
X.B otherwise
Xto signify a default entry in a case-statement.
X.NH 3
XLabels
X.LP
XPascal labeled statements and labels have the same structure and semantix as
XC labeled statements except that Pascal labels are numbers where C labels
Xare identifiers,
Xthis difference is solved by simply prefixing the labels with the letter
X.I L .
X.NH 3
XGoto
X.LP
XPascal goto-statements
Xhave the same structure as C goto statements but the semantix differ in
Xthat Pascal allows a goto to lead out of the current procedure.
XThis is implemented using the
X.I setjmp/longjmp
Xlibrary functions of C as described earlier.
X.NH 3
XWith
X.LP
XThe with-statement
Xof Pascal has no counterpart in C.
XIt is translated into nested compund statements where pointervariables,
Xreferencing the corresponding records,
Xare declared and initialized.
XWithin the scope of the with-statement,
Xthe accessible record fields are renamed to include the pointer.
XThe effect of this is that the record address is evaluated once as the
XPascal standard requires.
X.NH 3
XFor
X.LP
XThe for-statement
Xof Pascal has a structure that is similar to the C for-statement
Xbut the semantix differ completely.
XPascal requires that a loop be exited when the upper bound
Xhas been reached,
XPascal also requires that the loop-boundaries be evaluated exactly once.
XThe standard C for-loop is exited when the loop-condition becomes false.
XThis implies that it is not always true that
X.DS
Xfor (i = 0; i <= e; i++) ;
X.DE
Xbehaves in the same manner as
X.DS
Xfor i := 0 to e do ;
X.DE
Xsince (in most implementations)
Xthe C version becomes an infinite loop if
X.I e
Xequals
X.I maxint
Xor if
X.I e
Xis the expression
X.I i .
XFor that reason Pascal for-statments are translated into compound statements
Xwhere the upper/lower bounds of the for-loop are held in local variables.
XIt is also necessary to add a conditional break-statement at
Xthe end of the loop.
XIt is possible to obtain the more relaxed translation by configuring the
Xtranslator appropriately (see "Tuning" below).
X.NH 3
XWhile
X.LP
XThe while-statement behaves exactly the same in both languages.
X.NH 3
XRepeat
X.LP
XThe repeat-statement of Pascal matches the do-while statement of C except
Xfor the trivial difference that Pascal permits a statement-list
Xwhere C permits a single statment in the loop.
X.NH 3
XEmpty
X.LP
XThe empty statement has (of course) the same structure and semantix
Xin both languages.
X.NH 2
XExpressions and assignments
X.LP
XIn most cases Pascal expressions can be literally translated into equivalent
XC expressions.
X.IP identifiers 15
XExcept where identifiers clash with reserved words or with other
Xidentifiers declared in the same scope,
Xthey may simply be printed.
XIn some cases the translator is forced to rename identifiers or to
Xinvent new identifiers.
X.IP operators
XThe operators +, -, *
X.I div
Xand
X.I mod
Xwhen applied to real or integer operands
Xhave exact counterparts in C and are therefore easy to handle.
XThe operator for real division, /,
Xcorresponds to the same C operator except that the operands may be integers.
XIn that case a cast is necessary.
XWhen the operands are sets the expression is converted into a function call.
X.sp
XThe operators <, <=, >, >=, = and <> all have exact counterparts in C
Xfor integer and real operands.
XMost C implementations disallows
X.I enum
Xoperands,
Xthe translator therefore casts such operands to
X.I int .
XComparisons on structures (i.e. strings or sets)
Xare converted into function calls.
X.IP assignments
XAssignments are straightforward to handle since arrays are encapsulated
Xin structures.
XTherefore:
X.DS
Xa := b
X.DE
Xbecomes
X.DS
Xa = b
X.DE
X.I unless
Xb is a string or a set,
Xin which case the assignment is converted into a function call.
X.IP indexing
XGiven the translation for array declarations (described above) we are forced
Xto translate
X.DS
Xa[i]
X.DE
Xinto
X.DS
Xa.A[i - c]
X.DE
Xwhere
X.I c
Xis the lower bound for the index type.
X.IP selection
XGiven the translation for records with variants (described above) the
Xtranslation of
X.DS
Xa.b
X.DE
Xbecomes
X.DS
Xa.b
X.DE
X.I or ,
Xif b is declared in a variant,
X.DS
Xa.Vxxx.b
X.DE
Xwhere Vxxx is a name manufactured by the translator for the particular variant.
X.IP dereferencing
XPointer references and
X.B var -parameters
Xare handled by prefixing the expression with an asterisk,
Xbut the special case dereferencing followed by selection is also recognized,
Xso:
X.DS
Xp^ := q^.f
X.DE
Xbecomes
X.DS
X*p = q->f
X.DE
X.IP miscellanea
XThe boolean operators
X.B and ,
X.B or
Xand
X.B not
Xare simply translated into their C counterparts.
XThe set contructors
X.B "[\^]" ,
Xand
X.B ".."
Xand the operator
X.B in
Xare converted to function calls.
X.bp
X.NH 1
XImplementation
X.LP
XThe general strategy is to convert the Pascal source program
Xinto a parsetree.
XThe tree is then traversed by a set of procedures that perform some
Xnecessary transformations.
XThe tree is finally traversed by a set of procedures that print the
Xcorresponding C constructs.
XThe translator consists of three major procedures that perform
Xthese actions.
XThey are augmented by a set of procedures that maintain a symbol table
Xthat holds information about identifiers found in the source,
Xand by a procedure that initializes all internal datastructures.
X.LP
XThere are three major datastructures that interact in complicated ways:
X.IP 1)
Xa store for identifiers and strings
X.IP 2)
Xa multilevel symbol table
X.IP 3)
Xa parse-tree.
X.LP
XThe identifier and string store,
X.B strstor
Xis in principle an array of characters that grow
Xin increments of one string block.
XExactly one copy of each identifier is stored in that array.
XWhenever an identifier is found in the input stream the array is
Xscanned to see if that identifier has been seen before.
XIn order to speed up the search all identifiers are represented by nodes
Xwhich are chained together such that all nodes in a particular chain
Xhave the same hashvalue as determined by the function
X.B hash .
XEach
X.B idnode
Xholds an index to strstor where the corresponding identifier text is stored.
XThe start of the hashchains are found in the global variable
X.B idtab .
X.de Ds
X.nr x \\w'\\$2'u
X.ie t \{
X.nr x \\nx/2
X.ds \\$1 \\\\h'-\\nxu'\\$2\\\\h'-\\nxu'
X.\}
X.el .ds \\$1 \\$2\\\\h'-\\nxu'
X..
X.ds l \-
X.ie t .ds a \z\*l\a
X.el   .ds a \a\z\*l
X.nr x \ntu/2
X.ds g \z\*l\\l'\nxu\&\*l'
X.ds h \\h'\nxu'
X.Ds c +
X.Ds d \(da
X.Ds u \(ua
X.Ds r \(br
X.ds s \\*r\z\*l
X.ds > \*l\z\*l\(->
X.ds < \(<-\\h'-1n'\*l
X.ds k \\kx
X.ds b \\h'|\\nxu'
X.ds t \t
X.DS L
X.lc \*l
X.fc #
Xidtab
X\*c\*a\*c
X\*r\*t\*r\*tchain of idnodes with same hashvalue
X\*c\*a\*c\*t\*c\*a\*a\*c\*t\*c\*a\*a\*c
X\*k\*r\*t\*r\*b\*h\*a#\*> #\*r\*t\*k\*t\*r\*b\*h\*a#\*> #\*r\*t\*t\*r idnode representing
X\*c\*a\*c\*t\*r\*t\*t\*r\*t\*k index=2\*b\*r\*t\*t\*r identifier "demo"
X\*r\*t\*r\*t\*c\*a\*a\*c\*t\*c\*a\*a\*c
X
X\*tstrstor
X\*c\*a\*a\*a\ \*l \*a\*a\*c
X\*r\*t\*r\*t\*r\*t\*t\*r\*t\*r
X\*k\*c\*a\*a\*a\ \*l \*a\*a\*c\*b\*h\*r
X\*h\*r
X\*h\*d
X\*c\*a\*a\*a\*a\*a\*a\*a\*a
X\*r\*t\*r\*t\*r# d #\*r# e #\*r# m #\*r# o #\*r# / #\*r
X\*c\*a\*a\*a\*a\*a\*a\*a\*a\*tfirst idblock
X\*t\*t# \*u #
X\*t\*tindex=2
X.DE
X.LP
XSo:
Xthe global representation of the identifier "demo" is a particlular
Xidnode;
Xwhenever the lexical analysis routine
Xrecognizes the identifier "demo" it returns a pointer to that idnode.
X.LP
XIn order to represent different identifiers with the same name we need to
Xbe able to distinguish between identifiers declared in different scopes.
XThis is accomplished by the
X.B symnode
Xstructures.
XWhen an identifier is first encountered in a given scope it is "declared",
Xmeaning that a new symnode is created that references the identifier.
XOccurrences of the same identifier in that scope are then represented in
Xthe parse-tree by treenodes referencing the same symnode.
X.bp
XThe program:
X.DS
Xprogram p;
X
Xvar	demo	: integer;
X
Xbegin
X	demo := 3
Xend.
X.DE
Xyilds the following structure:
X.DS L
X.lc \*l
X.fc #
X\*t# top #
X\*t# \*d #
X\*t\*c\*a\*a\*a\*a\*c treenode representing
X\*t\*k npgm\*b\*r\*t\*t\*t\*t\*r the program
X\*t\*c\*a\*s\*a\*s\*a\*s\*a\*c
X\*t\*t\*r\*t\*r\*h\*u\*t\*r\*h\*u
X\*t\*t\*r\*t\*r\*h\*r\*t\*r\*h\*c\*a\*a\*a\*a\*a\*a\*a\*g\*c
X\*t\*t\*r\*t\*r\*h\*r\*t\*c\*a\*a\*a\*a\*a\*a\*a\*c\*h\*r
X\*t\*t\*r\*t\*d\*h\*r\*t\*t\*t\*t\*t\*t\*t\*t\*r\*h\*r
X\*t\*t\*r\*h\*c\*a\*g\*s\*a\*a\*c\*k treenode representing\*b\*t\*t\*t\*t\*t\*t\*r\*h\*r
X\*t\*t\*r\*h\*k nvar\*b\*r\*t\*t\*t\*\*kr the var-declaration\*b\*t\*t\*t\*t\*t\*t\*d\*h\*r
X\*t\*t\*r\*h\*c\*g\*s\*a\*s\*a\*c\*t\*t\*t\*c\*a\*a\*a\*g\*s\*a\*c treenode repr.
X\*t\*t\*r\*t\*r\*h\*u\*t\*r\*t\*t\*t\*t\*k nassign\*b\*r\*t\*t\*t\*t\*r assignment
X\*t\*t\*r\*t\*r\*h\*r\*t\*c\*a\*k\*> to type\*b\*t\*t\*t\*c\*a\*s\*a\*a\*s\*a\*c
X\*ksymtab\*b\*t\*t\*r\*t\*d\*h\*r\*t\*t\*t\*t\*t\*t\*d\*h\*u\*t\*t\*d\*h\*u
X\*c\*a\*c\*t\*r\*h\*c\*a\*g\*s\*a\*c\*k treenode repr.\*b\*t\*t\*t\*t\*c\*a\*g\*s\*a\*c\*h\*c\*a\*g\*s\*a\*a\*c
X\*r\*t\*r \*<\*a\*c\*h\*k nid\*b\*r\*t\*t\*r\*k occurrence of\*b\*t\*t\*t\*t\*k nid\*b\*r\*t\*t\*r\*h\*k ninteger\*b\*r\*t\*t\*t\*r
X\*t\*t\*h\*c\*a\*s\*a\*c\*k id. "demo"\*b\*t\*t\*t\*t\*c\*a\*s\*a\*c\*h\*c\*a\*a\*s\*a\*c
X\*r\*t\*r\*t\*t\*r\*h\*u\*t\*t\*t\*t\*t\*t\*r\*t\*t\*t\*r\*h\*u
X\*c\*a\*c\*t\*t\*r\*h\*r\*t\*c\*a\*a\*a\*a\*a\*c\*t\*t\*t\*r\*h\*r
X\*r\*t\*r\*t\*t\*d\*h\*r\*t\*d\*t\*t\*t\*t\*t\*t\*t\*t\*d\*h\*r
X\*c\*a\*c\*t\*c\*a\*g\*s\*a\*a\*c\*k symnode representing\*b\*t\*t\*t\*t\*t\*t\*c\*a\*g\*s\*a\*g\*c
X\*r\*k\*t\*r\*b\*h\*a\*>\*t\*k lidentifier\*b\*r\*t\*t\*t\*r\*k identifier "demo"\*b\*t\*t\*t\*t\*t\*t\*k linteger\*b\*r\*t\*t\*h\*r
X\*c\*a\*c\*t\*c\*a\*s\*a\*a\*c\*k in the current scope\*b\*t\*t\*t\*t\*t\*t\*k linum=3\*b\*r\*t\*t\*h\*r
X\*t\*t\*t\*r\*t\*t\*t\*t\*t\*t\*t\*t\*c\*a\*a\*g\*c
X\*kidtab\*b\*t\*t\*t\*c\*a\*a\*a\*c
X\*c\*a\*c\*t\*t\*t\*t\*t\*r
X\*r\*t\*r\*t\*t\*t\*t\*t\*d
X\*c\*a\*c\*t\*c\*a\*a\*c\*t\*c\*a\*a\*c
X\*k\*r\*t\*r\*b\*h\*a#\*> #\*r\*t\*k\*t\*r\*b\*h\*a#\*> #\*r\*t\*t\*r idnode representing
X\*c\*a\*c\*t\*r\*t\*t\*r\*t\*k index=2\*b\*r\*t\*t\*r identifier "demo"
X\*r\*t\*r\*t\*c\*a\*a\*c\*t\*c\*a\*a\*c
X
X\*tstrstor
X\*c\*a\*a\*a\ \*l \*a\*a\*c
X\*r\*t\*r\*t\*r\*t\*t\*r\*t\*r
X\*c\*g\*s\*a\*a\*a\ \*l \*a\*a\*c
X\*h\*r
X\*h\*d
X\*c\*a\*a\*a\*a\*a\*a\*a\*a
X\*r\*t\*r\*t\*r# d #\*r# e #\*r# m #\*r# o #\*r# / #\*r
X\*c\*a\*a\*a\*a\*a\*a\*a\*a\*tfirst idblock
X\*t\*t# \*u #
X\*t\*tindex=2
X.fc
X.lc
X.DE
XWe see that the two occurrences of the identifier "demo" are represented by
Xtwo
X.B treenodes
Xof variant
X.B nid
Xthat both have pointers to the same
X.B symnode
Xrepresenting the declaration of "demo".
XAll symnodes at a given declarationlevel are chained together (in the
Xsame manner as the idnodes) and the chains are attached to the global
Xvariable
X.B symtab .
XIn order to find out if an identifer is declared in the current scope we
Xscan the chain of symnodes starting from symtab, and check if any of them
Xreferences the idnode we are looking for.
XA symnode also have a pointer to the treenode which defines the symbol.
XThe
X.B symtabs
Xthemselves are also chained,
Xthe chain implements a stack of declarationlevels.
XThe symtab which is created when the scope of a procedure is entered
Xis also attached to that procedure.
XWhen a procedure is entered we create a new symtab, push it onto the stack,
Xparse the procedure and pop the stack.
XThe symbols then visible are those that still can be reached from the stack.
X.LP
XThe parse-tree consists of
X.B treenodes
Xrepresenting each declaration, statement, expression etc.
XEach node except the nprogram node
Xhas a pointer to its immediate father (giving the defining point),
Xto its children (if it has any) and to its right brother (if it is
Xa member of a list).
XThe top of the tree is found in the global variable
X.B top .
X.LP
XIn order to find the defining point for the identifier in the assignment,
Xwe follow pointers from the nassign-treenode
Xto the nid-treenode, to the lidentifier-symnode,
Xand then up to the nid-treenode in the declaration.
XIf we want to know the type of the identifier
Xwe proceed up to the nvar-treenode
Xand then down to the node giving the type in the declaration
X(in our example that node would also be a nid-treenode referencing a
Xlinteger-symnode that represents the identifier "integer").
XThere is a function
X.B typeof
Xthat performs exactly this operation.
XIt will return a pointer to a npredef-, nsubrange-, nscalar-, nptr-
Xnarray-, nrecord-, nfileof- or nsetof-treenode.
XIn those cases where the translator pays attention to types
Xit uses a pointer (obtained from typeof) as representation of a type.
X.LP
XGiven the parse-tree and the layered symbol table
Xit is not hard to see how the translations described earlier are performed.
XThe one place where the code becomes more than usually complex is when a
X.B write
Xstatement with format specifications is to be translated into a call to
X.B fprintf .
X.bp
X.NH 1
XTuning
X.LP
XThe behaviour of the translator may be modified for some cases simply
Xby changing constants.
XThese constants are all located at the top of the program text.
X.IP output 12
XThe translator will copy the source during input if the constant
X.B echo
Xis true.
XThe copy is preceeded by the line
X.DS
X# ifdef PASCAL
X.DE
Xand ended by the line
X.DS
X# else
X.DE
Xand then follows the translated code
Xand finally
X.DS
X# endif
X.DE
XThis may be used to hold both Pascal and C source in the same file.
XIf the Pascal part is changed the C part may be updated through:
X.DS
X	cpp -P -DPASCAL < orig > tmp
X	ptc < tmp > new
X	move new orig
X.DE
Xassuming that
X.B echo
Xis true and that 
X.B cpp
Xis the standard C preprocessor.
X.DS
XDefault value:
X
X	echo		= false;
X.DE
X.IP comments
XThe translator recognizes both (* and { as comment delimiters.
XThey are treated as different,
Xallowing 1 level nested comments,
Xif the constant
X.B diffcom
Xis true.
X.DS
XDefault value:
X
X	diffcomm	= false;
X.DE
X.IP symbols
XThe translator accepts default entries in case-statements provided
Xthat the keyword defined through
X.B othersym
Xis used in place of the constant list.
X.DS
XDefault value:
X
X	othersym	= 'otherwise ';
X.DE
Xsubstitute for
X.DS
X	othersym	= 'otherwise%';
X.DE
Xif that feature is undesired.
X.IP
XThe translator accepts externally declared procedures and functions
Xprovided that the directive defined through
X.B externsym
Xis used in place of the keyword
X.B forward .
X.DS
XDefault value:
X
X	externsym	= 'external  ';
X.DE
X.IP sets
XSets are implemented as arrays of
X.B wordtype .
XThe type is assumed to hold
X.B "setbits + 1"
Xbits numbered from 0.
X.DS
XDefault value:
X
X	wordtype	= 'unsigned short';
X	setbits	= 15;
X.DE
X.ne 10
X.IP files
XThe implementation of files uses a flag-word which has the type given as
X.B filebits
Xwhich is assumed to hold
X.B "filefill + 4"
Xbits.
X.DS
XDefault value:
X
X	filebits	= 'unsigned short';
X	filefill	= 12;
X.DE
X.ne 20
X.IP stmts
XIf the Pascal source is known to be "normal" in the sense that for-loops
Xalways have an upper bound that is less than the maximum value of the
Xtype of the loop-variable, and in the sense that the upper bound doesn't
Xchange by computations inside the loop, then the translator may generate
Xa more natural code.
XI.e:
X.DS
Xfor i := 1 to 10 do ;
X.DE
Xbecomes
X.DS
Xfor (i = 1; i <= 10; i++) ;
X.DE
XSince the requirements cannot be determined by the translator itself
Xthis kind of code is generated when the constant
X.B lazyfor
Xis true.
X.DS
XDefault value:
X
X	lazyfor	= false;
X.DE
X.ne 10
X.IP new
XThe second and following parameters to the procedure
X.B new
Xwill be ignored if the constant
X.B unionnew
Xis false.
X.DS
XDefault value:
X
X	unionnew	= true;
X.DE
X.ne 10
X.IP strings
XAll identifiers and strings are stored in the translator with the special
Xcharacter having the ordinal value
X.B null
Xas endmarker.
XHence, that character can not occur in strings in the Pascal source.
X.DS
XDefault value:
X
X	null		= 0;
X.DE
X.ne 20
X.IP types
XThe names of predefined types are given by the constants:
X.B inttyp ,
X.B chartyp ,
X.B floattyp ,
Xand
X.B doubletyp .
X.DS
XDefault value:
X
X	inttyp		= 'int';
X	chartyp	= 'char';
X	floattyp	= 'float';
X	doubletyp	= 'double';
X.DE
XThe typename for real variables and functions defined by the user
Xis given by the constant
X.B realtyp .
X.DS
XDefault value:
X
X	realtyp	= doubletyp;
X.DE
XThe typename for procedures is given by the constant
X.B voidtyp .
X.DS
XDefault value:
X
X	voidtyp	= 'void';
X.DE
X.ne 10
X.IP i/o
XThe default fieldwidths for integer and real values written on textfiles
Xare given by the constants
X.B intlen
Xand
X.B fixlen .
X.DS
XDefault value:
X
X	intlen	= 10;
X	fixlen	= 20;
X.DE
X.IP types
XA table in the translator gives the mapping from Pascal
Xinteger subrange types to C arithmetic types.
XThe table is initialized by code located at the end of procedure
X.I initialize
Xgiving the following default configuration:
X.TS
Xl l l
Xr r l .
XLow bound	High bound	Selected type
X.sp
X0	255	unsigned char
X-128	127	char
X0	65535	unsigned short
X-32768	32767	short
X-2147483647	2147483647	long
X.TE
END_OF_FILE
if test 35411 -ne `wc -c <'ptoc.doc'`; then
    echo shar: \"'ptoc.doc'\" unpacked with wrong size!
fi
# end of 'ptoc.doc'
fi
echo shar: End of archive 5 \(of 12\).
cp /dev/null ark5isdone
MISSING=""
for I in 1 2 3 4 5 6 7 8 9 10 11 12 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 12 archives.
    rm -f ark[1-9]isdone ark[1-9][0-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0
-- 

Rich $alz			"Anger is an energy"
Cronus Project, BBN Labs	rsalz@bbn.com
Moderator, comp.sources.unix	sources@uunet.uu.net