cornwell@NRL-CSS.ARPA (Mark Cornwell) (03/23/88)
I'm trying to build a Modula-2 interface to a C library and have hit a snag. Can anyone construct a suitable translation of:

    struct obj_name {
        int obj_type;
        union {
            int id;
            char *path;
        } obj_id;
    };

All my attempts are quite ugly and require introducing more extra identifiers and syntax than I would like.
-- Mark Cornwell
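A small C sketch of what the declaration above means in memory may help when mapping it to Modula-2. The two union members share the same storage, and the union is only as large as its bigger member. The helper names below are hypothetical, added only to make the layout checkable.

```c
#include <stddef.h>

/* The structure from the question, unchanged. */
struct obj_name {
    int obj_type;
    union {
        int id;
        char *path;
    } obj_id;
};

/* Both union members start at the same offset: storing one overwrites the
   other, so only the member most recently stored is meaningful. */
static int members_overlap(void)
{
    return offsetof(struct obj_name, obj_id.id) ==
           offsetof(struct obj_name, obj_id.path);
}

/* The union is as large as its largest member (plus any padding),
   not the sum of both members. sizeof does not evaluate o. */
static int union_is_not_a_sum(void)
{
    struct obj_name o;
    return sizeof o.obj_id < sizeof o.obj_id.id + sizeof o.obj_id.path;
}
```

In Modula-2 terms, this overlapping storage is exactly what a variant record provides.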
nckurtz@ndsuvax.BITNET (Richard Kurtz) (03/25/88)
Mark, I believe the closest you could come to simulating the C union type would be a variant record. I know this involves an extra identifier, but I don't know any way around it. I am currently trying to write a program that translates C procedures to Modula-2 procedures, and that is how I chose to translate the C union type. I am interested in your project, as it might relate somewhat to what I am doing. Could you send me more information on what you are doing, or send your code? Thanks.
-Richard Kurtz
--
INTERNET: NCKURTZ%NDSUVAX.BITNET@wiscvm.wisc.edu
BITNET: NCKURTZ@NDSUVAX
UUCP: ...psuvax1!NDSUVAX.BITNET!NCKURTZ or ...ihnp4!umn-cs!ndsuvax!nckurtz
cornwell@NRL-CSS.ARPA (Mark Cornwell) (03/26/88)
Richard,

Thanks for your interest. I'm afraid our project won't be of as much interest to you as it may have appeared. We will be writing a secure message system on top of a variant of the UNIX operating system. The system will be written mostly in Modula-2, and it is important that the application call the UNIX interface directly. Our Modula-2 compiler did not come with an interface to UNIX, just a set of libraries for I/O, files, etc.

I spent the last few days writing a module of `glue' routines that handle the differences between the calling conventions of the Modula-2 compiler and the C libraries, so that I can write stubs in Modula-2 that call the C functions. These stubs have to push parameters on the run-time stack, stash return values in registers, and so on. In addition, the interface needs a set of type declarations that correspond to the types found in the header files of the UNIX system interface. It was writing these declarations that prompted my question.

In the end I found a pretty solution to the grubby coding part. I wrote some m4 macros that take a one-line description of a function and its parameters and generate the linkage code, computing the proper calling sequence from the sizes of the parameters, the type of the function, and so on. It's like a miniature version of the code generator you might find in a compiler. I'm pretty proud of it: it shrinks to two pages of code what would have taken me 40 or 50 pages if I'd taken the brute-force approach.
E.g., it lets me write:

    cgen(chown,INTEGER,path,StringPtr,owner,INTEGER,group,INTEGER)

and the cgen macro will expand it to:

    PROCEDURE chown ( path: StringPtr; owner: INTEGER; group: INTEGER ) : INTEGER;
    VAR adr: ADDRESS;
    BEGIN
      (* Push one-word parameter: group *)
      SETREG(AX,group);       (* AX <- group *)
      CODE(50H);              (* push ax *)
      (* Push one-word parameter: owner *)
      SETREG(AX,owner);       (* AX <- owner *)
      CODE(50H);              (* push ax *)
      (* Push two-word parameter: path (segment first, then offset) *)
      adr := path;
      SETREG(AX,adr.SEGMENT);
      CODE(50H);              (* push ax *)
      SETREG(AX,adr.OFFSET);
      CODE(50H);              (* push ax *)
      (* Call C library routine: chown *)
      SetCLibDS;              (* set DS to correct value *)
      EXTCALL("_chown");      (* call "_chown" *)
      (* Pop the 4 words pushed off the stack *)
      CODE(83H,0C4H,08H);     (* add sp,8 *)
      (* Move the one-word function result from AX to BX *)
      CODE(89H,0C3H);         (* mov bx,ax: BX <- AX *)
    END chown;

The macro has to be clever enough to know the sizes of the arguments, whether the function returns a value, and the size of the return value; the calling conventions vary with respect to all of these. My implementation module contains a hundred or so lines that are just calls to cgen.

Unfortunately, all of this is incidental to the project. It was an enjoyable distraction for the last few days and is pretty much finished now. If you are still interested, I can send you the code.
--Mark
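Mark didn't post the m4 source, so the following is only a hypothetical C sketch of the bookkeeping such a generator has to do. It assumes the conventions visible in the chown expansion: parameters are pushed last-first, an INTEGER takes one 16-bit word, a StringPtr takes two, and the caller pops everything afterwards. The names `param` and `plan_call` are invented for this illustration, and return-value handling is left out.

```c
#include <stddef.h>
#include <string.h>

/* One parameter of the C routine being wrapped: its name and its size in
   16-bit stack words (1 for INTEGER, 2 for a far pointer such as StringPtr). */
struct param {
    const char *name;
    int words;
};

/* Writes the parameter names into out in the order they must be pushed
   (last parameter first, as the C calling convention expects), separated
   by spaces, and returns the number of bytes the caller must pop afterwards
   (the N in "add sp,N"). out must have room for at least one byte. */
static int plan_call(const struct param *params, int count,
                     char *out, size_t outlen)
{
    int bytes = 0;
    out[0] = '\0';
    for (int i = count - 1; i >= 0; i--) {
        if (out[0] != '\0')
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, params[i].name, outlen - strlen(out) - 1);
        bytes += 2 * params[i].words;   /* each stack word is two bytes */
    }
    return bytes;
}
```

For chown, described as path (2 words), owner (1 word) and group (1 word), this yields the push order group, owner, path and a cleanup of 8 bytes, matching the `add sp,8` in the expansion.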
R_Tim_Coslet@cup.portal.com (03/26/88)
>I'm trying to build a Modula-2 interface to a C-library an have hit
>a snag. Can anyone construct a suitable translation of:
>
>     struct obj_name {
>        int obj_type;
>        union {
>           int id;
>           char *path;
>        } obj_id;
>     };

The direct equivalent of the above C in Modula-2 is....

    ObjName : RECORD
        ObjType : INTEGER;
        CASE : BOOLEAN OF
          TRUE  : id : INTEGER |
          FALSE : path : POINTER TO CHAR |
        END
    END

This should create the same data structure (and actually has one less identifier: obj_id). The key is that the tag identifier is optional in the variant CASE part of a record (a lot of people never notice this!). I have not yet done any Modula-2 programming, but I have been using Pascal for over eight years (the tag identifier is optional in Pascal too). I verified this against Niklaus Wirth's book:

    Programming in Modula-2, Third, Corrected Edition

Check the syntax diagrams in Appendix 4 (starting on page 189).
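C later gained its own version of this shortcut. As a sketch that needs a C11 compiler (so it was not an option in 1988), an unnamed union member lets code write o.id directly, with no obj_id in between, much like the tagless variant above. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* C11 anonymous union: its members are treated as members of the
   enclosing struct, so no intervening field name is needed. */
struct obj_name_anon {
    int obj_type;
    union {
        int id;
        char *path;
    };
};

static int read_id(const struct obj_name_anon *o)
{
    return o->id;   /* no obj_id. prefix */
}
```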
schaub@sugar.UUCP (Markus Schaub) (03/28/88)
> struct obj_name {      | objName=RECORD
>    int obj_type;       |   objType: INTEGER;
>    union {             |   objId: RECORD
>       int id;          |     CASE (* objType *):INTEGER OF
>       char *path;      |     | 0: id: INTEGER;
>    } obj_id;           |     | 1: path: POINTER TO CHAR
> };                     |     END
>                        |   END
>                        | END
>                        |
> -- Mark Cornwell       | -- Markus Schaub

If obj_id is not used, you can simplify the RECORD to a single case-record.
--
    // Markus Schaub              | The Modula-2 People:
   // M2Amiga Developer           | Interface Technologies Corp.
\\ // uunet!nuchat!sugar!schaub   | 3336 Richmond Ave. Suite 323
 \X/ (713) 523-8422               | Houston, TX 77098
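Here objType is the de facto tag even though it is not declared as one. A hedged C sketch of the same discipline uses accessors that check obj_type before reading the union. The tag constants OBJ_ID and OBJ_PATH are assumptions chosen to match the 0 and 1 case labels; nothing in the original C header defines them.

```c
#include <stddef.h>

/* Hypothetical tag values matching the 0 and 1 case labels above. */
enum { OBJ_ID = 0, OBJ_PATH = 1 };

struct obj_name {
    int obj_type;            /* plays the role of the commented-out tag */
    union {
        int id;
        char *path;
    } obj_id;
};

/* Reads id only when obj_type says id is the live member.
   Returns 1 and stores the value on success, 0 on a tag mismatch. */
static int get_id(const struct obj_name *o, int *out)
{
    if (o->obj_type != OBJ_ID)
        return 0;            /* wrong variant: refuse to read */
    *out = o->obj_id.id;
    return 1;
}

/* Returns path when it is the live member, NULL otherwise. */
static char *get_path(const struct obj_name *o)
{
    return o->obj_type == OBJ_PATH ? o->obj_id.path : NULL;
}
```

This is the kind of check a Modula-2 compiler may generate when the tag field is actually named; with the tag only in a comment, the programmer has to enforce it by hand.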
paul@vixie.UUCP (Paul Vixie Esq) (03/28/88)
In article <4118@cup.portal.com> R_Tim_Coslet@cup.portal.com writes:
##Can anyone construct a suitable translation of:
##
## struct obj_name {
## int obj_type;
## union {
## int id;
## char *path;
## } obj_id;
## };
#
#The direct equivalent of the above C in Modula-2 is....
#
# ObjName : RECORD
# ObjType : INTEGER;
# CASE : BOOLEAN OF
# TRUE : id : INTEGER |
# FALSE : path : POINTER TO CHAR |
# END
# END
#
#
#This should creat the same data structure (and actually has one less
#identifier: obj_id).
Sometimes you *want* that intervening obj_id. In C, it's harder (though
possible) to make a variant record whose variant fields can be referenced
without naming the intervening member; in M2 the name is optional, and
when you do want it, you can get it thus:
TYPE ObjName = RECORD (* note 1 *)
ObjType: INTEGER;
ObjId: RECORD
CASE BOOLEAN OF (* note 2 *)
TRUE: id: INTEGER|
FALSE: path: POINTER TO CHAR; (* note 3 *)
END
END
END;
Note 1: we are creating a type in the C example, not a variable.
Note 2: No ':' before the type as far as I know; [brackets] may be needed
(I don't recall), and the type could be enumerated if more than
two variants are needed -- BOOLEAN is convenient but not mandatory.
Note 3: POINTER TO CHAR is one way to represent strings, but sometimes arrays
are used. Sure would be great if open arrays were allowed in places
other than a formal argument on a procedure...
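The "harder (though possible)" C route mentioned above is usually a preprocessor trick: keep the union named and #define the short names onto the full member paths, a style some UNIX system headers have used. A minimal sketch, assuming the obj_name structure from the original question; read_id is a hypothetical helper. The cost is that id and path become macros for the rest of the file, so, for example, o.obj_id.id no longer compiles, because the trailing id expands too.

```c
struct obj_name {
    int obj_type;
    union {
        int id;
        char *path;
    } obj_id;
};

/* After these lines, o.id means o.obj_id.id and o.path means
   o.obj_id.path. Any other use of the bare names id or path in
   this file is rewritten as well. */
#define id   obj_id.id
#define path obj_id.path

static int read_id(const struct obj_name *o)
{
    return o->id;   /* expands to o->obj_id.id */
}
```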
--
Paul A Vixie Esq
paul%vixie@uunet.uu.net
{uunet,ptsfa,hoptoad}!vixie!paul
San Francisco, (415) 647-7023
alan@pdn.UUCP (Alan Lovejoy) (03/28/88)
In article <8803242203.AA28027@ndsuvax.UUCP> Info-Modula2 Distribution List <INFO-M2%UCF1VM.bitnet@jade.berkeley.edu> writes:
>Mark, I believe the closest you could come to simulating the C union
>statement would be using a Variant record. I know this involves an
>extra identifier, but I don't know any way around this. I am currently
>trying to write a program that translates C procedures to Modula2
>procedures and that is how I chose to translate the C union type.

Why does a variant record require an extra identifier? The only extra identifier I know of, the tag field, is optional:

    TYPE
      TaggedVariant = RECORD
        CASE type: CARDINAL OF
          0: foo: Foo; |
          1: bar: Bar;
        END;
      END;

      UntaggedVariant = RECORD
        CASE CARDINAL OF
          0: foo: Foo; |
          1: bar: Bar;
        END;
      END;

Variant records do not have to have a tag field. What other extra identifier could there be? Did you mean the identifier that specifies the type of the case labels (CARDINAL in the examples above)? If so, that identifier is a compile-time creature only: it has no existence at run time.
--alan@pdn
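The same distinction can be seen in C terms. In this sketch, Foo and Bar are hypothetical stand-ins for the types in the examples above. The tagged form carries its tag as a real field, while the untagged form stores nothing for the case-label type, so it is strictly smaller.

```c
/* Hypothetical member types standing in for Foo and Bar. */
typedef struct { int x; } Foo;
typedef struct { double y; } Bar;

/* Rough counterpart of TaggedVariant: the tag occupies storage
   and is available at run time. */
struct tagged_variant {
    unsigned type;           /* the tag field "type: CARDINAL" */
    union { Foo foo; Bar bar; } u;
};

/* Rough counterpart of UntaggedVariant: the case-label type exists
   only for the compiler; nothing is stored for it. */
union untagged_variant {
    Foo foo;
    Bar bar;
};
```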
alan@pdn.UUCP (Alan Lovejoy) (03/31/88)
In article <850@vixie.UUCP> paul@vixie.UUCP (Paul Vixie Esq) writes:
>Sometimes you *want* that intervening obj_id. In C, it's harder (though
>possible) to make a variant record where this intervening member needn't
>be named in references to the variant fields; in M2, you can do it thus:
>
>TYPE ObjName = RECORD                          (* note 1 *)
>        ObjType: INTEGER;
>        ObjId: RECORD
>            CASE BOOLEAN OF                    (* note 2 *)
>                TRUE: id: INTEGER|
>                FALSE: path: POINTER TO CHAR;  (* note 3 *)
>            END
>        END
>     END;
>
>Note 1: we are creating a type in the C example, not a variable.

Who said otherwise? The Modula-2 examples I have seen in this discussion were all type definitions, weren't they?

>Note 2: No ':' before the type as far as I know; [brackets] may be needed
>        (I don't recall), and the type could be enumerated if more than
>        two variants are needed -- BOOLEAN is convenient but not mandatory.

You are both wrong and right: the original syntax for Modula-2 did not have a colon before the type of a tagless variant. Most compilers still support this syntax (usually as the only option). However, Wirth changed the syntax in the third edition of his book (PIM2e3), making the colon required.

>Note 3: POINTER TO CHAR is one way to represent strings, but sometimes arrays
>        are used. Sure would be great if open arrays were allowed in places
>        other than a formal argument on a procedure...

POINTER TO CHAR is a TERRIBLE way to represent strings (unless you hide this representation behind an opaque type). Why?

1) There is no guarantee that SIZE(aCharVariable) = SIZE(string[0]) (assuming the declarations VAR aCharVariable: CHAR; string: ARRAY [0..n] OF CHAR). This is not just theoretical. My 68k M2 compiler uses two bytes for a character variable but one byte for each character in a string. This breaks the following code:

    VAR
      cp, end: POINTER TO CHAR;
      string: ARRAY [0..n] OF CHAR;
    ...
    cp := ADR(string);
    end := ADDRESS(cp) + String.Length(string);
    WHILE ADDRESS(cp) < ADDRESS(end) DO
      Process(cp^);
      cp := ADDRESS(cp) + TSIZE(CHAR);
    END;

Even if we replace TSIZE(CHAR) with Char.lengthInAString, we still run up against the problem that the compiler thinks cp^ is a reference to two bytes, not one. So it emits object code such as MOVE.W, ADD.W, CMP.W, etc., when it should be emitting MOVE.B, ADD.B, CMP.B, etc. Whether this results in erroneous behaviour depends on the byte sex of the CPU (and the byte sex assumed in the algorithm). On the 68000 and 68010 this is even more serious, BECAUSE WORD MEMORY ACCESSES MUST OCCUR ONLY AT EVEN ADDRESSES: an odd effective address used with WORD or LONGWORD data results in a processor-generated ADDRESS ERROR. POINTER TO CHAR is not a portable way to represent strings.

2) When the programmer sees 'string: POINTER TO CHAR', vital information about this object is completely missing:

   a) How big is the string?
   b) Has 'string' been properly initialized to point either to NIL or to some string?
   c) Does 'string' point to an object on the heap (memory for the string was allocated using NEW or ALLOCATE), or does it point to an object on the stack (string := ADR(aStackVariable))? You wouldn't want to call DISPOSE or DEALLOCATE on 'string' if it points to a stack variable.
   d) How many other pointer variables reference the same object? You don't want to DEALLOCATE 'string' if there are still active references to it.

POINTER TO CHAR is not a safe way to represent strings.

3) Programmers normally expect to be able to reference the i'th character in a string using array-index syntax: string[i]. If string is POINTER TO CHAR, that's not possible. Better is 'VAR string: POINTER TO ARRAY [0..Char.maxArray] OF CHAR;'. 'Char' is a definition module containing useful system-dependent parameters describing the properties of characters and arrays of characters.
Char.maxArray is the highest zero-based index that the compiler will allow for an ARRAY OF CHAR. This permits access to the i'th element using traditional syntax, string^[i], yet still provides for pointer arithmetic and dynamic sizing. It also finesses the SIZE(CHAR) problem. Even better is:

    TYPE
      DynamicStringIndex = [0..Char.maxArray];
      DynamicString = RECORD
        size: DynamicStringIndex;
        base: POINTER TO ARRAY DynamicStringIndex OF CHAR;
      END;

Best is:

    DEFINITION MODULE DynamicString;
    EXPORT QUALIFIED STRING, Index, ...; (* PRIVATE is NOT exported *)
    TYPE
      Index = [0..Char.maxArray];
      PRIVATE;
      STRING = RECORD
        size: Index;     (* read only variable *)
        base: PRIVATE;
      END;

4) "Open arrays" that are not procedure parameters are possible but do not come cheaply. Assume the following declarations:

    VAR
      string10: ARRAY [0..9] OF CHAR;
      string80: ARRAY [0..79] OF CHAR;
      foo: Bar;
      dynamicString: ARRAY OF CHAR;
      i: CARDINAL;

When the block in which these declarations reside is entered, the statically sized objects (everything but 'dynamicString') can easily be allocated on the stack. But the size of 'dynamicString' is undefined, so it cannot be allocated. What can be allocated is a hidden variable which will point to 'dynamicString', and a hidden variable which will specify the size of 'dynamicString'. Somewhere in the block, a value may be assigned to dynamicString:

    dynamicString := string10;

It would be nice if we could allocate the memory for dynamicString on the stack at this point. If the usage of dynamicString is as simple as this case is so far, we can. The problem is how to allocate memory on the stack for multiple open arrays whose sizes change more than once during execution of the block (open array procedure parameters don't have this problem, because their size is known at block entry and cannot change until block exit). When the size of an open array changes, the value returned by ADR(anOpenArray) probably will have to change as well.
Algorithms that are valid for static arrays will likely break if the static arrays are redefined to be dynamic open arrays. There is no general solution to this problem except to allocate memory on the heap and not the stack. So the only thing generic open arrays give us is the ability to write 'anOpenArray[index]' instead of 'aDynamicArrayAllocatedByTheProgrammer^[index]'. We could get the same effect by slightly changing the syntax of the language so that 'a[i]' is recognized as shorthand for 'a^[i]'. Oh yeah, the compiler also allocates and deallocates automatically for us, which completely hides from the programmer the fact that these arrays are heap objects. That has both its good and bad points. It's simpler (for the compiler writer) not to open this can of worms. If you feel you really need this functionality, I suggest you try Smalltalk, LISP or APL.

Personally, I'd like to see new syntax permitting variables to have their initialization and termination processing defined as part of their declaration. Example:

    VAR
      i: CARDINAL := 0;  (* initialize i to zero *)
      a: POINTER TO ARRAY [0..n] OF CHAR
           := NEW('Hello, world.')
              (* initialize a to NEW('Hello, world.'); NEW should be a
                 function which accepts the initial value of the allocated
                 object as its optional argument *)
           := DISPOSE(a);
              (* on termination of the block, assign DISPOSE(a) to a;
                 DISPOSE should also be a function *)
      x: REAL
           := 3.14159  (* initialize x to pi *)
           := circumference / (2.0 * radius);
              (* on block exit, set x to be the value of this expression *)
      circumference: REAL := 0.0;
      radius: REAL := 1.0;

The block termination code would execute just before the expression following a RETURN statement is evaluated, or else just before executing a RETURN (if the block is not a function). Notice that this can help to guarantee that functions don't return dangling pointers.
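Standard C has no declaration-attached termination code either. A common approximation is to funnel every exit through one cleanup label. The sketch below shows that idiom, not the proposed semantics (here the return value is computed before the cleanup runs), and copy_and_measure and cleanups_run are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so a caller can confirm the cleanup ran. */
static int cleanups_run = 0;

/* Copies s into a heap buffer, measures it, and releases the buffer on
   every exit path by routing all returns through one label. */
static long copy_and_measure(const char *s)
{
    long result = -1;                  /* value returned on failure */
    char *buf = malloc(strlen(s) + 1);
    if (buf == NULL)
        goto cleanup;                  /* early exit still runs cleanup */

    strcpy(buf, s);
    result = (long)strlen(buf);

cleanup:
    free(buf);                         /* free(NULL) is a no-op */
    cleanups_run++;
    return result;                     /* buf never escapes: no dangling pointer */
}
```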
Another suggestion would be to change the meaning of pointer syntax so that a reference to a pointer variable references its dynamic object instead of the address of its dynamic object:

    VAR
      p: POINTER TO FooBar;
      a: ADDRESS;
    ....
    p := aFooBar;     (* old syntax: p^ := aFooBar *)
    a^ := ADR(p);     (* old syntax: a := p *)
    a^ := p^;         (* old syntax: a := p *)

This makes it possible to abstract over an algorithm so that it is valid either for pointers or non-pointers. It's analogous to VAR and value parameters for procedures, which make it possible to abstract procedure calls with respect to arguments being passed as addresses or as values.
--Alan@pdn