breuel@harvard.ARPA (Thomas M. Breuel) (10/28/84)
The following questions came up while I was implementing the kernel of a PROLOG pseudo-code interpreter in 'C'. I think that these three constructs, casting the lhs of an assignment, pointers to themselves, and functions with return value of a pointer to their own type, are very useful (say convenient), in particular for the implementation of symbol manipulation languages. I hope that there are some nice cpp & ccom tricks around to make them more convenient, and that they will eventually be permitted by the compiler.

==============================================================================

The (4.1/4.2BSD) C-compiler does not accept statements of the following form:

        {
                int x;
                char *y;
        /*### [cc] illegal lhs of assignment operator = %%%*/
                (char *)x = y;
        }

I don't think that this error message is sensible, since statements like

        *((long *) 100) = 100;

work. Bug, feature; explanation, or excuse? (The solution here is, of course, to write 'x = (int)y;', but can the compiler make this transformation without ambiguity in general?)

==============================================================================

I would like to declare a pointer to a thing of its own kind, i.e. something of the form:

        typedef ref *ref;

The point is that if a pointer of this type is dereferenced, it should have the same type again. In addition, the type should be cast'able to/from integer, and the size associated with it should be that of a single pointer of its kind. I.e. expressions of the following type should be possible:

        {
                ref a,b;
                long c;
                *a = b;
                a = *b;
                c = (int)a;
                a = (ref)c;
                a++;
        }

My first attempt was:

        typedef long base;      /* change this to int type with size of pointer */
        typedef base *ref;
        #define deref(thing) ((ref)(*thing))

This works fine if one does all dereferencing through 'deref', except that assignments still don't work quite right: 'deref(thing)=thing;' is rejected, and the rhs has to be cast instead (i.e. '*thing = (base)thing').
==============================================================================

Along the same lines, I'd like to be able to define a function returning a pointer to its own kind, i.e.

        typedef fun (*fun)();

which is useful to implement continuations without having to resort to machine language hacks.

==============================================================================

                                        Thomas M. Breuel
                                        breuel@harvard.arpa
                                        ...{genrad!wjh12,seismo}!harvard!breuel
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/29/84)
>         int x;
>         char *y;
> /*### [cc] illegal lhs of assignment operator = %%%*/
>         (char *)x = y;

You need an lvalue, not an expression, on the LHS.

> typedef ref *ref;

All types must reduce to a basic type (void, char, short, int, long, float, double, struct/union) plus some number of operators (*, (), []).

> The point is that if a pointer of this type is dereferenced, it should
> have the same type again. In addition, the type should be cast'able
> to/from integer, and the size associated with it should be that of a
> single pointer of its kind.

At present, the generic pointer type is (char *) (ANSI will probably change this to (void *)). A (char *) is castable to a (long) and back without loss of information (note: NOT always to an (int)). By using appropriate type casts you can use the contents of a (char *) or (long) for other purposes.

> Along the same lines, I'd like to be able to define a function
> returning a pointer to its own kind, i.e.
>
>         typedef fun (*fun)();

Ditto.

C is a typed language. To implement another, untyped, language (or one with types that do not reduce to basic types) in C you must pick a definite C data type to represent objects in the other language (in this case, ProLog). Then you need to coerce data into the appropriate types via typecasts when implementing recursive types etc. C lets you do this but you have to explicitly indicate that you are playing tricks with data types. (With some C implementations you can be pretty sloppy, but for portable code follow the type rules and run everything past "lint".)
breuel@harvard.ARPA (Thomas M. Breuel) (10/29/84)
In reply to Doug Gwyn's comments:

>>         int x;
>>         char *y;
>> /*### [cc] illegal lhs of assignment operator = %%%*/
>>         (char *)x = y;
>
>You need an lvalue, not an expression, on the LHS.

K&R 'defines' an lvalue as an 'expression referring to an object', and an object as a 'manipulable region of storage'. '(char *)x' is a perfectly good lvalue: it refers to the object 'x' considered as a pointer to a character.

>> typedef ref *ref;
>
>All types must reduce to a basic type (void, char, short, int, long,
>float, double, struct/union) plus some number of operators (*, (), []).

I think the semantics of this typedef are clear (and can be defined easily). Also, the above typedef is in no way different from the structure declaration 'struct foo {foo *ref;};'. FTSO consistency and considering its usefulness, I think the above typedef should be permitted. (BTW, a struct/union is not a basic type either).

>> The point is that if a pointer of this type is dereferenced, it should
>> have the same type again. In addition, the type should be cast'able
>> to/from integer, and the size associated with it should be that of a
>> single pointer of its kind.
>
>At present, the generic pointer type is (char *) (ANSI will probably
>change this to (void *)). A (char *) is castable to a (long) and back
>without loss of information (note: NOT always to an (int)). By using
>appropriate type casts you can use the contents of a (char *) or (long)
>for other purposes.

Not quite: let p be defined as 'char *p;'. '((long *)p)++' does not work (see above). As I said, I don't want a 'generic' or 'untyped' pointer, but a pointer to something which has the same length and type as the pointer itself, so that I can use dereferencing and additive assignment operators on it freely. (BTW, I don't think that an untyped pointer is very useful: you might as well use a pointer to a character if you don't care about the size associated with some pointer).

>C is a typed language. To implement another, untyped, language (or one
>with types that do not reduce to basic types) in C you must pick a
>definite C data type to represent objects in the other language (in this
>case, ProLog). Then you need to coerce data into the appropriate types
>via typecasts when implementing recursive types etc. C lets you do this
>but you have to explicitly indicate that you are playing tricks with data
>types. (With some C implementations you can be pretty sloppy, but for
>portable code follow the type rules and run everything past "lint".)

No, recursive types are legal and possible in 'C'. Take the above example 'struct foo {struct foo *ref;};'. Unfortunately, this does not help to implement recursive pointers easily, since this structure cannot be cast directly to/from an integer type, and since it is even more inconvenient to include the suffix '.ref' everywhere than it is to cast explicitly everywhere.

Also, Pascal, which is doubtlessly a strongly typed language, does permit a type definition like 'type ref= ^ref;' and handles it correctly. Strong typing is not contradicted by allowing a recursive pointer definition. (BTW, I don't think that one can be dogmatic about a typing system as messy as that of 'C').

Again, my question is: given the 4.2 cpp and ccom, what is the most convenient way (i.e. requiring the least number of typecasts) of dealing with the case that 99% of all pointers give another pointer of their own kind upon dereferencing?

Sorry for not expressing myself more clearly in the first posting.

                                        Thomas M. Breuel
                                        breuel@harvard.arpa
mike@hcradm.UUCP (Mike Tilson) (10/29/84)
It was noted that C compilers do not accept the following:

        int x;
        char *y;
        (char *)x = y;

I assume that what is wanted is to treat storage location "x" as a cell that holds a "char *". Declaring "x" to be a union type is the portable way to do this. If one does not wish to be portable, C already allows the type cast you want, but it's a bit more complicated:

        *( (char **) &x) = y;

I checked this on the Vax System V.2 compiler. It likes it just fine, and generates the obvious single instruction move. Repeat: it isn't portable.
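The portable union approach mentioned above might look roughly like the following. This is only a sketch: the names `union cell`, `as_int`, `as_ptr`, `store_and_fetch`, and `demo_cell` are made up for illustration and do not come from the thread. The union is large enough for either member, which matters on machines where a `char *` is wider than an `int`.

```c
#include <assert.h>

/* Hypothetical cell type: one storage location, viewed either as an
   int or as a char *.  The compiler sizes it for the larger member. */
union cell {
    int   as_int;
    char *as_ptr;
};

/* Store a pointer in the cell and read it back through the same
   member.  This plays the role of the rejected "(char *)x = y". */
char *store_and_fetch(union cell *c, char *y)
{
    c->as_ptr = y;
    return c->as_ptr;
}

/* Small self-check: the pointer survives the trip through the cell. */
void demo_cell(void)
{
    char text[] = "abc";
    union cell x;

    assert(store_and_fetch(&x, text) == text);
}
```

Reading `as_int` after storing `as_ptr` would reinterpret the stored bits, and what that yields depends on the machine, so the sketch only reads back the member it wrote.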
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/30/84)
> K&R 'defines' an lvalue as an 'expression referring to an object'.
> and an object as a 'manipulable region of storage'. '(char *)x' is a
> perfectly good lvalue: it refers to the object 'x' considered as
> a pointer to a character.

K&R is not at all clear about lvalues. That shows why we need a good standards document.

> >> typedef ref *ref;
>
> I think the semantics of this typedef are clear (and can be defined
> easily). Also, the above typedef is in no way different from the
> structure declaration 'struct foo {foo *ref;};'. FTSO consistency and
> considering its usefulness, I think the above typedef should be
> permitted. (BTW, a struct/union is not a basic type either).

Ok, so I should've said "type-specifier". There is a big difference between your typedef, which has indeterminate content (e.g. the compiler could consistently allocate no storage at all for the type!) and a struct/union that can contain a pointer to an instance of itself (whether it can contain a pointer to a not-yet-defined type is a debatable point). There is an explicit KLUDGE made in the language rules to permit such structs, due to their great usefulness in implementing data structures.

> Again, my question is: given the 4.2 cpp and ccom, what is the most
> convenient way (i.e. requiring the least number of typecasts) of
> dealing with the case that 99% of all pointers give another pointer of
> their own kind upon dereferencing?

Your example could be achieved within the rules by using

        typedef struct ref {
                struct ref *ref;
        } ref;

since now it is clear how much storage to allocate for one of these beasts. It is then trivial to write macros to dereference a "ref" etc.:

        #define nextref( refp ) (refp)->ref     /* for pointers */
        #define deref( r )      (*(r).ref)      /* for values */

(The macro parameter must not itself be named "ref", or cpp would also substitute it for the member name.) The first of these returns an lvalue, the second an rvalue.

The point is, don't fight the language but rather, learn to exploit it.
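As a rough illustration of the struct-based `ref` suggested above, the following sketch links three cells into a chain and walks it with a `nextref` macro. The helper names `chain_length` and `demo_chain` are hypothetical and exist only for this example; the macro is parenthesized a little more defensively than in the post.

```c
#include <assert.h>
#include <stddef.h>

/* The self-referential type from the post: a ref is a cell holding
   a pointer to another ref. */
typedef struct ref {
    struct ref *ref;
} ref;

/* Expands to an lvalue, so it can appear on either side of '='. */
#define nextref(refp) ((refp)->ref)

/* Count the cells reachable from start by following the chain until
   a null link is found. */
size_t chain_length(ref *start)
{
    size_t n = 0;
    ref *p;

    for (p = start; p != NULL; p = nextref(p))
        n++;
    return n;
}

/* Build a -> b -> c -> NULL and measure it. */
size_t demo_chain(void)
{
    ref a, b, c;

    nextref(&a) = &b;       /* assignment through the macro */
    nextref(&b) = &c;
    nextref(&c) = NULL;

    assert(nextref(&a) == &b);
    return chain_length(&a);
}
```

No casts are needed anywhere, at the cost of writing the member access (or the macro) at each dereference.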
kpmartin@watmath.UUCP (Kevin Martin) (10/30/84)
The correct method of doing a pointer to its own type or to something else
is:
        union x {
                union x *next;
                other_type pot_of_gold;
        };
Using casts as you suggested is a sure way to get burned when you try
porting.
As for your comments about casting Lvalues, note that, even as Rvalues,
the expressions
        (int) x
and
        *(int *) &x
DO NOT DO THE SAME THING. The former takes 'x' and does a meaningful
(usually) conversion to int. The latter merely grabs the first 16 (or 32
or 36) bits of 'x' (regardless of its true size), and interprets the bit
pattern it found as an int.
Given the possibility of doing
        p = (char *)i;
I don't see any need for
        (char *)p = i;
no matter what interpretation you give to it. I find this statement
no more sensible than, say,
        -x = y;
(after all, both the type cast and negation are valid unary operators)
Your argument about consistency with
        *(char *)&p = i;
is fallacious. The latter expression is indeed assigning to an Lvalue
(the result of the indirection operator), and is quite consistent with
other legal (and illegal) uses of type casting and the rules for L-
and R-values.
Kevin Martin, UofW Software Development Group.
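The difference between `(int) x` and `*(int *) &x` can be seen with a small experiment. The sketch below assumes `x` is a `float`; the function names `convert`, `reinterpret`, and `demo_convert` are invented for the example. It uses `memcpy` in place of the pointer cast, since modern compilers may treat the cast form as an aliasing violation; the effect on the bits is the same.

```c
#include <assert.h>
#include <string.h>

/* Value conversion: the number is converted, so 1.0 becomes 1. */
int convert(float x)
{
    return (int)x;
}

/* Reinterpretation: copy the leading bytes of x's representation
   into an int and read them as an int bit pattern. */
int reinterpret(float x)
{
    int bits = 0;
    size_t n = sizeof bits < sizeof x ? sizeof bits : sizeof x;

    memcpy(&bits, &x, n);
    return bits;
}

/* The conversion truncates toward zero, as the cast is defined to. */
void demo_convert(void)
{
    assert(convert(1.0f) == 1);
    assert(convert(2.75f) == 2);
}
```

On machines with the usual IEEE floating-point format, `reinterpret(1.0f)` yields a large number unrelated to 1, which is the point being made: the two expressions agree only by accident of representation.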
guido@mcvax.UUCP (Guido van Rossum) (10/30/84)
In article <120@harvard.ARPA> breuel@harvard.ARPA (Thomas M. Breuel) writes:
>         int x;
>         char *y;
>/*### [cc] illegal lhs of assignment operator = %%%*/
>         (char *)x = y;

Sorry, you're thinking Algol-68. What you need is:

        *( (char*) &x ) = y;

Let me try to explain why. The difference between lvalues and rvalues in C is quite different from the difference between 'int' and 'ref' 'int' in Algol-68. Here are the rules:

The following are declared to be lvalues:
- names of variables (except arrays, see elsewhere in net.lang.c :-);
- the expression *something, where 'something' may be any expression;
- e1[e2]; this can be deduced from the previous rule because, by definition, e1[e2] means *((e1)+(e2)) /* remember special semantics of pointer+int */;
- an lvalue between parentheses is still an lvalue.

Everything else is an rvalue. (Summary: lvalues have addresses; rvalues don't. But lvalues *are* not addresses. They're variables.)

In expressions (note that assignments are also expressions), there is a need for lvalues and rvalues. An lvalue is needed:
- at the left-hand side of an assignment operator (hence the name);
- as an argument to the auto-increment/decrement operators ++ and -- (note that the RESULT of these is only an rvalue!);
- as an argument to the address-of operator, &something.

Everywhere else, an rvalue is needed. When an lvalue is found where an rvalue is needed, it is turned into an rvalue by using the value contained in its address.

Again, summarizing: after int x;, x and 1 have exactly the same type; only x is an lvalue and 1 is an rvalue.

>(the solution here is, of
>course, to write 'x = (int)y;', but can the compiler make this
>transformation without ambiguity in general?).

Huh? This assigns to all (sizeof int) bytes of x, while

        * ( (char*) &x ) = y;

assigns only to x's first (or last) byte. What did you want?
>typedef ref *ref;

Looks very much like Algol-68 again (except that there, you can never use the thing at all, because there are no unrestricted casts as in C...). The fact is, and this will not change, that a typedef cannot contain references to itself (the typedef-ed name becomes defined only *after* the typedef has been processed by the compiler). The only way to build recursive types is using structure pointers, as it *is* allowed to write

        struct foo *x;  /* but not struct foo x; !!! */

when struct foo is not yet declared (see response by Doug Gwyn).

>typedef long base;      /* change this to int type with size of pointer */
>typedef base *ref;
>#define deref(thing) ((ref)(*thing))

How about this:

        #define deref(thing) (* (ref*)(thing))

Because the '*' operator is at the outermost level, this macro expands to an lvalue, with the same type as your macro, and can thus be used in an assignment.

>typedef fun (*fun)();

Same remarks: typedefs can't be recursive. Sorry, that's the way it is.

--
        Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
        guido@mcvax.UUCP

"Don't stop. Go right on complaining. It's *so* beautiful!"
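Since structures are the one place where recursion is allowed, a function "returning its own kind" can be approximated by wrapping the function pointer in a struct. The sketch below is one possible arrangement, not something proposed in the thread: the names `struct state`, `state_fn`, `step`, `finish`, and `run` are hypothetical, and prototypes are written in the later ANSI style.

```c
#include <assert.h>
#include <stddef.h>

/* The struct is declared first so the function type can mention it;
   the wrapper is what breaks the otherwise circular typedef. */
struct state;
typedef struct state (*state_fn)(int *counter);
struct state {
    state_fn next;      /* the continuation to call next, or NULL */
};

static struct state step(int *counter);
static struct state finish(int *counter);

/* Bump the counter, then continue with step until it reaches 3. */
static struct state step(int *counter)
{
    struct state s;

    assert(counter != NULL);
    ++*counter;
    s.next = (*counter < 3) ? step : finish;
    return s;
}

/* Final state: return a null continuation to stop the driver. */
static struct state finish(int *counter)
{
    struct state s;

    (void)counter;
    s.next = NULL;
    return s;
}

/* Driver loop: keep calling whatever continuation was returned. */
int run(void)
{
    int counter = 0;
    struct state s;

    s.next = step;
    while (s.next != NULL)
        s = s.next(&counter);
    return counter;
}
```

The price is the `.next` suffix at each call site, which is the inconvenience complained about earlier in the thread, but no casts are involved and the types check.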
steveg@hammer.UUCP (Steve Glaser) (11/02/84)
In article <mcvax.6126> guido@mcvax.UUCP (Guido van Rossum) writes:
>In article <120@harvard.ARPA> breuel@harvard.ARPA (Thomas M. Breuel) writes:
>>         int x;
>>         char *y;
>>/*### [cc] illegal lhs of assignment operator = %%%*/
>>         (char *)x = y;
>
>Sorry, you're thinking Algol-68. What you need is:
>         *( (char*) &x ) = y;
>

Mostly right except that what I think Tom wanted was more like:

        *( (char**) &x ) = y;

The same arguments you made still hold.

Note that this is highly unportable code since it assumes that the bit layout in memory is compatible between ints and char *. There is nothing in the language that requires this. The only requirement in C is that converting a pointer to a suitably long int and back must not lose information. From what I've seen on net.lang.c, I think the Prime compiler may have this problem (48 bit pointers, 32 bit ints; when converting a pointer to an int the ring number portion of the pointer gets dropped, when going the other way the ring number gets set to "user ring"; works fine until you try to use it in non-user-ring code).

Thus, the most portable way to write this expression is still:

        x = (int)y;

Steve Glaser
tektronix!steveg
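The "suitably long int" requirement has a direct counterpart in later versions of C. The sketch below assumes a compiler that provides the optional `uintptr_t` type from `<stdint.h>`, which did not exist when this thread was written; `round_trip_ok` and `demo_round_trip` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pointer to an integer wide enough to hold it and back,
   and report whether the result still equals the original. */
int round_trip_ok(char *p)
{
    uintptr_t n = (uintptr_t)(void *)p;   /* pointer -> integer */
    char *q = (char *)(void *)n;          /* integer -> pointer */

    return q == p;
}

/* Check the round trip for the start of a local buffer. */
void demo_round_trip(void)
{
    char buf[8];

    assert(round_trip_ok(buf));
}
```

Replacing `uintptr_t` with plain `int` gives the lossy situation described above on any machine whose pointers are wider than its ints.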
geoff@desint.UUCP (Geoff Kuenning) (11/03/84)
In article <6126@mcvax.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:
>>         int x;
>>         char *y;
>>/*### [cc] illegal lhs of assignment operator = %%%*/
>>         (char *)x = y;
>
>Sorry, you're thinking Algol-68. What you need is:
>         *( (char*) &x ) = y;
>

I don't think that's the code that was intended. The guy wanted to do the same as

        x = (int) y;

but wanted to put the typecast on the left instead of the right for readability reasons. Guido's code gives us the equivalent of this Vax code:

        x &= ~0xFF;
        x |= (int) y & 0xFF;

and other code on other machines, depending on word size and byte ordering.

--
        Geoff Kuenning
        First Systems Corporation
        ...!ihnp4!trwrb!desint!geoff
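The single-byte effect described here can be checked without knowing the machine's byte order. The following sketch stores through a character pointer aimed at an `int` and counts how many bytes of the `int` changed. The function name and the constants 0x1234 and 0x55 are arbitrary choices for illustration, and `unsigned char` is used rather than plain `char` to keep the byte comparison simple.

```c
#include <assert.h>
#include <stddef.h>

/* Write one byte through a character pointer aimed at an int, then
   count how many bytes of the int differ from a saved copy. */
size_t bytes_changed_by_char_store(void)
{
    int x = 0x1234;     /* small enough to fit in any int */
    int before = x;
    const unsigned char *xb = (const unsigned char *)&x;
    const unsigned char *bb = (const unsigned char *)&before;
    size_t i, changed = 0;

    /* The store in question; 0x55 differs from every byte of 0x1234,
       whichever byte happens to come first in memory. */
    *(unsigned char *)&x = 0x55;

    for (i = 0; i < sizeof x; i++)
        if (xb[i] != bb[i])
            changed++;

    assert(x != before);
    return changed;
}
```

Only one byte of `x` is touched; which end of the value that byte represents depends on the machine, matching the remark about word size and byte ordering.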