[net.lang.c] limitations of casts, pointer and function declarations...

breuel@harvard.ARPA (Thomas M. Breuel) (10/28/84)

The following questions came up while I was implementing the kernel of
a PROLOG pseudo-code interpreter in 'C'. I think that these three
constructs, casting the lhs of an assignment, pointers to themselves,
and functions with return value of a pointer to their own type, are
very useful (say convenient), in particular for the implementation of
symbol manipulation languages, and I hope that there are some nice cpp
& ccom tricks around to make them more convenient, and I hope that
they will eventually be permitted by the compiler.
==============================================================================
The (4.1/4.2BSD) C-compiler does not accept statements of the
following form:

{
	int x;
	char *y;
/*### [cc] illegal lhs of assignment operator = %%%*/
	(char *)x = y;
}

I don't think that this error message is sensible, since statements like

*((long *) 100) = 100;

work.

Bug, feature; explanation, or excuse? (the solution here is, of
course, to write 'x = (int)y;', but can the compiler make this
transformation without ambiguity in general?).
==============================================================================
I would like to declare a pointer to a thing of its own kind, i.e.
something of the form:

typedef ref *ref;

The point is that if a pointer of this type is dereferenced, it should
have the same type again.  In addition, the type should be cast'able
to/from integer, and the size associated with it should be that of a
single pointer of its kind.  I.e. expressions of the following type
should be possible:

{
	ref a,b;
	long c;
	*a = b;
	a = *b;
	c = (int)a;
	a = (ref)c;
	a++;
}

My first attempt was:

typedef long base;	/* change this to int type with size of pointer */
typedef base *ref;
#define deref(thing) ((ref)(*thing))

This works fine if one does all dereferencing through 'deref',
except that assignments still don't work quite right, since
'deref(thing)=thing;' still does not work, and the rhs has to
be cast instead (i.e. '*thing = (base)thing').
==============================================================================
Along the same lines, I'd like to be able to define a function
returning a pointer to its own kind, i.e.

typedef fun (*fun)();

which is useful to implement continuations without having to resort
to machine language hacks.
==============================================================================

					Thomas M. Breuel
				      breuel@harvard.arpa
			     ...{genrad!wjh12,seismo}!harvard!breuel

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/29/84)

> 	int x;
> 	char *y;
> /*### [cc] illegal lhs of assignment operator = %%%*/
> 	(char *)x = y;

You need an lvalue, not an expression, on the LHS.

> typedef ref *ref;

All types must reduce to a basic type (void, char, short, int, long,
float, double, struct/union) plus some number of operators (*, (), []).

> The point is that if a pointer of this type is dereferenced, it should
> have the same type again.  In addition, the type should be cast'able
> to/from integer, and the size associated with it should be that of a
> single pointer of its kind.

At present, the generic pointer type is (char *) (ANSI will probably
change this to (void *)).  A (char *) is castable to a (long) and back
without loss of information (note: NOT always to an (int)).  By using
appropriate type casts you can use the contents of a (char *) or (long)
for other purposes.

> Along the same lines, I'd like to be able to define a function
> returning a pointer to its own kind, i.e.
> 
> typedef fun (*fun)();

Ditto.

C is a typed language.  To implement another, untyped, language (or one
with types that do not reduce to basic types) in C you must pick a
definite C data type to represent objects in the other language (in this
case, ProLog).  Then you need to coerce data into the appropriate types
via typecasts when implementing recursive types etc.  C lets you do this
but you have to explicitly indicate that you are playing tricks with data
types.  (With some C implementations you can be pretty sloppy, but for
portable code follow the type rules and run everything past "lint".)

breuel@harvard.ARPA (Thomas M. Breuel) (10/29/84)

In reply to Doug Gwyn's comments:

>> 	int x;
>> 	char *y;
>> /*### [cc] illegal lhs of assignment operator = %%%*/
>> 	(char *)x = y;
>
>You need an lvalue, not an expression, on the LHS.

K&R 'defines' an lvalue as an 'expression referring to an object',
and an object as a 'manipulable region of storage'. '(char *)x' is a
perfectly good lvalue: it refers to the object 'x' considered as
a pointer to a character.

>> typedef ref *ref;
>
>All types must reduce to a basic type (void, char, short, int, long,
>float, double, struct/union) plus some number of operators (*, (), []).

I think the semantics of this typedef are clear (and can be defined
easily). Also, the above typedef is in no way different from the
structure declaration 'struct foo {struct foo *ref;};'. For the sake of
consistency, and considering its usefulness, I think the above typedef
should be permitted. (BTW, a struct/union is not a basic type either).

>> The point is that if a pointer of this type is dereferenced, it should
>> have the same type again.  In addition, the type should be cast'able
>> to/from integer, and the size associated with it should be that of a
>> single pointer of its kind.
>
>At present, the generic pointer type is (char *) (ANSI will probably
>change this to (void *)).  A (char *) is castable to a (long) and back
>without loss of information (note: NOT always to an (int)).  By using
>appropriate type casts you can use the contents of a (char *) or (long)
>for other purposes.

Not quite: let p be defined as 'char *p;'. '((long *)p)++' does not
work (see above). As I said, I don't want a 'generic' or 'untyped'
pointer, but a pointer to something which has the same length and type
as the pointer itself, so that I can use dereferencing and additive
assignment operators on it freely. (BTW, I don't think that an untyped
pointer is very useful: you might as well use a pointer to a character
if you don't care about the size associated with some pointer).

>C is a typed language.  To implement another, untyped, language (or one
>with types that do not reduce to basic types) in C you must pick a
>definite C data type to represent objects in the other language (in this
>case, ProLog).  Then you need to coerce data into the appropriate types
>via typecasts when implementing recursive types etc.  C lets you do this
>but you have to explicitly indicate that you are playing tricks with data
>types.  (With some C implementations you can be pretty sloppy, but for
>portable code follow the type rules and run everything past "lint".)

No, recursive types are legal and possible in 'C'. Take the above
example 'struct foo {struct foo *ref;};'. Unfortunately, this does not
help to implement recursive pointers easily, since this structure
cannot be cast directly to/from an integer type, and since it is even
more inconvenient to include the suffix '.ref' everywhere than it is to
cast explicitly everywhere.

Also, Pascal, which is doubtlessly a strongly typed language, does
permit a type definition like 'type ref= ^ref;' and handles it
correctly.  Strong typing is not contradicted by allowing a recursive
pointer definition.  (BTW, I don't think that one can be dogmatic about
a typing system as messy as that of 'C').

Again, my question is: given the 4.2 cpp and ccom, what is the most
convenient way (i.e. requiring the least number of typecasts) of
dealing with the case that 99% of all pointers give another pointer of
their own kind upon dereferencing?

Sorry for not expressing myself more clearly in the first posting.

						Thomas M. Breuel
						breuel@harvard.arpa

mike@hcradm.UUCP (Mike Tilson) (10/29/84)

It was noted that C compilers do not accept the following:

	int x;
	char *y;

	(char *)x = y;

I assume that what is wanted is to treat storage location "x" as a cell
that holds a "char *".  Declaring "x" to be a union type is the portable
way to do this.  If one does not wish to be portable, C already allows
the type cast you want, but it's a bit more complicated:

	*( (char **) &x) = y;

I checked this on the Vax System V.2 compiler.  It likes it just fine, and
generates the obvious single instruction move.  Repeat: it isn't portable.
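
A minimal sketch of the union alternative mentioned above. The names
union cell, as_int, as_ptr and store_and_load() are hypothetical,
chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one storage cell that can hold an int or a char *. */
union cell {
    int   as_int;
    char *as_ptr;
};

/* Store a pointer in the cell and read it back through the same member.
   The assignment plays the role of the rejected "(char *)x = y". */
char *store_and_load(union cell *c, char *p)
{
    assert(c != NULL);
    c->as_ptr = p;
    return c->as_ptr;   /* reading the member last written is well defined */
}
```

The union is as large as its largest member, so the cell always has
room for the pointer, which a plain int need not have.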

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/30/84)

> K&R 'defines' an lvalue as an 'expression referring to an object'.
> and an object as a 'manipulable region of storage'. '(char *)x' is a
> perfectly good lvalue: it refers to the object 'x' considered as
> a pointer to a character.

K&R is not at all clear about lvalues.  That shows why we need a good
standards document.

> >> typedef ref *ref;
> 
> I think the semantics of this typedef are clear (and can be defined
> easily). Also, the above typedef is in no way different from the
> structure declaration 'struct foo {struct foo *ref;};'. For the sake of
> consistency, and considering its usefulness, I think the above typedef
> should be permitted. (BTW, a struct/union is not a basic type either).

Ok, so I should've said "type-specifier".  There is a big difference
between your typedef, which has indeterminate content (e.g. the compiler
could consistently allocate no storage at all for the type!) and a struct/
union that can contain a pointer to an instance of itself (whether it can
contain a pointer to a not-yet-defined type is a debatable point).  There
is an explicit KLUDGE made in the language rules to permit such structs,
due to their great usefulness in implementing data structures.

> Again, my question is: given the 4.2 cpp and ccom, what is the most
> convenient way (i.e. requiring the least number of typecasts) of
> dealing with the case that 99% of all pointers give another pointer of
> their own kind upon dereferencing?

Your example could be achieved within the rules by using
	typedef struct ref { struct ref *ref; } ref;
since now it is clear how much storage to allocate for one of these beasts.
It is then trivial to write macros to dereference a "ref" etc.:
	#define nextref( refp )	((refp)->ref)	/* for pointers */
	#define deref( r )	(*(r).ref)	/* for values; parameter must not be named "ref" */
The first of these returns an lvalue, the second an rvalue.
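
A minimal sketch of how the struct form behaves in practice. The
upper-case macro name NEXTREF and the helper functions follow() and
link_to() are hypothetical names chosen for this illustration; only
the typedef itself comes from the post above.

```c
#include <assert.h>
#include <stddef.h>

/* The self-referential type from the post: a cell holding one pointer
   to another cell of the same type. */
typedef struct ref { struct ref *ref; } ref;

/* Hypothetical macro name; expands to an lvalue, so it can be assigned to. */
#define NEXTREF(refp) ((refp)->ref)

/* Make cell a point at cell b: the struct spelling of "*a = b". */
void link_to(ref *a, ref *b)
{
    assert(a != NULL);
    NEXTREF(a) = b;
}

/* Follow up to n links starting from start: the spelling of "a = *b". */
ref *follow(ref *start, int n)
{
    while (n-- > 0 && start != NULL)
        start = NEXTREF(start);
    return start;
}
```

With two cells a and b, link_to(&a, &b) followed by link_to(&b, &b)
makes b point to itself, so following any number of links from a ends
at b. No casts are needed, at the cost of spelling out the member name
once inside the macro.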

The point is, don't fight the language but rather, learn to exploit it.

kpmartin@watmath.UUCP (Kevin Martin) (10/30/84)

The correct method of doing a pointer to its own type or to something else
is:
	union x {
		union x *next;
		other_type pot_of_gold;
	};

Using casts as you suggested is a sure way to get burned when you try
porting.

As for your comments about casting Lvalues, note that, even as Rvalues,
the expressions
	(int) x
and
	*(int *) &x
DO NOT DO THE SAME THING. The former takes 'x' and does a meaningful
(usually) conversion to int. The latter merely grabs the first 16 (or 32
or 36) bits of 'x' (regardless of its true size), and interprets the bit
pattern it found as an int.
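
A minimal sketch of the difference, assuming a 32-bit IEEE-754 float
(an assumption about the machine, not something the language
guarantees). The function names convert() and reinterpret() are
hypothetical, and memcpy stands in for the pointer-cast form
*(int *)&x because it reads the same bytes without depending on how a
particular compiler treats the cast.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: the float 1.0 becomes the int 1. */
int convert(float x)
{
    return (int)x;
}

/* Reinterpretation: copy the object's bytes unchanged into an integer
   and return that bit pattern. */
uint32_t reinterpret(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* assumption: float is 32 bits */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Under those assumptions convert(1.0f) yields 1, while
reinterpret(1.0f) yields the bit pattern 0x3F800000.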

Given the possibility of doing
	p = (char *)i;
I don't see any need for
	(char *)p = i;
no matter what interpretation you give to it. I find this statement
no more sensible than, say,
	-x = y;
(after all, both the type cast and negation are valid unary operators)
Your argument about consistency with
	*(char *)&p = i;
is fallacious. The latter expression is indeed assigning to an Lvalue
(the result of the indirection operator), and is quite consistent with
other legal (and illegal) uses of type casting and the rules for L-
and R-values.
                      Kevin Martin, UofW Software Development Group.

guido@mcvax.UUCP (Guido van Rossum) (10/30/84)

In article <120@harvard.ARPA> breuel@harvard.ARPA (Thomas M. Breuel) writes:
>	int x;
>	char *y;
>/*### [cc] illegal lhs of assignment operator = %%%*/
>	(char *)x = y;

Sorry, you're thinking Algol-68.  What you need is:
	*( (char*) &x ) = y;

Let me try to explain why.  The difference between lvalues and rvalues
in C is quite different from the difference between 'int' and 'ref' 'int'
in Algol-68.  Here are the rules:

The following are declared to be lvalues:
- names of variables (except arrays, see elsewhere in net.lang.c :-);
- the expression *something, where 'something' may be any expression;
- e1[e2]; this can be deduced from the previous rule because, by definition,
  e1[e2] means *((e1)+(e2)) /* remember special semantics of pointer+int */;
- an lvalue between parentheses is still an lvalue.
Everything else is an rvalue.  (Summary: lvalues have addresses;
rvalues don't.  But lvalues *are* not addresses.  They're variables.)

In expressions (note that assignments are also expressions), there is a need
for lvalues and rvalues.  An lvalue is needed:
- at the left-hand side of an assignment operator (hence the name);
- as an argument to the auto-increment/decrement operators ++ and --
  (note that the RESULT of these is only an rvalue!);
- as an argument to the address-of operator, &something.
Everywhere else, an rvalue is needed.
When an lvalue is found where an rvalue is needed, it is turned into
an rvalue by using the value contained in its address.

Again, summarizing: after int x;, x and 1 have exactly the same type;
only x is an lvalue and 1 is an rvalue.
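
The rules above can be sketched in a few lines. The function name
lvalue_demo() is hypothetical; the commented-out statements are the
kind a compiler is expected to reject because their left-hand side is
an rvalue.

```c
#include <assert.h>

int lvalue_demo(void)
{
    int x = 0;
    int a[3] = {0, 0, 0};
    int *p = &x;        /* & requires an lvalue operand */

    x = 1;              /* a variable name is an lvalue */
    *p = 2;             /* *something is an lvalue; x is now 2 */
    a[1] = 3;           /* e1[e2] means *((e1)+(e2)), hence an lvalue */
    (x) = x + 4;        /* a parenthesized lvalue is still an lvalue; x is now 6 */
    p = &a[1];
    (*p)++;             /* ++ requires an lvalue operand; a[1] is now 4 */

    /* Not accepted, since each left-hand side is an rvalue:
           (long)x = 5;
           -x = 5;
           x++ = 5;
    */

    assert(x == 6 && a[1] == 4);
    return x * 10 + a[1];
}
```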


>(the solution here is, of
>course, to write 'x = (int)y;', but can the compiler make this
>transformation without ambiguity in general?).

Huh?  This assigns to all (sizeof int) bytes of x, while
	* ( (char*) &x ) = y;
assigns only to x's first (or last) byte.  What did you want?


>typedef ref *ref;

Looks very much like Algol-68 again (except that there, you can never
use the thing at all, because there are no unrestricted casts as in C...).
The fact is, and this will not change, that a typedef cannot contain
references to itself (the typedef-ed name becomes defined only *after*
the typedef has been processed by the compiler).  The only way to build
recursive types is using structure pointers, as it *is* allowed to write
	struct foo *x; /* but not struct foo x; !!! */
when struct foo is not yet declared (see response by Doug Gwyn).

>typedef long base;	/* change this to int type with size of pointer */
>typedef base *ref;
>#define deref(thing) ((ref)(*thing))

How about this:

	#define deref(thing) (* (ref*)(thing))

Because the '*' operator is at the outermost level, this macro
expands to an lvalue, with the same type as your macro, and can thus
be used in an assignment.


>typedef fun (*fun)();

Same remarks: typedefs can't be recursive.  Sorry, that's the way it is.
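
One workaround within the rules, sketched under the assumption that
wrapping the function pointer in a struct is acceptable: a struct may
mention itself, so a function can return a struct that holds a pointer
to the next function of the same kind. All names here (struct step,
fn, count_down, finish, run) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A "continuation": a struct wrapping a pointer to a function that
   itself returns such a struct. */
struct step {
    struct step (*fn)(int *counter);
};

static struct step finish(int *counter);
static struct step count_down(int *counter);

/* Decrement the counter and choose the next continuation. */
static struct step count_down(int *counter)
{
    struct step next;
    (*counter)--;
    next.fn = (*counter > 0) ? count_down : finish;
    return next;
}

/* Final step: returns a null continuation to stop the loop. */
static struct step finish(int *counter)
{
    struct step next;
    *counter = -1;
    next.fn = NULL;
    return next;
}

/* Trampoline: keep calling whatever continuation the last call
   returned, and report how many calls were made. */
int run(int start)
{
    int counter = start;
    int calls = 0;
    struct step s;

    s.fn = count_down;
    while (s.fn != NULL) {
        s = s.fn(&counter);
        calls++;
    }
    return calls;
}
```

Starting from 3, the loop makes three calls to count_down and one to
finish. The only extra notation compared with the wished-for typedef
is the ".fn" member access.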

--
	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam
	guido@mcvax.UUCP

"Don't stop.  Go right on complaining.  It's *so* beautiful!"

steveg@hammer.UUCP (Steve Glaser) (11/02/84)

In article <mcvax.6126> guido@mcvax.UUCP (Guido van Rossum) writes:
>In article <120@harvard.ARPA> breuel@harvard.ARPA (Thomas M. Breuel) writes:
>>	int x;
>>	char *y;
>>/*### [cc] illegal lhs of assignment operator = %%%*/
>>	(char *)x = y;
>
>Sorry, you're thinking Algol-68.  What you need is:
>	*( (char*) &x ) = y;
>
Mostly right except that what I think Tom wanted was more like:
	*( (char**) &x ) = y;
The same arguments you made still hold.

Note that this is highly unportable code since it assumes that the bit
layout in memory is compatible between ints and char *.  There is
nothing in the language that requires this.  The only requirement in C
is that converting a pointer to a suitably long int and back must not
lose information.
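
A minimal sketch of the round trip. It uses uintptr_t from <stdint.h>,
which postdates this discussion and is an optional type in the C
standard, so its availability is an assumption; the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pointer to an integer type wide enough to hold it, then
   back again. The detour through void * matches how uintptr_t is
   specified. */
char *round_trip(char *p)
{
    uintptr_t n = (uintptr_t)(void *)p;
    return (char *)(void *)n;
}

/* Reports whether a plain int happens to be wide enough on this
   machine; nothing in the language promises that it is. */
int int_can_hold_pointer(void)
{
    return sizeof(int) >= sizeof(char *);
}
```

Where int_can_hold_pointer() returns 0, a conversion through int may
discard part of the pointer, which is the portability hazard described
above.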

From what I've seen on net.lang.c, I think the Prime compiler may have
this problem: it has 48-bit pointers and 32-bit ints, so converting a
pointer to an int drops the ring-number portion of the pointer, and
converting back sets the ring number to "user ring".  This works fine
until you try it in non-user-ring code.

Thus, the most portable way to write this expression is still:

	x = (int)y;

	Steve Glaser
	tektronix!steveg

geoff@desint.UUCP (Geoff Kuenning) (11/03/84)

In article <6126@mcvax.UUCP> guido@mcvax.UUCP (Guido van Rossum) writes:

>>	int x;
>>	char *y;
>>/*### [cc] illegal lhs of assignment operator = %%%*/
>>	(char *)x = y;
>
>Sorry, you're thinking Algol-68.  What you need is:
>	*( (char*) &x ) = y;
>

I don't think that's the code that was intended.  The guy wanted to do
the same as

	x = (int) y;

but wanted to put the typecast on the left instead of the right for
readability reasons.  Guido's code gives us the equivalent of this Vax code:

	x &= ~0xFF;
	x |= (int) y & 0xFF;

and other code on other machines, depending on word size and byte ordering.
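
A minimal sketch of that single-byte store, written so that it does
not assume a byte order. The function name store_first_byte() is
hypothetical, and the example values assume a 32-bit unsigned int.

```c
#include <assert.h>

/* Overwrite only the first byte (in address order) of x with b and
   return the result. Which end of the value changes depends on the
   machine's byte ordering. */
unsigned store_first_byte(unsigned x, unsigned char b)
{
    assert(sizeof x >= 1);
    *(unsigned char *)&x = b;   /* touches one byte, not the whole object */
    return x;
}
```

Storing 0xAB into 0x11223344 gives 0x112233AB on a little-endian
machine such as the Vax and 0xAB223344 on a big-endian one. Either way
the other bytes of x keep their old contents, unlike x = (int) y.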

-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff