[comp.lang.c] A quick question...

eychaner@suncub.bbso.caltech.edu (Amateurgrammer) (03/12/91)

Just a quick question...I personally still don't quite understand what is
and is not legal on the left side of an assignment.
Is this legal...
	unsigned char *pointer1;
	short short_value;
	...
	*((short *) pointer1) = short_value;
	...
And does it do what I think it does, that is, assign short_value to the
storage pointed to by pointer1?  I hope you understand what I mean...
								-Glenn
******************************************************************************
Glenn Eychaner - Big Bear Solar Observatory - eychaner@suncub.bbso.caltech.edu
"...this morning's unprecedented solar eclipse is no cause for alarm."
                                                               -_Flash Gordon_

eychaner@suncub.bbso.caltech.edu (Amateurgrammer) (03/12/91)

eychaner@suncub.bbso.caltech.edu (Amateurgrammer) writes:
*Note* A followup question follows....
>Is this legal...
>	unsigned char *pointer1;
>	short short_value;
>	...
>	*((short *) pointer1) = short_value;
>	...
>And does it do what I think it does, that is, assign short_value to the
>storage pointed to by pointer1?  I hope you understand what I mean...
>								-Glenn
Summary so far:  In short, yes it is legal.  However:
1) pointer1 must point to something large enough to hold a short.  Duh. :-)
2) Unless pointer1 is something like:
	pointer1 = (unsigned char *) (&something_short);
   alignment problems may result on some machines.  I figured that.
3) The exact effect is not machine portable (depending on your machine's
   order of storage for short, I guess).
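Points 2 and 3 have a common workaround. It comes from outside this thread, so treat it as an assumption: copy the bytes with memcpy, which has no alignment requirement. A minimal sketch follows. The helper names store_short and load_short are hypothetical. The bytes are still stored in the machine's own order, so point 3 still applies.

```c
#include <string.h>

/* Hypothetical helper: store a short into possibly unaligned
   unsigned char storage.  memcpy copies bytes, so there is no
   alignment problem (point 2); the byte order is still the
   machine's own (point 3). */
void store_short(unsigned char *dest, short value)
{
    memcpy(dest, &value, sizeof value);
}

/* Hypothetical companion: read the short back the same way. */
short load_short(const unsigned char *src)
{
    short value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

For example, store_short(buf + 1, short_value) works even when buf + 1 is an odd address.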
Followup question:
The reason for this is I am writing a subroutine which stores some values
in an array of short OR unsigned char.  I thought the solution would be:
(this is a HIGHLY simplified version; the real routine is complex enough
that I don't need two nearly identical copies floating about...)

int do_array (void *array, int size, int type)
{
unsigned char *arrayptr;
int i;

arrayptr = array;   /* If only void * arithmetic were legal in VAX C */
for (i = 0; i < size; i++) {
    if (type == UNSIGNED_CHAR) {
        /* My precedence rules say this is right */
        *arrayptr++ = some_value();  /* Declared properly elsewhere */
        }
    else if (type == SHORT) {
        /* My precedence rules ALSO say THIS is right */
        *(short *)arrayptr++ = othervalue();  /* Declared elsewhere */
        }
    }
}

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/12/91)

In article <1991Mar12.030759.26698@nntp-server.caltech.edu> eychaner@suncub.bbso.caltech.edu writes:
> 	unsigned char *pointer1;
> 	short short_value;

The value that ``pointer1'' refers to has not yet been defined. It's
probably garbage.

> 	...
> 	*((short *) pointer1) = short_value;

If ... doesn't include any initialization of pointer1, this has
undefined behavior.

If ... initializes pointer1 to point to some memory location other than
that occupied by a short, your code sequence has undefined behavior.

If ... initializes pointer1 to point to the memory occupied by a short,
then this will work. You can cast freely between char *, void *, and any
(single) other pointer-to-object type. This would work:

  char *pointer1;
  short foo;
  short bar;
  ...
  pointer1 = (char *) &foo;
  *((short *) pointer1) = bar;

This has the same effect as foo = bar. I believe unsigned char * works
the same as char * for this purpose.
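Dan's fragment can be wrapped in a hypothetical function and run as-is. The store through the converted pointer lands in foo:

```c
/* Dan's example, made self-contained: point a char * at a short,
   convert it back to short *, and store through it.  Because the
   pointer really points at foo, this is the same as foo = bar. */
short assign_via_char_pointer(short bar)
{
    char *pointer1;
    short foo = 0;

    pointer1 = (char *) &foo;       /* char * to the first byte of foo */
    *((short *) pointer1) = bar;    /* back to short *, then store */
    return foo;
}
```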

> And does it do what I think it does, that is, assign short_value to the
> storage pointed to by pointer1?

Only if you've set pointer1 to point to some valid short storage. (Of
course, you need to initialize short_value as well.)

---Dan

sarima@tdatirv.UUCP (Stanley Friesen) (03/13/91)

In article <1991Mar12.030759.26698@nntp-server.caltech.edu> eychaner@suncub.bbso.caltech.edu writes:
>Just a quick question...I personally still don't quite understand what is
>and is not legal on the left side of an assignment.
>Is this legal...
>	unsigned char *pointer1;
>	short short_value;
>	...
>	*((short *) pointer1) = short_value;
>	...
>And does it do what I think it does, that is, assign short_value to the
>storage pointed to by pointer1?  I hope you understand what I mean...

Assuming that you have initialized pointer1 to point to a suitably aligned
block of memory at least large enough to hold one short, then yes.

If pointer1 is uninitialized, or points to an improperly aligned block,
or points to a block that is too small then the result is undefined.

To put it another way:
Dereferencing a pointer always yields an lvalue if the pointer is valid.
If the pointer is invalid, the dereference yields undefined results.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

bls@u02.svl.cdc.com (Brian Scearce) (03/13/91)

eychaner@suncub.bbso.caltech.edu (Amateurgrammer) writes:
> Just a quick question...I personally still don't quite understand
> what is and is not legal on the left side of an assignment.

The rules are pretty easy (especially if you have your copy of
Harbison and Steele on your desk :-)

0. variable names (excepting function, array and enum constant
   names) are lvalues.
1. e[k] is an lvalue, regardless of whether e and k are lvalues.
2. (e) is an lvalue iff e is.
3. e.name is an lvalue iff e is.
4. e->name is an lvalue regardless of whether e is an lvalue.
5. *e is an lvalue regardless of whether e is an lvalue.

No other form of expression can form an lvalue.

And that's it!
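The rules above can be exercised in a few lines. Below is a minimal sketch with one assignment per rule. The function lvalue_forms and struct point are hypothetical. The commented-out lines show left-hand sides that are not lvalues, which an ANSI compiler must diagnose.

```c
struct point { int x; int y; };

/* Each assignment uses one of the lvalue forms listed above on its
   left-hand side; the return value just confirms the stores landed. */
int lvalue_forms(void)
{
    int n;
    int a[3];
    int *p = a;
    struct point s;

    n = 1;              /* rule 0: a variable name */
    a[0] = 2;           /* rule 1: e[k] */
    (p + 2)[0] = 7;     /* rule 1 again: p + 2 itself is not an lvalue */
    (n) = 3;            /* rule 2: (e), because n is an lvalue */
    s.x = 4;            /* rule 3: e.name, because s is an lvalue */
    (&s)->y = 5;        /* rule 4: &s is not an lvalue, but e->name is */
    *(p + 1) = 6;       /* rule 5: *e, with p + 1 not an lvalue */

    /* Neither of these left-hand sides is an lvalue:
           n + 1 = 0;
           (long) n = 0;                                       */
    return n + a[0] + a[1] + a[2] + s.x + s.y;   /* 3+2+6+7+4+5 */
}
```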

Your question of what can be on the left hand side of an assignment
is slightly complicated by the ANSI separation of lvalue into
modifiable lvalue and non-modifiable lvalue, and I don't feel
qualified to post about that.  But for K&R C (and maybe ANSI C in
the absence of "const"?), the 6 rules above are all you need to
know.

--
     Brian Scearce (bls@robin.svl.cdc.com  -or-  robin!bls@shamash.cdc.com)
    "Don't be surprised when a crack in the ice appears under your feet..."
 Any opinions expressed herein do not necessarily reflect CDC corporate policy.

cc100aa@prism.gatech.EDU (Ray Spalding) (03/14/91)

In article <1991Mar12.062941.2369@nntp-server.caltech.edu> eychaner@suncub.bbso.caltech.edu writes:
>[...]
>unsigned char *arrayptr;
>[...]
>    else if (type == SHORT) {
>        *(short *)arrayptr++ = othervalue();  /* Declared elsewhere */

I think what you want here is:
	*(short *)arrayptr = othervalue();
	arrayptr += sizeof(short);
or:
	*((short *)arrayptr)++ = othervalue();
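The first form is plain ANSI C: store through a converted copy of the pointer, then advance the char pointer itself. The sketch below uses a hypothetical fill_shorts function, with a counter standing in for othervalue(). It assumes the storage really holds shorts, so every converted pointer is aligned.

```c
/* Store through (short *)arrayptr, then move arrayptr itself
   forward by the size of one short. */
void fill_shorts(unsigned char *arrayptr, int size, short start)
{
    int i;
    for (i = 0; i < size; i++) {
        *(short *)arrayptr = (short)(start + i);
        arrayptr += sizeof(short);
    }
}
```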
-- 
Ray Spalding, Technical Services, Office of Information Technology
Georgia Institute of Technology, Atlanta Georgia, 30332-0715
uucp:     ...!{allegra,amd,hplabs,ut-ngp}!gatech!prism!cc100aa
Internet: cc100aa@prism.gatech.edu

eychaner@suncub.bbso.caltech.edu (Amateurgrammer) (03/14/91)

cc100aa@prism.gatech.EDU (Ray Spalding) writes:
>eychaner@suncub.bbso.caltech.edu writes:
>>[...]
>>unsigned char *arrayptr;
>>[...]
>>    else if (type == SHORT) {
>>        *(short *)arrayptr++ = othervalue();  /* Declared elsewhere */
>
>I think what you want here is:
>	*((short *)arrayptr)++ = othervalue();
Yup.  This one is what I meant.  Accidentally lost a set of parentheses
when I copied this in...oops...and no one else noticed!

torek@elf.ee.lbl.gov (Chris Torek) (03/15/91)

In article <1991Mar13.174154.12537@nntp-server.caltech.edu>
eychaner@suncub.bbso.caltech.edu writes:
>>	*((short *)arrayptr)++ = othervalue();
>... is what I meant.

But this does not mean anything.  A cast is defined semantically as
`assign the cast-expression value to an unnamed temporary variable
whose type is given by the cast'.  Thus, aside from the fact that a
cast produces a value, not an object, and therefore cannot be
incremented, this expression is otherwise semantically identical to:

	{ short *temp; temp = arrayptr; *temp++ = othervalue(); }

In other words, if the increment were legal, it would not alter arrayptr
at all, but rather some mysterious temporary variable.  Fortunately,
the increment is illegal.
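The temporary-variable reading can be checked directly. The sketch below uses a hypothetical cast_temp_demo function. It does the store through an explicit temporary and reports how far arrayptr moved. The caller passes real short storage, so the converted pointer is aligned.

```c
/* The cast's value is copied into temp; incrementing temp leaves
   arrayptr where it was.  Returns how far arrayptr moved. */
long cast_temp_demo(unsigned char *arrayptr, short value)
{
    unsigned char *before = arrayptr;
    {
        short *temp;
        temp = (short *) arrayptr;
        *temp++ = value;            /* only temp advances */
    }
    return (long)(arrayptr - before);
}
```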

As someone else pointed out earlier (but it bears repeating [either that
or you might as well give up on comp.lang.c :-) ]), the expression

	void f(char *arrayptr) {
		*(*(short **)&arrayptr)++ = 1;
		*(*(short **)&arrayptr)++ = 2;
		*(*(short **)&arrayptr)++ = 3;
	}

*is* legal, but is probably not what you meant anyway.  Disassembling
one of the above expressions into its components reveals why:

	arrayptr:
		<object, pointer to char>

	&arrayptr:
		<value, pointer to pointer to char>

	(short **)&arrayptr:
		short **temp; temp = (address of arrayptr, treated
			as if it were a pointer to pointer to short);
		<value, pointer to pointer to short, temp>

	*(short **)&arrayptr:
		<object, pointer to short, *temp>	[note 1]

	(*(short **)&arrayptr)++:
		<value, pointer to short, *temp>	[note 1] and also
		add 1 to *temp before next sequence point

	*(*(short **)&arrayptr)++
		<object, short, **temp>	[note 1] and also
		add 1 to *temp before next sequence point

To figure out what this mess meant, I shortened the last three
<>-bracketed triples to just use `*temp' and `**temp', without first
writing down what `*temp' is, so now we need to do that:

	[note 1] *temp is:
		The object found at the address given by `temp'.
		Temp is a <value, pointer to pointer to short,
			(address of <object, pointer to char, &arrayptr>
			treated as if it were a pointer to pointer to short)>.
		But what *is* this value?  The answer is:  `We have no
		idea and we cannot find out without going to the
		compiler, or the compiler's documentation or author or
		whatever, and finding out what it does on this
		particular machine.'

We do not know, and cannot find out (without going into the guts of
the compiler), what you get when you treat an <object, pointer to char>
as if it were something else.

Just for fun, though, we can go ahead and dig into the guts of a compiler.
I will take a typical C compiler for a Data General MV series machine.

The Data General MV series has two kinds of pointers, `byte pointers'
and `word pointers'.  Both are 32 bits long, but one looks vaguely like
this:

	WWW...WWWI	[note: I am deliberately leaving out the ring stuff]
			(mostly because I cannot remember how it worked)

and the other like this:

	BWWW...WWW

where W is a word address, `I' is an indirection bit (normally 0), and B
is the index number of a byte within a two-byte word.  So if we have
`arrayptr' as an object in memory, it is a byte pointer and looks like
the second:

					BWWW...WWW  [arrayptr]

If we take its address, we get a word pointer that points to the above
byte pointer:

	  WWW...WWWI [&arrayptr] ----> BWWW...WWW [arrayptr]

Now we will treat the word pointer on the left as if it were a `pointer
to pointer to short'.  This means we will pretend that what it points
to (on the right) is a `pointer to short'---specifically, that it is
a word pointer:

  actual: WWW...WWWI [&arrayptr] ----> BWWW...WWW [arrayptr]
  pretend:WWW...WWWI [&arrayptr] ----> WWW...WWWI [arrayptr]

Next, we will fetch the thing our `pretend' pointer points to, i.e., 32
bits of `WWW...WWWI'.  The actual bits found at that location are
`BWWW...WWW'.  We will look at the top 31 bits of those 32 bits and
fetch a word from that location, i.e., the location (W/2 + (B<<30)).  If
arrayptr points to `byte 0 of word at 0x3004', this will be `word at
0x1802', while if arrayptr points to `byte 1 of word at 0x6480', this
will be `word at 0x40003240'.  Once we find that word (if it is in our
address space at all), we will look at the bottom bit, the `I' bit, and
if it is set we will fetch the word to which this word points.  So if
`arrayptr' happens to point to byte 0 of word 0x51379, we will first
look in location 0x289bc, see where that points (taken as if it were a
word pointer), and go warping off to wherever that is.

In other words, by closing our eyes and pretending that this byte
pointer is a word pointer, we are going to

	- cut the word address in the byte pointer in half;
	- if the byte pointer pointed to the odd-numbered byte, add 2^30;
	- if the word address in the byte pointer was odd, head off into the ozone.

We are definitely NOT going to get two bytes from the place to which
`arrayptr' points.
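The pretending step is pure bit arithmetic, so it can be modeled on any machine. The sketch below follows the simplified 32-bit model above: rings are ignored and the helper names are hypothetical. The top 31 bits of the byte pointer become the word address, and its bottom bit becomes I. It reproduces the `word at 0x1802' and `word at 0x40003240' cases.

```c
#include <stdint.h>

/* Simplified D/G model: a byte pointer is B (top bit) followed by a
   31-bit word address W. */
uint32_t make_byte_ptr(uint32_t byte_in_word, uint32_t word_addr)
{
    return (byte_in_word << 31) | (word_addr & 0x7fffffffu);
}

/* Word address used when those same bits are read, unconverted, as a
   word pointer: the top 31 bits, i.e. W/2 + (B << 30). */
uint32_t pretend_word_addr(uint32_t byte_ptr)
{
    return byte_ptr >> 1;
}

/* Indirection bit of the pretend word pointer: the low bit of W. */
int pretend_indirect(uint32_t byte_ptr)
{
    return (int)(byte_ptr & 1u);
}
```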

Just for more fun, we can follow what happens when we use instead a
proper expression:

	((short *)(arrayptr += sizeof(short)))[-1]

On the D/G, this means:

	- add one to `arrayptr', leaving the top bit alone (i.e.,
	  point to the next word);
	- treat the result as if it were a pointer to words, i.e.,
	  shift it left one bit and put a zero in the bottom (I) bit;
	- subtract two from the resulting pointer (i.e., point to
	  the previous word);
	- fetch the word from the resulting location.
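On an ordinary byte-addressed machine, the same expression can be used directly as the left side of an assignment. The sketch below uses a hypothetical fill_shorts_indexed function, with a counter standing in for othervalue(). It assumes arrayptr points into storage that really holds shorts.

```c
/* Advance arrayptr past one short, convert, and index back by one
   to reach the short just passed over. */
void fill_shorts_indexed(unsigned char *arrayptr, int size, short start)
{
    int i;
    for (i = 0; i < size; i++)
        ((short *)(arrayptr += sizeof(short)))[-1] = (short)(start + i);
}
```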

The trick is that arithmetic on a pointer depends on what *kind* of
pointer it is.  If it is a byte pointer, we add 1 to move forward one
word, and add 0x80000000 and then add the carry to move forward one
byte.  If it is a word pointer, we add 2 to move forward one word, and
there is no way at all to move forward one byte.  Conversion between
byte and word pointers is not just `bits as is'; it requires shift
instructions.  The compiler does this whenever you have `word pointer'
and `byte pointer' next to each other, but if you cheat (by casting
&foo to some other type) you are telling the compiler to throw away
that information, and skip the conversion.
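The proper conversions and the per-kind arithmetic can also be modeled as bit operations. The sketch below uses the same simplified model (rings ignored, helper names hypothetical). It shows the shift that a correct conversion performs and the carry trick for stepping a byte pointer by one byte.

```c
#include <stdint.h>

/* Byte pointer: B in bit 31, 31-bit word address W in bits 30..0.
   Word pointer: W in bits 31..1, indirection bit I in bit 0. */

/* Byte pointer -> word pointer: drop B, shift W left, I = 0. */
uint32_t byte_to_word_ptr(uint32_t bp)
{
    return (bp & 0x7fffffffu) << 1;
}

/* Word pointer -> byte pointer: shift W right, B = 0 (byte 0). */
uint32_t word_to_byte_ptr(uint32_t wp)
{
    return wp >> 1;
}

/* Byte pointer + 1 byte: add 0x80000000 to flip B; if that wraps
   (B was 1), carry 1 into the word address. */
uint32_t byte_ptr_add_byte(uint32_t bp)
{
    uint32_t r = bp + 0x80000000u;
    if (r < bp)
        r += 1;
    return r;
}

/* One word forward: add 1 to a byte pointer, 2 to a word pointer. */
uint32_t byte_ptr_add_word(uint32_t bp) { return bp + 1; }
uint32_t word_ptr_add_word(uint32_t wp) { return wp + 2; }
```

Consider byte 0 of word 0x100. byte_to_word_ptr(byte_ptr_add_word(0x100)) - 2 gives 0x200, which is a word pointer to word 0x100 with I clear.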
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

throopw@sheol.UUCP (Wayne Throop) (03/18/91)

> eychaner@suncub.bbso.caltech.edu (Amateurgrammer)
> int do_array (void *array, int size, int type)
> {
> unsigned char *arrayptr;
> int i;
> 
> arrayptr = array;   /* If only void * arithmetic were legal in VAX C */
> for (i = 0; i < size; i++) {
>     if (type == UNSIGNED_CHAR) {
>         /* My precedence rules say this is right */
>         *arrayptr++ = some_value();  /* Declared properly elsewhere */
>         }
>     else if (type == SHORT) {
>         /* My precedence rules ALSO say THIS is right */
>         *(short *)arrayptr++ = othervalue();  /* Declared elsewhere */
>         }
>     }
> }

Since what is wanted is a "generic" routine to operate on arrays of
integers of various types (and since the common "wrapping" of this
operation is large enough to want to commonize in a single routine),
I'd try to make the binding of the void * to a specific type as localized
and clear as possible.  Tweaking the above example, I'd suggest:

int do_array (void *formal_array, int size, int type) {
    int i;
    unsigned char *uchar_array = (unsigned char*)formal_array;
    short *short_array = (short*)formal_array;
    for (i = 0; i < size; i++) {
        if (type == UNSIGNED_CHAR)
            uchar_array[i] = some_value();
        else if (type == SHORT) 
            short_array[i] = othervalue();
    }
}

Further, if the persistence of the two "interpretations" of the
"generic" argument didn't need to interpenetrate to make the loop
control common, I'd segregate them, so that errors involving using the
wrong interpretation of the formal would be less likely.  For example:

int do_array (void *formal_array, int size, int type) {
    int i;
    if (type == UNSIGNED_CHAR) {
        unsigned char *array = (unsigned char*)formal_array;
        for (i = 0; i < size; i++)
            array[i] = some_value();
    }
    else if (type == SHORT) {
        short *array = (short*)formal_array;
        for (i = 0; i < size; i++)
            array[i] = othervalue();
    }
}
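To try the segregated version, the names "declared elsewhere" need stand-ins. The sketch below works under those assumptions. The enum values, the next_value counter and both value functions are hypothetical. do_array returns void here because the original never returns a value.

```c
/* Hypothetical stand-ins for the type codes and value sources. */
enum { UNSIGNED_CHAR, SHORT };

static int next_value = 0;
static unsigned char some_value(void) { return (unsigned char)next_value++; }
static short othervalue(void) { return (short)(1000 + next_value++); }

/* Each branch binds the void * to exactly one element type. */
void do_array(void *formal_array, int size, int type)
{
    int i;
    if (type == UNSIGNED_CHAR) {
        unsigned char *array = (unsigned char *)formal_array;
        for (i = 0; i < size; i++)
            array[i] = some_value();
    }
    else if (type == SHORT) {
        short *array = (short *)formal_array;
        for (i = 0; i < size; i++)
            array[i] = othervalue();
    }
}
```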

There are further tricks that can be played with macros and so on
(and more cleanly with by-name bindings in other languages), but
in C these alternatives are mostly pretty disgusting in all but
trivial examples.
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

throopw@sheol.UUCP (Wayne Throop) (03/18/91)

> torek@elf.ee.lbl.gov (Chris Torek)
>>	*((short *)arrayptr)++ = othervalue();
> But this does not mean anything. 
> [.. general explanation of why omitted... skip to the specific example ..]
> I will take a typical C compiler for a Data General MV series machine.
> The Data General MV series has two kinds of pointers, `byte pointers'
> and `word pointers'.  Both are 32 bits long, but one looks vaguely like
> this:  WWW...WWWI  [note: I am deliberately leaving out the ring stuff]
>                    (mostly because I cannot remember how it worked)
> and the other like this:    BWWW...WWW
> where W is a word address, `I' is an indirection bit (normally 0), and B
> is the index number of a byte within a two-byte word.

Well...  Chris has the ideas right, but as he forewarned, the specifics
are slightly off.  But, his description of how one can "head off into
the ozone" by pointer punning is correct in essence.  Also, I think
other once-widespread word-oriented machines do/did it the way Chris
outlines above.

But to go into boring detail in the DG case, this all started with the
word-oriented NOVA architecture, which had no byte operations, and no
byte-granular addresses supported by single instructions.  Its 64-Kbyte
address space was addressed as 32K words, with an indirect bit.  These
machines and their descendants were big-endian, and I'll write the bits
in decreasing significance.  The NOVA address layout was
                                              IWWWWWWWWWWWWWWW

Then byte operations were added to the instruction sets of these
16-bit machines (using a trick that deserves an essay all on its
own, but I digress...), giving the "Eclipse" series.  The indirect
bit was dropped, and a byte address was formed in the "canonical"
way, giving the two addressing formats, word: IWWWWWWWWWWWWWWW
(just as in the NOVA) and the new byte:       BBBBBBBBBBBBBBBB.

Thus, a byte address was gotten by left-shifting a word address one bit
(losing the indirect bit in the process) and adding a 0 or 1 to the
resulting integer representation to indicate which byte in that word was
meant.  The resulting byte address was pretty much the familiar address
you'd find on most byte-addressed big-endian machines.
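The 16-bit conversion is a one-bit shift in each direction. The sketch below follows the layouts given in this post; they are not checked against DG documentation, and the helper names are hypothetical.

```c
#include <stdint.h>

/* word pointer: I WWWWWWWWWWWWWWW  (indirect bit, 15-bit word address)
   byte pointer: BBBBBBBBBBBBBBBB   (16-bit byte address)              */

/* Word -> byte: shift left one (the I bit falls off the top) and add
   0 or 1 to pick the byte within the word. */
uint16_t eclipse_byte_addr(uint16_t word_ptr, unsigned which_byte)
{
    return (uint16_t)(((word_ptr << 1) & 0xffffu) | (which_byte & 1u));
}

/* Byte -> word: shift right one; the I bit comes out 0. */
uint16_t eclipse_word_addr(uint16_t byte_ptr)
{
    return (uint16_t)(byte_ptr >> 1);
}
```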

Then things were expanded to 32 bits with eight rings, by adding yet
more instructions to act on "wide" addresses (mostly the same mnemonics
with "W" prefix (and other prefixes and suffixes in a scheme that would
take yet *another* essay, but I digress again)).  In this scheme, word
addresses were:           IRRRWWWWWWWWWWWWWWWWWWWWWWWWWWWW 
and byte addresses were:  RRRBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
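In the 32-bit formats, the ring field must be carried across while the address field shifts. The sketch below follows the bit layouts as written in this post; the helper names are hypothetical.

```c
#include <stdint.h>

/* word pointer: I RRR W...W  (indirect, 3-bit ring, 28-bit word address)
   byte pointer: RRR B...B    (3-bit ring, 29-bit byte address)         */

/* Word -> byte: keep the ring, shift the word address left one, and
   add 0 or 1 for the byte within the word. */
uint32_t mv_word_to_byte(uint32_t wp, unsigned which_byte)
{
    uint32_t ring = (wp >> 28) & 7u;
    uint32_t word = wp & 0x0fffffffu;
    return (ring << 29) | (word << 1) | (which_byte & 1u);
}

/* Byte -> word: keep the ring, shift the byte address right one;
   the indirect bit comes out 0. */
uint32_t mv_byte_to_word(uint32_t bp)
{
    uint32_t ring = bp >> 29;
    uint32_t word = (bp & 0x1fffffffu) >> 1;
    return (ring << 28) | word;
}
```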

There are also bit addresses to handle bit-granular accesses, which are
32 bits long in the 16-bit Eclipses, and (I think I remember) 64 bits
long in the 32-bit "MV series" Eclipses, and special instructions to
manipulate bits, and other special instruction formats that give faster
access to certain regions of the address space reachable with short
offsets, and wide and narrow stacks, and some historical glitches about
where the stack pointers pointed, and so on and on.  But the above
suffices to give a flavor of the thing. 

One "nice" thing about the shift involved in going from one type of
pointer to the other: in the ring-protected MV series, this meant that
you would get an access violation if you used the wrong flavor pointer,
instead of just trashing something random.  (Unless this bug was in
the kernel and the other ring had something mapped and... well, you
get the idea.)
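The misread can be shown with the same bit layouts. Take a hypothetical ring-7 byte pointer and read its bits unconverted as a word pointer. The ring field then reads as 6, and the indirect bit is set. Whether a given misread value actually faults depends on the ring and the address bits, so this sketch illustrates the mechanism; it does not promise a trap in every case.

```c
#include <stdint.h>

/* Ring the hardware would check if a byte pointer's bits were used
   as a word pointer: bits 30..28 instead of the real ring in 31..29. */
unsigned misread_ring(uint32_t byte_ptr)
{
    return (unsigned)((byte_ptr >> 28) & 7u);
}

/* Indirect bit the hardware would see: the byte pointer's top bit. */
unsigned misread_indirect(uint32_t byte_ptr)
{
    return (unsigned)(byte_ptr >> 31);
}
```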

The result was that DG compilers had a hard time, because much C code
exists which assumes that all pointers have the same format, despite
the fact that even K&R explicitly warned against this sort of thing.
Detailing the tradeoffs that were made in compiling C code for MV
machines is (you guessed it) another essay's worth.

---

( It is almost frightening to think of, but despite my frequent
  references to details I've left out of the above, there were even
  more endless details left out that I didn't even mention.  So, if
  you are a DG fan, and are tempted to tell me about the earlier
  origins of the byte pointer format than the Eclipse instruction
  set, or about various gradations of Eclipse stuff, have more pity
  on me than I've had on comp.lang.c, and spare me.

  So what's frightening about it?  Well, the fact that I *know* about
  all this useless stuff, taking up memory space better devoted to
  other things... THAT's what's frightening! )
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw