[net.lang.c] C portability gotcha, example

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/27/85)

One of the fellows here ran across a C coding style problem
that caused an application to break when ported from little-
endian machines to a big-endian one.  I am posting this to help
those who may encounter similar problems in the future.

/* modeled after a grammar that builds expression trees: */

struct node { int op; struct node *left, *right; double val; }
	*lp, *rp, *p, *makenode();
double value;			/* constant value from lex */
...
p = makenode( 'x', lp, rp );	/* create multiply node */
...
p = makenode( 'k', value );	/* create real constant node */
...
struct node *makenode( op, arga, argb )
int op; struct node *arga, *argb;
{	extern char *malloc();
	struct node *new =
		(struct node *)malloc( sizeof(struct node) );
	if ( new == 0 )
		punt( "no space" );
	switch ( new->op = op )	/* node type */
	{
	case 'x':		/* (binary) multiplication */
		new->left = arga, new->right = argb;
		break;
...
	case 'k':		/* real constant */
		new->val = *(double *)&arga;	/* XXX */
		break;
	}
	return new;
}

The code failed at point "XXX".  The exercise for the
student is to figure out precisely WHY it fails on some
machines but works on others.  Do not post your answers,
unless you think you see a unique twist; this is a
well-known C coding trap that is in violation of
language standards.  A second exercise for the student
is to figure out how to fix this code; note that simply
adding a typecast (struct node *) when the function is
invoked is NOT sufficient.  No need to post these answers
either.  Maybe Laura will include this in her book.

latham@bsdpkh.UUCP (Ken Latham) (10/30/85)

Concerning the following 'test' program, which functions on some
but not all machines:

	> struct node { int op; struct node *left, *right; double val; }
	> 	*lp, *rp, *p, *makenode();
	> double value;			/* constant value from lex */
	> ...
	> p = makenode( 'x', lp, rp );	/* create multiply node */
	> ...
	> p = makenode( 'k', value );	/* create real constant node */
	> ...
	> struct node *makenode( op, arga, argb )
	> int op; struct node *arga, *argb;
	> {	extern char *malloc();
	> ...
	> 	case 'k':		/* real constant */
	> 		new->val = *(double *)&arga;	/* XXX */
	> 		break;
	> 	}
	> 	return new;
	> }
	> 
	> The code failed at point "XXX".  The exercise for the
	> student is to figure out precisely WHY it fails ......
	> 

I'm not a student ( by the formal def. ), but I LOVE puzzles!!!

The answer becomes rather obvious after a little thought...

It is hidden in what can be referred to as "an implied sizeof" that is
necessary for the passage of variables from one function to another.

The size of (node *) is the same as the size of (double) on SOME machines
but not others ( I'm fairly sure it is not on MOST ).

On those machines where the sizes differ, passing the double into
a space too small to contain it usually causes an error when it is referenced.

( Why? Because stack space is given for an 'int' and 2 'node *';
	when a double exceeds the space of two pointers there may
	be problems, depending on what happens to the stack .... )

	(NOTE: the 'int' space is given and used in both cases )


The reference copies a double's worth of bytes from the stack, starting
at the beginning of a space too small to contain it; the excess space
used by the double may be corrupted ( especially if the reference is
late in the function ).

Here the call to malloc corrupts the overage used by the double, since
it uses the stack ( as all good little functions do ).

	> .... A second exercise for the student
	> is to figure out how to fix this code ......
	> 

the fix.....

pass a pointer to the double instead of the double itself.

i.e.
	p = makenode ('k', &value );
	
... and reference it as such ....

	struct node *makenode( op, arga, argb )
	int op; struct node *arga, *argb;
	{	extern char *malloc();
	...
		case 'k':		/* real constant */
			new->val = *(double *)arga;	/* dereference the passed pointer */
			break;
	...
	}
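
The pointer-passing fix above still hands a (double *) to a parameter
declared as (struct node *), so it may remain fragile on machines where
pointer types differ. A minimal sketch of a different approach, not taken
from the thread and written in modern (ANSI-style) C that postdates it:
give each node kind its own constructor, so every argument is declared
with the type the caller actually passes. The names new_node, make_binary
and make_const are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
	int op;
	struct node *left, *right;
	double val;
};

/* Hypothetical helper: allocate a node and initialize every field. */
static struct node *new_node(int op)
{
	struct node *n = malloc(sizeof *n);

	if (n != NULL) {
		n->op = op;
		n->left = n->right = NULL;
		n->val = 0.0;
	}
	return n;
}

/* Binary operator node, e.g. op == 'x' for multiplication. */
struct node *make_binary(int op, struct node *left, struct node *right)
{
	struct node *n;

	assert(left != NULL && right != NULL);	/* both operands required */
	n = new_node(op);
	if (n != NULL) {
		n->left = left;
		n->right = right;
	}
	return n;
}

/* Real constant node: the value is passed and received as a double. */
struct node *make_const(double value)
{
	struct node *n = new_node('k');

	if (n != NULL)
		n->val = value;
	return n;
}
```

With this shape the two calls in the example would read
make_binary( 'x', lp, rp ) and make_const( value ), and no cast is
needed anywhere.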


	There's a chance that there may be other problems I am not aware
of, but this seems to be at least one case where "sometimes it works and
sometimes it doesn't."


	

  Concerning those who have a great deal of experience:

		" There are those who have many experiences...
		  and those who have few experiences many times."
						-- I'm not sure who --
						--    but not me    --
_______________________________________________________________________________
|                             |                                               |
|        Ken Latham           |     uucp: ihnp4!bsdpkh!latham                 |
|        AT&T-IS              |     uucp: {ihnp4!decvax,peora}!ucf-cs!latham  |
|        Orlando , FL         |     arpa: latham.ucf-cs@csnet-relay           |
|        USA                  |     csnet:latham@ucf                          |
|_____________________________|_______________________________________________|

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/02/85)

> One of the fellows here ran across a C coding style problem
> that caused an application to break when ported from little-
> endian machines to a big-endian one.  I am posting this to help
> those who may encounter similar problems in the future.

Rwells@BBN-PROPHET.ARPA has pointed out to me that this is not
necessarily a big-endian problem, and that the example would
work on many big-endian systems and may fail on some little-
endians.  It should really be described as a program that
malfunctioned when ported to a machine with *more stringent
alignment constraints*.  Those of you who were puzzled may
find this to be the clue you needed.

(Big-endianness would have to be combined with data addressing
at the high-address byte before it would be the causal factor;
this combination appears to be rarer than I at first believed.)
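
The alignment clue can be made concrete with a small sketch in C11,
which postdates this thread. It is only an illustration under stated
assumptions: the helper names aligned_for_double and load_double are
made up, and converting a pointer to uintptr_t to test its low bits is
a common convention rather than something the language guarantees.
The first helper tests whether an address is suitably aligned for a
double; the second reads a double from storage of any alignment by
copying bytes instead of dereferencing a cast pointer.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p meets this platform's alignment requirement for double. */
int aligned_for_double(const void *p)
{
	return (uintptr_t)p % alignof(double) == 0;
}

/* Read a double from storage of any alignment by copying its bytes. */
double load_double(const void *p)
{
	double d;

	assert(p != NULL);
	memcpy(&d, p, sizeof d);
	return d;
}
```

An object declared as a pointer is only promised pointer alignment, so
on a machine where double is stricter, aligned_for_double applied to
the address of such an object may report zero.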

gwyn@BRL.ARPA (VLD/VMB) (11/03/85)

I asked that answers to the puzzle please NOT be posted
to the mailing list / newsgroup, to avoid flooding it
with zillions of messages (mostly wrong).

There is no problem with passing a double datum as a
parameter to a routine that is declared with smaller
formal parameters.  Just consider what must be
happening for printf() and you will see that any
number of parameters of any size may be passed to a
function.  Therefore, the solution to the puzzle is
not that the double datum "doesn't fit" or "clobbers
something outside the parameters".
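
To illustrate the printf() point, here is a minimal sketch of a
makenode() that takes a variable argument list. It assumes the
<stdarg.h> interface of later standard C (the facility available at the
time of this thread was <varargs.h>), and it is one possible shape, not
the answer promised below. Each argument is fetched with the type the
caller actually passed, so a double and a pair of pointers can share
the same call interface.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>

struct node {
	int op;
	struct node *left, *right;
	double val;
};

/* The arguments after op depend on the node type given by op. */
struct node *makenode(int op, ...)
{
	va_list ap;
	struct node *n = malloc(sizeof *n);

	if (n == NULL)
		return NULL;
	n->op = op;
	n->left = n->right = NULL;
	n->val = 0.0;

	va_start(ap, op);
	switch (op) {
	case 'x':		/* (binary) multiplication: two node pointers */
		n->left = va_arg(ap, struct node *);
		n->right = va_arg(ap, struct node *);
		break;
	case 'k':		/* real constant: one double */
		n->val = va_arg(ap, double);
		break;
	default:
		assert(!"unknown node type");
	}
	va_end(ap);
	return n;
}
```

The original call sites, makenode( 'x', lp, rp ) and
makenode( 'k', value ), work unchanged with this version, provided the
declaration with the ellipsis is visible where they appear.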

I suppose I should post the answer in a few days.