gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/27/85)
One of the fellows here ran across a C coding style problem that caused an application to break when ported from little-endian machines to a big-endian one. I am posting this to help those who may encounter similar problems in the future.

    /* modeled after a grammar that builds expression trees: */
    struct node { int op; struct node *left, *right; double val; }
        *lp, *rp, *p, *makenode();
    double value;       /* constant value from lex */
    ...
    p = makenode( 'x', lp, rp );    /* create multiply node */
    ...
    p = makenode( 'k', value );     /* create real constant node */
    ...
    struct node *makenode( op, arga, argb )
    int op; struct node *arga, *argb;
    {
        extern char *malloc();
        struct node *new = (struct node *)malloc( sizeof(struct node) );

        if ( new == 0 )
            punt( "no space" );

        switch ( new->op = op )     /* node type */
        {
        case 'x':   /* (binary) multiplication */
            new->left = arga, new->right = argb;
            break;
        ...
        case 'k':   /* real constant */
            new->val = *(double *)&arga;    /* XXX */
            break;
        }

        return new;
    }

The code failed at point "XXX". The exercise for the student is to figure out precisely WHY it fails on some machines but works on others. Do not post your answers, unless you think you see a unique twist; this is a well-known C coding trap that is in violation of language standards.

A second exercise for the student is to figure out how to fix this code; note that simply adding a typecast (struct node *) when the function is invoked is NOT sufficient. No need to post these answers either. Maybe Laura will include this in her book.
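One restructuring that sidesteps the trap entirely is to give each kind of node its own constructor, so a double is only ever received through a parameter declared as double. This is a sketch in modern C, not necessarily the fix Gwyn had in mind; the names allocnode, makebinary and makeconst are hypothetical, and allocnode's exit-on-failure stands in for the post's punt("no space").

```c
#include <stdio.h>
#include <stdlib.h>

/* Same node layout as in the post. */
struct node {
    int op;
    struct node *left, *right;
    double val;
};

/* Hypothetical helper: allocate and zero-fill a node, or exit
   on failure (stands in for punt("no space")). */
static struct node *allocnode(int op)
{
    struct node *new = malloc(sizeof *new);
    if (new == NULL) {
        fputs("no space\n", stderr);
        exit(EXIT_FAILURE);
    }
    new->op = op;
    new->left = new->right = NULL;
    new->val = 0.0;
    return new;
}

/* Binary operator node: both children really are node pointers. */
struct node *makebinary(int op, struct node *l, struct node *r)
{
    struct node *new = allocnode(op);
    new->left = l;
    new->right = r;
    return new;
}

/* Constant node: the value arrives as a genuine double parameter,
   so no pointer punning on a parameter's address is needed. */
struct node *makeconst(double value)
{
    struct node *new = allocnode('k');
    new->val = value;
    return new;
}
```

With prototypes in scope, a call such as makeconst(2) is also converted to double automatically, which the original unprototyped makenode() could not do.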
latham@bsdpkh.UUCP (Ken Latham) (10/30/85)
Concerning the following 'test' program, which functions on some but not all machines:

> struct node { int op; struct node *left, *right; double val; }
> *lp, *rp, *p, *makenode();
> double value; /* constant value from lex */
> ...
> p = makenode( 'x', lp, rp ); /* create multiply node */
> ...
> p = makenode( 'k', value ); /* create real constant node */
> ...
> struct node *makenode( op, arga, argb )
> int op; struct node *arga, *argb;
> { extern char *malloc();
> ...
> case 'k': /* real constant */
> new->val = *(double *)&arga; /* XXX */
> break;
> }
> return new;
> }
>
> The code failed at point "XXX". The exercise for the
> student is to figure out precisely WHY it fails ......
>

I'm not a student (by the formal definition), but I LOVE puzzles!!! The answer becomes rather obvious after a little thought...

It is hidden in what can be referred to as "an implied sizeof" that is necessary for passing variables from one function to another. The size of (node *) is the same as the size of (double) on SOME machines but not others (I'm fairly sure it is not on MOST).

On those machines where the sizes differ, passing the double into a space too small to contain it usually causes an error when it is referenced. (Why? Because stack space is allotted for an 'int' and two 'node *'; when a double exceeds the space of two pointers, there may be problems, depending on what happens to the stack.) (NOTE: the 'int' space is allotted and used in both cases.)

Referencing it copies a double-sized region of the stack, starting at the beginning of a space too small to contain it, so the excess space used by the double may be corrupted (especially if the reference comes late in the function). Here the call to malloc corrupts the overage used by the double, since it uses the stack (as all good little functions do).

> .... A second exercise for the student
> is to figure out how to fix this code ......
>

The fix.....
Pass a pointer to the double instead of the double itself, i.e.

    p = makenode( 'k', &value );
    ...

and reference it as such:

    struct node *makenode( op, arga, argb )
    int op; struct node *arga, *argb;
    {
        extern char *malloc();
        ...
        case 'k':   /* real constant */
            new->val = *(double *)arga;    /* XXXXXXXXXXX */
            break;
        ...
    }

There's a chance that there may be other problems I am not aware of, but this seems to be at least one case where "sometimes it works and sometimes it doesn't."

Concerning those who have a great deal of experience:
" There are those who have many experiences...
and those who have few experiences many times."
-- I'm not sure who --
-- but not me --

Ken Latham       uucp:  ihnp4!bsdpkh!latham
AT&T-IS          uucp:  {ihnp4!decvax,peora}!ucf-cs!latham
Orlando, FL      arpa:  latham.ucf-cs@csnet-relay
USA              csnet: latham@ucf
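A minimal sketch of the pointer-passing fix with ANSI prototypes. It assumes the parameters are changed to void * (our change, not Latham's) so that both a struct node * and a double * convert to them cleanly; the allocation and error handling are likewise assumptions filling in the elided parts of the post.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int op;
    struct node *left, *right;
    double val;
};

/* For 'x' nodes, arga/argb are child pointers.
   For 'k' nodes, arga carries the address of the constant (&value);
   casting it back to double * is valid because it really is the
   address of a double. */
struct node *makenode(int op, void *arga, void *argb)
{
    struct node *new = malloc(sizeof *new);
    if (new == NULL) {
        fputs("no space\n", stderr);
        exit(EXIT_FAILURE);
    }
    new->op = op;
    new->left = new->right = NULL;
    new->val = 0.0;

    switch (op) {
    case 'x': /* (binary) multiplication */
        new->left = arga;
        new->right = argb;
        break;
    case 'k': /* real constant: copy the double out of the caller's object */
        new->val = *(double *)arga;
        break;
    }
    return new;
}
```

The constant is copied at creation time, so later changes to the caller's value variable do not affect the node.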
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/02/85)
> One of the fellows here ran across a C coding style problem
> that caused an application to break when ported from little-
> endian machines to a big-endian. I am posting this to help
> those who may encounter similar problems in the future.

Rwells@BBN-PROPHET.ARPA has pointed out to me that this is not necessarily a big-endian problem, and that the example would work on many big-endian systems and may fail on some little-endians. It should really be described as a program that malfunctioned when ported to a machine with *more stringent alignment constraints*. Those of you who were puzzled may find this to be the clue you needed.

(Big-endianness would have to be combined with data addressing at the high-address byte before it would be the causal factor; this combination appears to be rarer than I at first believed.)
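Struct layout is not the same mechanism as argument passing, but it shows the kind of alignment rule at stake. This sketch uses two hypothetical structs as stand-ins for the start of makenode's parameter list and asks the compiler where it places a pointer, versus a double, after an int. On a machine where a double needs stricter alignment than a pointer, padding may be inserted before the double, so it does not begin where arga would have.

```c
#include <stddef.h>
#include <stdalign.h>

/* Hypothetical stand-ins: an int followed by a pointer,
   and an int followed by a double. */
struct int_then_ptr {
    int op;
    void *arga;
};

struct int_then_double {
    int op;
    double val;
};

/* Offset at which the compiler places the pointer after the int. */
size_t ptr_offset(void)
{
    return offsetof(struct int_then_ptr, arga);
}

/* Offset at which the compiler places the double after the int;
   any bytes between sizeof(int) and this offset are padding. */
size_t double_offset(void)
{
    return offsetof(struct int_then_double, val);
}

/* Alignment the implementation reports for double. */
size_t double_alignment(void)
{
    return alignof(double);
}
```

On common 64-bit ABIs both offsets are typically equal; the interesting case is a machine where they differ.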
gwyn@BRL.ARPA (VLD/VMB) (11/03/85)
I asked that answers to the puzzle please NOT be posted to the mailing list / newsgroup, to avoid flooding it with zillions of messages (mostly wrong).

There is no problem with passing a double datum as a parameter to a routine that is declared with smaller formal parameters. Just consider what must be happening for printf() and you will see that any number of parameters of any size may be passed to a function. Therefore, the solution to the puzzle is not that the double datum "doesn't fit" or "clobbers something outside the parameters".

I suppose I should post the answer in a few days...
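The printf() observation suggests another route: declare makenode as a variadic function and fetch each trailing argument with va_arg, using the type the caller actually passed, much as printf() consumes arguments according to its format string. This is a sketch using the standard C <stdarg.h> interface; the variadic signature is our assumption, not code from the thread.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
    int op;
    struct node *left, *right;
    double val;
};

/* Variadic makenode: 'x' is followed by two node pointers,
   'k' by one double. The op code tells us which types to fetch. */
struct node *makenode(int op, ...)
{
    va_list ap;
    struct node *new = malloc(sizeof *new);
    if (new == NULL) {
        fputs("no space\n", stderr);
        exit(EXIT_FAILURE);
    }
    new->op = op;
    new->left = new->right = NULL;
    new->val = 0.0;

    va_start(ap, op);
    switch (op) {
    case 'x': /* (binary) multiplication: two struct node * follow */
        new->left = va_arg(ap, struct node *);
        new->right = va_arg(ap, struct node *);
        break;
    case 'k': /* real constant: one double follows */
        new->val = va_arg(ap, double);
        break;
    }
    va_end(ap);
    return new;
}
```

Variadic arguments receive only the default promotions, so a 'k' call must pass a genuine double (makenode('k', 2.0), not makenode('k', 2)), and the prototype must be visible at every call site.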