[comp.lang.c] Moving C from DOS to UNIX, ANSI mistakes

marks@lcc.la.Locus.COM (bookmark) (09/07/89)

<---- bug snack

>In article <1287@calvin.EE.CORNELL.EDU>, richard@calvin.EE.CORNELL.EDU (Richard Brittain) writes:
>> ...I learned C on a pc using Turbo-C, and now I'm trying to write in C
>> on a unix box (BSD and Ultrix) but nothing works!!!!!.  All of my pc source
>> gives multitudinous errors under unix...
>> ...Function prototypes seem to give cc a fit, and...
>> I also get a lot of miscellaneous errors and warnings like "warning: old
>> fashioned initialization" that I cannot make sense of.  I'd be really 
>> grateful if anyone could give any general rules of thumb for converting
>> between the two environments.
>
Then in article <1116@virtech.UUCP>, Conor P. Cahill writes:
>My *guess* would be that the turbo-c compiler is much more ANSI compliant
>than the older compilers used on your BSD/Ultrix systems.  If you really
>need to be portable across these environments I would develop the software
>on the BSD/Ultrix system and then port it to turbo-c. This gets you writing
>the software at the "least common denominator" level of compiler.  An ANSI
>compiler should not have too much trouble compiling software generated
>under an older (K&R 1st Ed) compiler since that was part of their mandate.
>

Good advice and correct I think, but incomplete.

The truth is that the X3.159 Committee blew off the compatibility
goal in several places.  The worst of these mistakes, even though
the rationale explains (I'm looking at an October '88 draft) that
"existing code is important" was that they changed the unsigned
conversion rules away from K&R without good cause (they said:
"this is considered the most serious semantic change made by the
Committee..." and I agree).

You should develop your code using only K&R features for maximum
portability.  But, do not rely on the semantics of arithmetic
involving unsigned vars, especially where you are trying to
assign values from one size of int to another (for example, the
assignment "u_int = (unsigned) short" will likely NOT do what
you intend on a VAX).  It is not possible to prevent ANSI C from
sign-extending your variable without using a cast that would truncate
your value if you later lengthened the type of the variable.
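
A minimal sketch of the conversion in question, assuming any standard
C compiler.  The function names are illustrative and are not from the
original code.  Converting a negative short straight to unsigned
yields a value with all the upper bits set; going through unsigned
short first keeps only the short's own bits, but names the width
explicitly.

```c
/* Direct conversion: -1 is reduced modulo UINT_MAX + 1, so the
 * result has every bit set (it looks sign-extended). */
unsigned widen_direct(short s)
{
    return (unsigned) s;
}

/* Conversion through unsigned short: only the short's own bits
 * survive.  The cast hard-codes the width, so it would truncate
 * if the variable were later changed to a long. */
unsigned widen_via_ushort(short s)
{
    return (unsigned) (unsigned short) s;
}
```

With s equal to -1, widen_direct() returns UINT_MAX and
widen_via_ushort() returns USHRT_MAX; the two differ on any machine
where int is wider than short.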



I should point out that my objection to the new rule is that
it absolutely prevents writing simple portable code which moves
from one similar environment to another (say, one POSIX system
to another) if any of the variables used are externally specified,
like POSIX structure members are.  Consider the POSIX stat()
call which fills in an externally specified (in stat.h) struct
stat which has members st_dev and st_ino which may be of any
integral type (via dev_t and ino_t) (yes, my friends, there
are UN*X systems with 32-bit inode numbers).  If you want to
convert these values to unsigned for some reason, you'll have
to write either painful or non-portable code.  See this (drawn
from the real world with minor mods and elisions) example:


/*
 * We use double hashing...
 * Some trickery is used in converting dev/inode pairs
 * to hash keys.  We expect that most dev and inode values will
 * be <= 32K; so we can come up with a single long key pretty
 * easily.  However, we don't want to blow any other significant
 * bits off completely, so we rotate the dev value by 15 bits and
 * XOR it with the inode value.  In the assumed usual case, this
 * will preserve all the bits from the dev/inode pair, in less
 * usual cases at least some influence will be felt from each bit.
 */

>>>>> In the original code, I rotated the dev value by 16 bits,  <<<<<
>>>>> but for this example I didn't want the problems with       <<<<<
>>>>> unsigned conversion overshadowed by the fact I happened    <<<<<
>>>>> to be working with shifts of half- or double-word lengths. <<<<<


#ifndef __STDC__ /* assume K&R1-style unsigned conversion */

/* struct stat *s; */

/* note that only unsigned is guaranteed 0-filled right shifting
 * and that sign-extension of short st_dev or st_ino before
 * conversion to u_long would be undesirable.  This code works
 * for st_dev and st_ino of any integral type (even char).
 */
#define HKEY(s)	((((unsigned long)(s)->st_dev << 15) \
		| (((unsigned long)(s)->st_dev>>(LONGBITS-15))&0x7fffL)) \
			^ (unsigned long)(s)->st_ino)

#else /* __STDC__ */
	/* The new unsigned conversion rules are stupid because
	 * they inhibit rather than promote the writing of portable
	 * code.  Since you must know the length of a thing before
	 * you can convert it to unsigned without using non-portable
	 * masks, painful and unnecessary computation, or this sort
	 * of optional gobbledy-gook in the source, it is damned near
	 * impossible to prepare code which is both portable and
	 * efficient.  The code below is not portable, dammit (but
	 * at least it'll work on our current systems).
	 */

#ifdef SHORT_DEV_INO /* if "typedef short dev_t, ino_t;" */
  /* one would like to write
   * #if (sizeof(dev_t)==sizeof(short)) && (sizeof(ino_t)==sizeof(short))
   * but the preprocessor cannot evaluate sizeof in #if
   */

/* struct stat *s; */

#define HKEY(s)	((((long)(unsigned short)(s)->st_dev << 15) \
		| (((unsigned long)(unsigned short)(s)->st_dev \
						>> (LONGBITS-15)) & 0x7fffL)) \
			^ (long)(unsigned short)(s)->st_ino)

#else /* dev_t and ino_t are longs */

	/* if we used the SHORT_DEV_INO macro here
	 * it would discard half of our bits!
	 */

#define HKEY(s) ((((long) (s)->st_dev << 15) \
		| (((unsigned long) (s)->st_dev >> (LONGBITS-15)) & 0x7fffL)) \
			^ (long) (s)->st_ino)

#endif /* long dev_t, ino_t */

	/* Note that we're blowing off the case of
	 *	typedef short dev_t;
	 * 	typedef long ino_t;
	 * or the reverse and all cases involving char types.
	 */

#endif /* __STDC__ */
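
For comparison, a sketch of the "painful computation" route: a mask
built from sizeof zero-extends a value of any integral type to
unsigned long without naming its width.  ZEXT_UL, UL_BITS, rotl15 and
hkey are hypothetical names, not part of the code above, and the
sketch assumes the operand is no wider than unsigned long and that
the integer types have no padding bits.

```c
#include <limits.h>   /* CHAR_BIT */

/* Bits in an unsigned long (what the code above calls LONGBITS). */
#define UL_BITS  (CHAR_BIT * sizeof(unsigned long))

/* Zero-extend x to unsigned long whatever its integral type.
 * Converting a negative value sets the upper bits; the mask, built
 * from sizeof(x), clears everything above x's own width.
 * Assumes sizeof(x) <= sizeof(unsigned long) and no padding bits. */
#define ZEXT_UL(x) \
    ((unsigned long)(x) & \
     (~0UL >> (CHAR_BIT * (sizeof(unsigned long) - sizeof(x)))))

/* Rotate left by 15 bits within an unsigned long. */
unsigned long rotl15(unsigned long v)
{
    return (v << 15) | (v >> (UL_BITS - 15));
}

/* Hash key: rotated dev XORed with inode, both already zero-extended. */
unsigned long hkey(unsigned long dev, unsigned long ino)
{
    return rotl15(dev) ^ ino;
}
```

A call would look like hkey(ZEXT_UL(s->st_dev), ZEXT_UL(s->st_ino)).
The macro evaluates its argument only once, since sizeof does not
evaluate its operand, but it costs an extra mask on every use.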


Ahem.  Where was I?  Oh, yeah.  ANSI C source considered by itself
may be "portable" but in the real world where it interacts with
external stuff like POSIX or MS-DOS or VAX/VMS... it would be nice
if things with the same name, and similar types, could be used in
portable expressions to get the same results.  The K&R unsigned
rule ("unsigned always wins") provides this, the lousy X3.159
rule (sign extend before converting to unsigned) does NOT.

(I would think any discussion about whether K&R1 provides "unsigned
long" belongs in another thread (I think K&R1 allows u_long but I
admit the possibility of argument).  The X3.159 Committee could just
as well have adopted the correct unsigned conversion rules and the
ANSI C language does have unsigned long.)


Mark Seecof, Locus Computing Corp., Los Angeles, (213) 337-5218.
My opinions only, of course...

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/09/89)

In article <64@lcc.la.Locus.COM> lcc!marks@SEAS.UCLA.EDU (bookmark) writes:
>It is not possible to prevent ANSI C from sign-extending your variable
>without using a cast that would truncate your value if you later
>lengthened the type of the variable.

Surely when you change the data type of the variable, it behooves you
to also change the casts etc. in expressions that use it?

>stat which has members st_dev and st_ino which may be of any
>integral type (via dev_t and ino_t) (yes, my friends, there
>are UN*X systems with 32-bit inode numbers).  If you want to
>convert these values to unsigned for some reason, you'll have
>to write either painful or non-portable code.

Any reasonable operations with these types will preserve their values.
There has already been a problem in the UNIX environment with 16- vs.
32-bit, signed vs. unsigned, etc. types in such usages as you describe.
On balance, the problem is not made worse by adopting value-preserving
semantics, and in more typical situations unsignedness-preserving rules
are generally undesirable.
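
A small sketch of what "value-preserving" means in practice, assuming
an ANSI compiler on which int is wider than short; the function names
are hypothetical.  Under the ANSI rule an unsigned short operand is
promoted to int, so a small difference can go negative.  Under an
unsignedness-preserving rule it would be promoted to unsigned, and
the same expression would wrap to a large positive value, as the
unsigned version below does.

```c
/* Value-preserving: a and b are promoted to int when int can hold
 * every unsigned short value, so 1 - 2 is -1 and the test is true. */
int ushort_diff_is_negative(unsigned short a, unsigned short b)
{
    return (a - b) < 0;
}

/* With unsigned int operands nothing is promoted to a signed type:
 * the subtraction wraps modulo UINT_MAX + 1. */
unsigned uint_diff(unsigned a, unsigned b)
{
    return a - b;
}
```

Here ushort_diff_is_negative(1, 2) is true, while uint_diff(1, 2) is
UINT_MAX.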

>... it would be nice if things with the same name, and similar types,
>could be used in portable expressions to get the same results.  The
>K&R unsigned rule ("unsigned always wins") provides this, ...

Not in my experience it hasn't.

Certainly, code that cares about exact bit patterns (such as your
hashing example) needs to go through excruciating pains to keep the
types appropriate for the operations.  That has always been the case.

By the way, the committee is X3J11; X3.159 is the Standard itself.

clyde@hitech.ht.oz (Clyde Smith-Stubbs) (09/11/89)

From article <64@lcc.la.Locus.COM>, by marks@lcc.la.Locus.COM (bookmark):
> The truth is that the X3.159 Committee blew off the compatibility
> goal in several places.  
>  But, do not rely on the semantics of arithmetic
> involving unsigned vars, especially where you are trying to
> assign values from one size of int to another (for example, the
> assignment "u_int = (unsigned) short" will likely NOT do what
> you intend on a VAX).

It never did do what you wanted. Back in 1984 (Pre-ANSI) I used a VAX
compiler (I think it was BSD4.2 based) that would sign extend when
you assigned a char to an unsigned. This was one of the reasons the
ANSI committee chose value preserving over unsignedness preserving - even
though K&R implied unsignedness preserving was the way to do it, it
was not spelled out and not all compilers conformed anyway.
(For the record, I would have preferred unsignedness preserving, but I
have come to terms with value preserving).

> I should point out that my objection to the new rule is that
> it absolutely prevents writing simple portable code which moves
> from one similar environment to another (say, one POSIX system
> to another) if any of the variables used are externally specified,
> like POSIX structure members are. 
>	[more stuff basically saying that to take something of type
>	ino_t, where ino_t is a typedef of some integral type but
>	whose length is either unknown or at least may vary from
>	system to system, there is no portable way of converting this
>	without sign extension to a longer, unsigned integral type.]

Ok, what is wrong with simply using a cast like (unsigned ino_t)?  This converts
the unspecified (but integral) ino_t to an unsigned of the same length. This
is then quite safe to assign without any sign-extension to any other
integral type.

The decisions made by the ANSI committee were not easy, but from what I have
seen they were considered decisions. I disagreed with some of them, but
I am prepared to work with them because any usable standard is better than
no standard at all, and the ANSI C standard is usable - it has no 
insurmountable problems I know of that are not basic to C.

-- 
Clyde Smith-Stubbs
HI-TECH Software, P.O. Box 103, ALDERLEY, QLD, 4051, AUSTRALIA.
INTERNET:	clyde@hitech.ht.oz.au		PHONE:	+61 7 300 5011
UUCP:		uunet!hitech.ht.oz.au!clyde	FAX:	+61 7 300 5246

karl@haddock.ima.isc.com (Karl Heuer) (09/15/89)

In article <326@hitech.ht.oz> clyde@hitech.ht.oz (Clyde Smith-Stubbs) writes:
>Ok, what is wrong with simply using a cast like (unsigned ino_t)?

It's not legal C.
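
To spell that out: unsigned is a type specifier that combines only
with the keywords char, short, int and long, not with a typedef name.
A sketch of what one has to write instead, using a hypothetical
my_ino_t in place of the system's ino_t.  The unsigned counterpart
must be declared by hand and kept in step with the original typedef,
which requires knowing the underlying type.

```c
typedef short my_ino_t;             /* stand-in for a system typedef */

/* unsigned my_ino_t bad;              not legal C */

typedef unsigned short my_uino_t;   /* must change if my_ino_t changes */

/* Widen to unsigned long without sign extension by passing through
 * the hand-written unsigned counterpart of the typedef. */
unsigned long ino_to_ulong(my_ino_t ino)
{
    return (unsigned long) (my_uino_t) ino;
}
```

With these typedefs, ino_to_ulong(-1) returns USHRT_MAX rather than
ULONG_MAX.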