[comp.lang.c] standardizing integral type sizes

kyle@xanth.UUCP (04/07/87)

I would like to see the sizes of C integral types standardized.  It would be
much easier to write portable code if, when I define a variable as an 'int',
I could automatically know that its range is -128 to 127, or -32768 to 32767,
etc.

One proposal might be:

	char	 8 bits
	short	16 bits
	int	32 bits
	long	64 bits

Having compilers on 16-bit machines generate code to handle 64-bit longs may
be cumbersome, but just knowing how "large" each type is would (probably) make
programmers more conscientious about which types they use.  Thus those "huge"
longs won't be needlessly used as often.

Besides savaging quite a few C implementations, what are the other drawbacks
to this?  Has this already been proposed?

kyle@xanth.cs.odu.edu    (kyle jones @ old dominion university, norfolk, va)

dlnash@ut-ngp.UUCP (04/08/87)

In article <791@xanth.UUCP>, kyle@xanth.UUCP (kyle jones) writes:
> I would like to see the sizes of C integral types standardized.
> [...]
> 
> One proposal might be:
> 
> 	char	 8 bits
> 	short	16 bits
> 	int	32 bits
> 	long	64 bits
> 

This is a good idea in theory, but not in practice.  What do you do
about machines whose word size is not a power of two (like CDC Cybers
or DEC-20s)?  Doing 64-bit arithmetic on a machine with a 60-bit word
(Cybers) or a 36-bit word (DEC-20s) would be extremely difficult.
Another idea would be something like this:

        char    at least  8 bits (possibly more)
        short   at least 16 bits (possibly more)
        int     at least 32 bits (possibly more)
        long    at least 64 bits (possibly more)

Then you would have at least the range you expected.  The compiler could then
choose the size most advantageous for your machine.  If you need to construct
bit masks which depend on the operand size (not usually necessary, usually a
bad idea), you can use our old friend sizeof.
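
For example, here is a minimal sketch (mine, not from any standard
header) of building an all-ones mask without hard-coding the width of
the type, using sizeof together with CHAR_BIT from the Draft's
<limits.h>:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
	    unsigned long mask = ~0UL;           /* all bits set, whatever the width */
	    int bits = sizeof(long) * CHAR_BIT;  /* width of long on this machine */

	    printf("long is %d bits, mask = %lx\n", bits, mask);
	    return 0;
	}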

				Don Nash

UUCP:    ...!{ihnp4, allegra, seismo!ut-sally}!ut-ngp!dlnash
ARPA:    dlnash@ngp.UTEXAS.EDU
BITNET:	 CCEU001@UTADNX, DLNASH@UTADNX
TEXNET:  UTADNX::CCEU001, UTADNX::DLNASH

msb@sq.UUCP (04/09/87)

Johnathan Tainter (jtr485@umich.UUCP) writes:

> short, int, long is the dumbest part of C.  The language should have said
> there will be a class of types int<ext> where <ext> is the number of bits.
> [You should] add macros ... so you CANNOT use int, long etc.

And Kyle Jones (kyle@xanth.UUCP) writes:

] I would like to see the sizes of C integral types standardized.  It would be
] much easier to write portable code if when I define a variable as an 'int' I
] could automatically know... its range... [e.g.] char 8 bits, short 16, ...

This comes up all the time, but perhaps it is worth rebutting again.
The first reason these notions are bad is that they presume there are
only a very small number of word sizes.  What do you do on a 36-bit
machine, for instance?

The second thing is, yes, efficiency.  The Draft Standard DOES specify
MINIMUM ranges for the different types.  In effect it guarantees...

	char <= short <= int <= long
	char >= 8 bits, short >= 16 bits, int >= 16 bits, long >= 32 bits

... with the further presumption, which has been part of C for a long
time, that int operations are at least as efficient as other types.

What this means is that if you always use

	if you may need more than 16 bits,
		long
	else if time efficiency matters more than space efficiency,
		int
	else
		short or char

then your compiler will give you what you really need in the way most
suited to WHATEVER machine you may run on.  Now how can you do better
than that?

Well, the world is not quite as perfect as I am implying here.  If you
require variables exceeding 32 bits, your code is certainly nonportable
whatever you do, because no variable is guaranteed such accuracy.
Also, the Draft is not yet a Standard, and I've heard of compilers where
short = 8 bits.  (No, I don't remember which ones.)  Finally, char may
be signed or unsigned on different machines.  (The Draft takes care of
this by defining a "signed char"; then "char" may be like "signed char"
or like "unsigned char" depending on the machine, and should only be
used for actual character values.)

But for the usual range of sizes and portability issues, following the
above guidelines works just fine.  Thinking you have to know the range
of values of a type is a holdover from other languages; knowing the
MINIMUM range is quite sufficient.
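
To make the guidelines concrete, here is how the three cases might
look in practice (variable names are mine):

	long  file_size;        /* may need more than 16 bits: long  */
	int   loop_counter;     /* fits in 16 bits, speed matters: int   */
	short samples[10000];   /* fits in 16 bits, space matters: short */

On a PDP-11 that gives 32, 16, and 16 bits; on a VAX, 32, 32, and 16;
either way every variable has at least the range it needs.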

Oh, and by the way...

> > 	casting.  The assignment operator is pretty forgiving...it
> > 	knows what the types on both sides need be.
> Assignment had better be forgiving since type casts are defined in
> terms of it.

Not in the Draft Standard.  They're both defined in terms of type
conversion.  Indeed, the Draft REQUIRES certain conversions involving
pointers to be done by casting, e.g. char *cp; int *ip; ip = (int *) cp;
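
Spelled out (my example, same conversion):

	char *cp;
	int *ip;

	ip = (int *) cp;	/* legal: explicit conversion */
	ip = cp;		/* violates an assignment constraint in the
				   Draft; a diagnostic is required */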

Mark Brader

greg@utcsri.UUCP (04/11/87)

In article <791@xanth.UUCP> kyle@xanth.UUCP writes:
>One proposal might be:
>	char	 8 bits
>	short	16 bits
>	int	32 bits
>	long	64 bits
>
>Having compilers on 16-bit machines generate code to handle 64-bit long's may
>be cumbersome...

Not anywhere near as cumbersome as having those same compilers generate code
to handle 32-bit ints.

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

mwm@eris.UUCP (04/14/87)

In article <1987Apr9.155110.28398@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>The second thing is, yes, efficiency.  The Draft Standard DOES specify
>MINIMUM ranges for the different types.  In effect it guarantees...
>
>	char <= short <= int <= long
>	char >= 8 bits, short >= 16 bits, int >= 16 bits, long >= 32 bits
>
>... with the further presumption, which has been part of C for a long
>time, that int operations are at least as efficient as other types.

Unfortunately, these are only specified by implication (so far as I
can tell).  If someone can provide a paragraph number where all of
this is specified (or anywhere "int operations are at least..."
is specified), I'd appreciate it.

>then your compiler will give you what you really need in the way most
>suited to WHATEVER machine you may run on.  Now how can you do better
>than that?

Something that did what you said. For instance, consider a
hypothetical Queer Machine for C (QM/C), which has 18-bit words (word
addressed), and instructions for dealing with double words. The
obvious implementation has int = short = 18 bits, and long = 36 bits.
Now, suppose I need 16 bits of magnitude on a signed value. For this
machine, declaring things as int works just fine. But the program will
not work on "standard" machines, because it really wants longs for
that value. But if I declare things as "long," I chew up twice the
space for storage, and presumably more time.

In other words, following your advice doesn't get me what I really
need, and does it in a way incredibly inappropriate for QM/C.

>Well, the world is not quite as perfect as I am implying here.  If you
>require variables exceeding 32 bits, your code is certainly nonportable
>whatever you do, because no variable is guaranteed such accuracy.

True. It would be nice if I could declare things so that the program
would break at _compile_ time. There's actually an easy way to do
this, at zero cost to those who don't need bit-level control of their
data types.

	<mike
--
Here's a song about absolutely nothing.			Mike Meyer        
It's not about me, not about anyone else,		ucbvax!mwm        
Not about love, not about being young.			mwm@berkeley.edu  
Not about anything else, either.			mwm@ucbjade.BITNET

manis@ubc-cs.UUCP (04/14/87)

In article <3162@jade.BERKELEY.EDU> mwm@eris.BERKELEY.EDU (Mike (My watch
has windows) Meyer) writes:

>Unfortunately, these are only specified by implication (so far as I
>can tell).  If someone can provide a paragraph number where all of
>this is specified (or anywhere "int operations are at least..."
>is specified), I'd appreciate it.
This is a good place for an appendix which is not part of the standard,
much like the "common extensions" appendix.

>Something that did what you said. For instance, consider a
>hypothetical Queer Machine for C (QM/C), which has 18 bit words (word
>addressed), and instructions for dealing with double words. The
>obvious implementation has int = short = 18 bits, and long = 36 bits.
If the QM/C uses one's complement, you have the PDP-9. I never heard of
a PDP-9 C compiler...

This isn't a standardisation issue, but if I wrote a C compiler for a
machine with short words, it would have a pragma which set the precision
of 'int'.

-----
Vincent Manis                {seismo,uw-beaver}!ubc-vision!ubc-cs!manis
Dept. of Computer Science    manis@cs.ubc.cdn
Univ. of British Columbia    manis%ubc.csnet@csnet-relay.arpa  
Vancouver, B.C. V6T 1W5      manis@ubc.csnet
(604) 228-6770 or 228-3061

"Long live the ideals of Marxism-Lennonism! May the thoughts of Groucho
 and John guide us in word, thought, and deed!"

rbutterworth@watmath.UUCP (04/15/87)

In article <3162@jade.BERKELEY.EDU>, mwm@eris.BERKELEY.EDU (Mike (My watch has windows) Meyer) writes:
> Something that did what you said. For instance, consider a
> hypothetical Queer Machine for C (QM/C), which has 18 bit words (word
> addressed), and instructions for dealing with double words. The
> obvious implementation has int = short = 18 bits, and long = 36 bits.
> Now, suppose I need 16 bits of magnitude on a signed value. For this
> machine, declaring things as int works just fine. But the program will
> not work on "standard" machines, because it really wants longs for
> that value. But if I declare things as "long," I chew up twice the
> space for storage, and presumably more time.

I've never actually done this myself, but if you are really worried
about specifying minimum integer sizes and keeping the source
portable, you could set up a header file for each different
architecture/compiler, something like this:
    typedef int int1;
    typedef int int2;
    ...
    typedef int int18;
    typedef long int19;
    typedef long int20;
    ...
    typedef long int36;
Then when you have a variable that needs at least 12 bits, you can
declare it as "int12 var;".  The typedef in the header file will
give you the most efficient type for the variable.  "int5" would
be typedefed to "int" on most machines, to "short" on those machines
whose short instructions are as fast and as small as for ints, and
to "signed char" on those machines that have fast char arithmetic.

In those applications where you ask for "int35", it would not compile
on machines that can't handle 35-bit integers.  This is of course
exactly what you want.

Note that "extern char x,y,z; x=y+z;" can generate several times as
much code as "extern int x,y,z; x=y+z;" on some machines.  Thus if
you want 5-bit integers, "int5" would give you "int" on those machines.

Note that a parallel set of typedefs (e.g. pack8) would be needed
when the concern is for saving space (e.g. large arrays) and not
code efficiency.
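
A minimal sketch of what such a header might contain (the
machine-selection macros are hypothetical, and only two widths are
shown):

	/* intsize.h -- cheapest type with at least N bits of range */
	#ifdef QMC			/* the 18-bit machine above */
	typedef int  int16;		/* int is 18 bits: enough */
	typedef long int32;		/* long is 36 bits */
	#else				/* a typical 32-bit-int machine */
	typedef int  int16;
	typedef int  int32;
	#endif

and then in user code:

	int16 counter;		/* at least 16 bits, as cheap as possible */
	int32 total;		/* at least 32 bits */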

msb@sq.UUCP (04/15/87)

Mike (My watch has windows) (!) Meyer (mwm@eris.BERKELEY.EDU) writes:
> In article <1987Apr9.155110.28398@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
> >... The Draft Standard ... In effect ... guarantees ...
> >
> >	char <= short <= int <= long
> >	char >= 8 bits, short >= 16 bits, int >= 16 bits, long >= 32 bits
> >
> >... with the further presumption, which has been part of C for a long
> >time, that int operations are at least as efficient as other types.
> 
> Unfortunately, these are only specified by implication (so far as I
> can tell). If someone can provide a paragraph number ...

Well, I did say "in effect".  I guess you want the actual wording.
Section 3.1.2.5 reads in part...

# There are four types of signed integers, called signed char, short int,
# int, and long int.  ...
#
# A signed char occupies the same amount of storage as a "plain" char.
# A "plain" int has the natural size suggested by the architecture of the
# execution environment. ... The set of values of each signed integral
# type is a subset of the values of the next type in the list above.

This covers the first set of inequalities and the efficient-ints rule.
The actual minimum sizes are implied by the values of SCHAR_MAX, SHRT_MAX,
INT_MAX, and LONG_MAX, and the corresponding _MIN values, tabulated in
Section 2.2.4.2.  (Notice, incidentally, that the values are chosen
in such a way that sign-magnitude or 1's complement arithmetic is okay
within the word lengths mentioned; thus the smallest known-to-be-a-valid-int
value is -32767 and not -32768.)
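
Those limits are easy to inspect; a small sketch (mine) that prints
what a given implementation actually provides:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
	    printf("char:  %d..%d\n", SCHAR_MIN, SCHAR_MAX);   /* at least -127..127 */
	    printf("short: %d..%d\n", SHRT_MIN, SHRT_MAX);     /* at least -32767..32767 */
	    printf("int:   %d..%d\n", INT_MIN, INT_MAX);       /* at least -32767..32767 */
	    printf("long:  %ld..%ld\n", LONG_MIN, LONG_MAX);   /* at least -2147483647.. */
	    return 0;
	}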

> ... consider a
> hypothetical Queer Machine for C (QM/C), which has 18 bit words ...
> obvious implementation has int = short = 18 bits, and long = 36 bits.
> Now, suppose I need 16 bits of magnitude on a signed value.

Right, you have to declare it long for portability even though int would
work on such a machine.  I think I alluded to this in my own posting.
But the thing is, this is a RARE CASE.  If you have variables that you
KNOW will need 17 bits of sign and magnitude but not as many as 19,
AND you have enough of them or they are frequently enough used that
efficiency on QM/C's is a problem, THEN by all means do tricks with
ifdefs and typedefs.  For normal cases, the guidelines I suggested
before will work.

> It would be nice if I could declare things so that the program
> would break at _compile_ time [if you need variables >32 bits].

Try:
	#if (1UL << 35 == 0)
		sorry, machine must have at least 36-bit longs
	#endif

(I'm not positive whether the U is necessary -- the Draft Standard doesn't
mention what << does in case of overflow when the left operand is signed,
and I don't want to think about it.)
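
A variant (mine) that sidesteps the shift question: every constant
below fits comfortably in 32 bits, and the divisions compute
LONG_MAX / 2^32, which reaches 7 exactly when LONG_MAX >= 2^35 - 1
(assuming LONG_MAX has the usual 2^n - 1 form):

	#include <limits.h>
	#if LONG_MAX / 65536 / 65536 < 7
		sorry, machine must have at least 36-bit longs
	#endif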

Mark Brader

henry@utzoo.UUCP (Henry Spencer) (04/16/87)

> I would like to see the sizes of C integral types standardized.  It would be
> much easier to write portable code if, when I define a variable as an 'int',
> I could automatically know that its range is -128 to 127, or -32768 to 32767,
> etc.
> ...
> Besides savaging quite a few C implementations, what are the other drawbacks
> to this?...

It loses us the current semantics of "int", which are "the form of integer
that is most efficient on the machine in question".  Since most well-written
code doesn't care whether int is 16 or 32 bits (N.B. there is a lot of
badly-written code in the world), current C compilers are free to pick the
one that runs faster.  This can make quite a difference in performance.
All the world is *not* a VAX.

To quote Dennis Ritchie:  "if you want PL/I, you know where to find it".
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

doug@edge.UUCP (Doug Pardee) (04/16/87)

> The second thing is, yes, efficiency.  The Draft Standard DOES specify
> MINIMUM ranges for the different types.  In effect it guarantees...
> 
> 	char <= short <= int <= long
> 	char >= 8 bits, short >= 16 bits, int >= 16 bits, long >= 32 bits
> 
> ... with the further presumption, which has been part of C for a long
> time, that int operations are at least as efficient as other types.

I dunno about this last presumption having been part of C for a long time.
(I take that to mean K&R spec).  The closest I can find is in K&R sec. 2.2,
which says (twice) that "int" reflects the "natural size of integers on
the host machine."

I bring this up because on the C compilers I've used on the 68000, "int"
has always been a 32-bit quantity.  This is almost a necessity, because
of the well-known bad habit of assuming that a pointer will fit in an int.
But 32-bit ints on the 68000 are nowhere near as efficient as 16-bit ints.
They require twice as many memory accesses, and multiplication and division
have to be performed with subroutines.  This latter point can turn a simple
subscripting operation into a performance catastrophe.
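
To see why, recall that a[i] is defined as *(a + i), so the generated
code must scale i by the element size (sizes here are my example, and
assume no padding):

	struct rec { long key; char name[26]; };	/* 30 bytes */

	long lookup(struct rec *a, int i)
	{
	    return a[i].key;	/* needs i * 30 at run time; with 32-bit
				   ints and no 32-bit multiply instruction
				   that can become a subroutine call */
	}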

-- Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona

msb@sq.UUCP (04/20/87)

> > The second thing is, yes, efficiency.  The Draft Standard DOES specify
> > MINIMUM ranges for the different types.  In effect it guarantees...
> > ... with the further presumption, which has been part of C for a long
> > time, that int operations are at least as efficient as other types.
> 
> I dunno about this last presumption having been part of C for a long time.
> (I take that to mean K&R spec).  The closest I can find is in K&R sec. 2.2,
> which says (twice) that "int" reflects the "natural size of integers on
> the host machine."

This is what I meant; much the same language is in the Draft (sec. 3.1.2.5).
I think there's a general presumption that "natural" implies "most efficient".

> I bring this up because on the C compilers I've used on the 68000, "int"
> has always been a 32-bit quantity.  This is almost a necessity, because
> of the well-known bad habit of assuming that a pointer will fit in an int.
> But 32-bit ints on the 68000 are nowhere near as efficient as 16-bit ints.

If this is accurate, it means that the compiler writers had to choose
between making existing "well-known" badly-written code run at all
without being fixed, and making well-written code run more slowly than
it should.  The decision strikes me as -- no pun intended -- short-sighted.
But my assumption is that proper typing will become more widespread in
the future, which may be wrong; and, as someone else pointed out recently,
getting the right answer certainly beats getting the wrong answer fast.

Mark Brader

guy%gorodish@Sun.COM (04/21/87)

>I bring this up because on the C compilers I've used on the 68000, "int"
>has always been a 32-bit quantity.  This is almost a necessity, because
>of the well-known bad habit of assuming that a pointer will fit in an int.

"Almost".  I worked on a 68000-based machine that had a C
implementation 16-bit "int"s and 32-bit pointers; I didn't find it
particularly painful to use, except when I had to fix other people's
code to be type-correct (and I knew enough to blame *that* on the
other people who wrote that code, not on the C implementation).

>But 32-bit ints on the 68000 are nowhere near as efficient as 16-bit ints.
>They require twice as many memory accesses, and multiplication and division
>have to be performed with subroutines.  This latter point can turn a simple
>subscripting operation into a performance catastrophe.

Nope.  Multiplication by a constant can be done in-line, with a
sequence of shifts and adds, and the most common type of
multiplication in subscripting operations is multiplication by a
constant.
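
For instance, the multiply-by-30 in the subscripting example above
needs no multiply instruction at all; a sketch of the decomposition a
compiler might emit (here using shifts and a subtract):

	/* i * 30 == i*32 - i*2 == (i << 5) - (i << 1) */
	long times30(long i)
	{
	    return (i << 5) - (i << 1);
	}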