[comp.unix.wizards] Type size problems

james@parkridge.UUCP (06/17/87)

In article <12670@topaz.rutgers.edu> hedrick@topaz.rutgers.edu.UUCP writes:
>Unfortunately in C (as most other languages) there is no distinction
>between how you describe variables to be used within your program and
>how you describe external objects.  The result is that network code
...
>pointers.  But if we are to have any hope of writing portable network
>code, there has to be some way to say that something is a 16 or 32
>bit object.  Currently short and long are it.  Anybody have a better
>idea?  The only alternative I can think of is to use long:16 and
>long:32.  Presumably that would continue to work if longs expanded.

	Forgive me if this is a little bit naive, but what about having
system-wide constants that tell the compilers (for whichever languages are
available) what the sizes of the objects really are?  For example, cc would
know that chars are w bits long, ints are x, shorts are y, and longs are z.

	All the user would have to do is set up defines (or whatever)
requesting a minimum and maximum size for each object required, and make sure
that these constraints are followed strictly within his/her code.  When the
compiler went at it, it would see the requested sizes and make sure that it
could satisfy them on the current machine while still following the K&R rules.
If it couldn't, it would scream.

For example....

#define MIN_CHAR	8	/* Minimum sizes required, max are optional */
#define MAX_CHAR	8	/* Compiler has free rein to shift about    */
#define MIN_SHORT	8	/* sizes within the limits imposed here...  */
#define MAX_SHORT	16
#define MIN_INT		8
#define MAX_INT		16
#define MIN_LONG	16
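
	Something close to this screaming can in fact be arranged today,
assuming a draft-ANSI preprocessor with <limits.h> and the #error
directive; a minimal sketch, checking just two of the constraints:

#include <limits.h>

/* Scream if char is outside the requested 8..8 bit range. */
#if CHAR_BIT < MIN_CHAR || CHAR_BIT > MAX_CHAR
#error char width outside the requested range
#endif

/* Scream if short is wider than the 16-bit maximum requested. */
#if SHRT_MAX > 32767
#error short wider than the requested maximum
#endif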

	Anyone have any reason why this sort of thing wouldn't work?  This is
just off the top of my head, but it seems reasonable if you really want
portable code and are prepared to put more work into it.

-- 
  ___________________________________________________________________________
 |                                 | _____  |                                |
 |       James R. Sheridan         |   |  \ | ..utzoo!parkridge!pcssun!james |
 |                                 |   |__/ |                                |
 | Parkridge Computer Systems Inc. |   |/ \ |                                |
 | 710 Dorval Drive,  Suite 115    | \_/\_  | YOU can help wipe out COBOL in |
 | Oakville,  Ontario,  CANADA     |      \ |        our lifetime!!          |
 |  L6K 3V7   (416) 842-6873       |    \_/ |                                |
 |_________________________________|________|________________________________|

jerry@oliveb.UUCP (07/09/87)

In article <1987Jun16.170300.9918@parkridge.uucp> james@parkridge.UUCP (James Sheridan) writes:
>	Forgive me if this is a little bit naive, but what about having
>system-wide constants which tells the compilers (for whichever languages are
>available) what the sizes of the objects really are?  For example, cc would
>know that chars are w bits long, ints are x, shorts are y, and longs are z.

It is an interesting idea but I can see one problem.  Normally you load
your program with a previously compiled library.  The routines in the
library expect and return values of a specific size, not whatever
size you requested the compiler to use on your compilation.  And of
course the system calls have similar expectations.

For example, if you have some code that insists that longs must be only
16 bits, the compiler should be able to handle this easily.  However, if
your program then calls lseek, which expects a full-size long offset,
the arguments are going to be a bit confused.

I prefer having new types, defined by some method, that allow a more
specific type definition.  In this way you can use an "int16" when you
must have a 16 bit integer and use a (long) cast if you must pass that
to something requiring a long.  For less stringent storage needs you can
use a generic long defined to be whatever is efficient on that system.
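
One way to sketch this is a small per-machine header of typedefs (the
names int16 and int32 are illustrative, not an existing standard):

/* machtypes.h -- one such header per machine; this version is for
 * a port where short is 16 bits and long is 32 bits.
 */
typedef short	int16;		/* exactly 16 bits on this machine */
typedef long	int32;		/* exactly 32 bits on this machine */

Code that stores an int16 internally can still call the library
safely, as in lseek(fd, (long)offset, 0), because the cast widens the
value to the size the library was compiled for.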

The remaining problem is that the compiler may not support a type you
need.  Something like an int12 or an int64 might work on some systems
but isn't likely to be available elsewhere.

On a related issue: is anyone familiar with a C compiler where int was
not the same size as short or long?  I mean one where short was 16 bits,
int was 32, and long was 64.
				Jerry Aguirre

jbn@glacier.UUCP (07/09/87)

     I did some work in this area at one time, back when Ada came in four
colors, and proposed some approaches that are sound but have more of a
Pascal or Ada flavor than C programmers are used to.  My basic position was
similar to that taken by the IEEE floating point standards people:
the important thing is to get the right answer.  As it turns out, with
some work in the compiler, we can do integer arithmetic in a completely
portable way with no loss in performance.

     1.  Sizes belong to the program, not to the machine.  Thus,
         integer variables should be declared by range, by giving a lower
	 and upper bound for the value.  (In Pascal, this is called a
	 "subrange", reflecting Wirth's assumption that the type
	 "integer" is somehow big enough for all practical purposes.
	 That assumption reflects the Control Data 6600, a machine with
	 a 60-bit word, which he was using when he designed Pascal.)

	 For example, in Pascal, one writes 

		VAR x: 0..255;

     2.  Named types (such as "int" and "short") should be predefined but
	 not built in, and thus redefinable if needed.  Some standard
	 definitions such as "unsigned_byte" should be defined the same
	 way in all implementations.  But in general programmers should
	 use ranges.  (Of course, when declaring a range, expressions
	 evaluatable at compile time should be allowed in range bounds.
	 Pascal doesn't allow this, which results in great frustration.)

		VAR unsigned_short: 0..65535;

	 is a typical declaration in Pascal.  C should have equivalent
	 syntax.  It's silly that one has to guess what the type keywords
	 mean in terms of numeric value in each implementation, yet can't
	 simply write the range when one wants to.

	 Thus, if we had syntax in C for ranges, along the lines of

		range 0..65535 unsigned_short;

	 we could do in C what one can do in Pascal.

	 Given range declarations, one can create the "fundamental"
	 types of C.

		typedef range 0..255 unsigned_byte;
		typedef range -(2^15)..(2^15)-1 short;
		typedef range 0..(2^16)-1 unsigned_short;
		typedef range -(2^31)..(2^31)-1 long;
		typedef range 0..(2^31)-1 unsigned_long;

	 These should be in an include file, not built into the compiler.

     3.  Now here's the good part.  The compiler has to pick the size of
	 intermediate results.  (When we write "X = (A+B)+C;", "A+B"
	 generates an intermediate result.)  The compiler should always
	 pick a size for an intermediate result that cannot result in
	 overflow unless overflow of the result would occur.  This
	 strange rule does what you want; if you write "X = X+1", and
	 X has the range -32768..32767 (what we usually call "short"),
	 then there's no need to compute a long result for "X+1", even
	 though, if X=32767, overflow would occur, because overflow 
	 would also occur in the final result, which is an error.
	 (One would like to check for such errors; on VAXen, one can
	 enable such checking in the subroutine entry mask.  But nobody
	 does; I once built PCC with it enabled, and almost no UNIX program
	 would work.  More on this later.)  On the other hand, if one
	 writes "X = (A*B)/C;", and all variables are "short", the
	 term "A*B" will be computed as a "long" automatically, thus
	 avoiding the possibility of overflow.  (If you don't like that,
	 you would write "X = ((short)(A*B))/C;" and the compiler would
	 recognize this as a statement that A*B should fit in a "short".)
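
	 In current C, by contrast, the programmer must request the
	 wide intermediate by hand.  A minimal sketch of the manual
	 idiom the rule would make automatic (the variable names are
	 illustrative):

		short x, a, b, c;
		/* Widen one operand so a*b is computed as a long,
		   then narrow the quotient back down to short. */
		x = (short)(((long)a * b) / c);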

     4.  Sometimes, but not often, one wants overflow, usually because
	 one is doing checksumming, hashing, or modular arithmetic.
	 The right way to do this is to provide modular arithmetic
	 operators.  One should be able to write

		X = MODPLUS(X,1,256);

	 and get "(X+1) % 256".  The compiler must recognize as special
	 cases modular arithmetic with bounds of 2^n, and especially
	 2^(8*b), and do those efficiently.  The above example ought to
	 compile into a simple byte-wide add on machines that have the
	 instruction to do it.
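
	 Absent compiler support, MODPLUS can at least be approximated
	 by an ordinary macro (the name is hypothetical, and the sketch
	 assumes nonnegative operands):

		/* Modular add; a smart compiler could spot the 256
		   and emit a single byte-wide add instead. */
		#define MODPLUS(x, y, m)	(((x) + (y)) % (m))

		X = MODPLUS(X, 1, 256);		/* (X+1) % 256 */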

     5.  Some intermediate results aren't computable on most machines.

		short X, A, B, C, D, E, F, G, H, I;
	 	X = (A * B * C * D * E * F * G * H) / I;

	 should generate an error message at compile time indicating that
	 the intermediate result won't fit in the machine.  If the
	 user really wants something like that evaluated (and note that
	 for most operand values the above expression would overflow),
	 some casts or coercions will be necessary to tell the compiler
	 what the user has in mind.
	 Note that some programs that will compile on some machines
	 won't compile on others.  This is better than getting the wrong
	 answer.

     6.  Function declarations have to be available when calls are
	 compiled, so the compiler can see what types it is supposed to
	 send.  Ada and Pascal work this way, and C++ moves strongly in
	 that direction.
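
	 The draft-ANSI/C++ function prototype is the mechanism in
	 question; the function below is a made-up example:

		/* With this prototype visible, the compiler knows at
		   every call site what argument types to send, and
		   can widen or reject arguments accordingly. */
		long scale(long value, short factor);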

     7.  There probably shouldn't be a predefined type "int" or "integer"
	 at all.  (I've been thinking of publishing the thinking shown
	 here under the title "Type integer considered harmful").

There's a general trend toward making integer arithmetic portable in LISP,
where unlimited length integers are often supported.  To the Common LISP
programmer, the width of the underlying machine's numeric unit is irrelevant.
The performance penalty for this generality in LISP is high.  But we can
achieve equivalent portability in the hard-compiled languages with some
effort.

This discussion probably should move to the C or C++ groups.

					John Nagle