james@parkridge.UUCP (06/17/87)
In article <12670@topaz.rutgers.edu> hedrick@topaz.rutgers.edu.UUCP writes:
>Unfortunately in C (as most other languages) there is no distinction
>between how you describe variables to be used within your program and
>how you describe external objects.  The result is that network code ...
>pointers.  But if we are to have any hope of writing portable network
>code, there has to be some way to say that something is a 16 or 32
>bit object.  Currently short and long are it.  Anybody have a better
>idea?  The only alternative I can think of is to use long:16 and
>long:32.  Presumably that would continue to work if longs expanded.

     Forgive me if this is a little bit naive, but what about having
system-wide constants which tell the compilers (for whichever languages are
available) what the sizes of the objects really are?  For example, cc would
know that chars are w bits long, ints are x, shorts are y, and longs are z.

     All the user would have to do would be to set up defines (or whatever)
that request a minimum and maximum size for the objects required and make
sure that these constraints are followed strictly within his/her code.  When
the compiler went at it, it would see the requested sizes and make sure that
it could satisfy them on the current machine while still following the K&R
rules.  If it couldn't, it would scream.  For example....

#define MIN_CHAR   8   /* Minimum sizes required, max are optional */
#define MAX_CHAR   8   /* Compiler has free rein to shift about    */
#define MIN_SHORT  8   /* sizes within the limits imposed here...  */
#define MAX_SHORT 16
#define MIN_INT    8
#define MAX_INT   16
#define MIN_LONG  16

     Anyone have any reasons why this sort of thing wouldn't work?  This is
just off the top of my head, but it seems reasonable if you really want
portable code and are prepared to put more work into it.....
-- 
James R. Sheridan                      ..utzoo!parkridge!pcssun!james
Parkridge Computer Systems Inc.
710 Dorval Drive, Suite 115
Oakville, Ontario, CANADA  L6K 3V7     (416) 842-6873
"YOU can help wipe out COBOL in our lifetime!!"
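For concreteness, here is a sketch (not from the posting itself) of one way
such minimum-size requests could be enforced at compile time, using the draft
ANSI C <limits.h> constants; a machine that cannot satisfy them would simply
refuse to compile the program.

/*
 * Sketch only: this program insists on at least 8-bit chars and
 * 32-bit ints, and asks the preprocessor to verify it.
 */
#include <limits.h>

#if CHAR_BIT < 8
#error "char is narrower than this program requires"
#endif

#if UINT_MAX < 0xFFFFFFFF
#error "int is narrower than this program requires"
#endif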
jerry@oliveb.UUCP (07/09/87)
In article <1987Jun16.170300.9918@parkridge.uucp> james@parkridge.UUCP (James Sheridan) writes:
>     Forgive me if this is a little bit naive, but what about having
>system-wide constants which tell the compilers (for whichever languages are
>available) what the sizes of the objects really are?  For example, cc would
>know that chars are w bits long, ints are x, shorts are y, and longs are z.

It is an interesting idea, but I can see one problem.  Normally you load your
program with a previously compiled library.  The routines in the library
expect and return values of a specific size, not whatever size you requested
the compiler to use on your compilation.  And of course the system calls have
similar expectations.

For example, if you have some code that insists that longs must be only 16
bits, the compiler should be able to handle this easily.  However, if your
program uses lseek then the arguments are going to be a bit confused.

I prefer having new types, defined by some method, that allow a more specific
type definition.  In this way you can use an "int16" when you must have a 16
bit integer and use a (long) cast if you must pass that to something
requiring a long.  For less stringent storage you can use a generic long
defined to be what is efficient on that system.

The remaining problem is that the compiler may not support a type you need.
Something like an int12 or an int64 might work on some systems but isn't
likely to be available elsewhere.

On a related issue: is anyone familiar with a C compiler where int was not
the same size as short or long?  I mean where short was 16 bits, int was 32,
and long was 64.

                                Jerry Aguirre
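A sketch of the "new types" approach described above (my own illustration,
not Jerry's code): a small per-machine header maps exact-width names onto
whatever native types happen to fit, and the program casts back to the native
type whenever it calls an interface such as lseek that wants one.  The widths
assumed here (16-bit short, 32-bit long) are typical of the machines of the
day and would differ elsewhere.

typedef short          int16;     /* exactly 16 bits on this machine */
typedef unsigned short uint16;
typedef long           int32;     /* exactly 32 bits on this machine */
typedef unsigned long  uint32;

extern long lseek();              /* takes (int fd, long offset, int whence) */

long seek_to(int fd, int32 offset)
{
    /* cast back to the native type the library interface expects */
    return lseek(fd, (long)offset, 0);    /* 0 == L_SET, seek from start */
}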
jbn@glacier.UUCP (07/09/87)
Newsgroups: comp.unix.wizards
Subject: Re: Type size problems
References: <3659@spool.WISC.EDU> <743@geac.UUCP>
Reply-To: jbn@glacier.UUCP (John B. Nagle)
Organization: Stanford University
I did some work in this area at one time, back when Ada came in four
colors, and proposed some approaches that are sound but have more of a
Pascal or Ada flavor than C programmers are used to. My basic position was
similar to that taken by the IEEE floating point standards people:
the important thing is to get the right answer. As it turns out, with
some work in the compiler, we can do integer arithmetic in a completely
portable way with no loss in performance.
1. Sizes belong to the program, not to the machine. Thus,
integer variables should be declared by range, by giving a lower
and upper bound for the value. (In Pascal, this is called a
"subrange", reflecting the assumption by Wirth that the type
"integer" is somehow big enough for all practical purposes.
That assumption made sense on the Control Data 6600, a machine
with a 60-bit word, which he was using when he designed Pascal.)
For example, in Pascal, one writes
VAR x: 0..255;
2. Named types (such as "int" and "short") should be predefined but
not built in, and thus redefinable if needed. Some standard
definitions such as "unsigned_byte" should be defined the same
way in all implementations. But in general programmers should
use ranges. (Of course, when declaring a range, expressions
evaluatable at compile time should be allowed in range bounds.
Pascal doesn't allow this, which results in great frustration.)
VAR unsigned_short: 0..65535;
is a typical declaration in Pascal. C should have equivalent
syntax. It's silly that one has to guess what the type keywords
mean in terms of numeric value in each implementation yet can't
simply write the range when you want to.
Thus, if we had syntax in C for ranges, along the lines of
range 0..65535 unsigned_short;
we could do in C what one can do in Pascal.
Given range declarations, one can create the "fundamental"
types of C.
typedef range 0..255 unsigned_byte;
typedef range -(2^15)..(2^15)-1 short;
typedef range 0..(2^16)-1 unsigned_short;
typedef range -(2^31)..(2^31)-1 long;
typedef range 0..(2^31)-1 unsigned_long;
These should be in an include file, not built into the compiler.
3. Now here's the good part. The compiler has to pick the size of
intermediate results. (When we write "X = (A+B)+C;", "A+B"
generates an intermediate result.) The compiler should always
pick a size for an intermediate result that cannot result in
overflow unless overflow of the result would occur. This
strange rule does what you want; if you write "X = X+1", and
X has the range -32768..32767 (what we usually call "short"),
then there's no need to compute a long result for "X+1", even
though, if X=32767, overflow would occur, because overflow
would also occur in the final result, which is an error.
(One would like to check for such errors; on VAXen, one can
enable such checking in the subroutine entry mask. But nobody
does; I once built PCC with it enabled, and almost no UNIX program
would work. More on this later.) On the other hand, if one
writes "X = (A*B)/C;", and all variables are "short", the
term "A*B" will be computed as a "long" automatically, thus
avoiding the possibility of overflow. (If you don't like that,
you would write "X = ((short)(A*B))/C;" and the compiler would
recognize this as a statement that A*B should fit in a "short".)
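A hand-written illustration (mine, not part of the proposal) of what rule 3
buys you: on a machine with 16-bit shorts and 32-bit longs, today's C makes
the programmer insert the widening cast that the rule would supply
automatically.

	short scale(short a, short b, short c)
	{
	    return (short)(((long)a * b) / c);    /* A*B carried as a long */
	}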
4. Sometimes, but not often, one wants overflow, usually because
one is doing checksumming, hashing, or modular arithmetic.
The right way to do this is to provide modular arithmetic
operators. One should be able to write
X = MODPLUS(X,1,256);
and get "(X + 1) % 256".  The compiler must recognize as special
cases modular arithmetic with bounds of 2^n, and especially
2^(8*b), and do those efficiently. The above example ought to
compile into a simple byte-wide add on machines that have the
instruction to do it.
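As a plain-C stand-in (my sketch; MODPLUS itself is only a proposed
operator), the intended meaning is simply addition reduced by the modulus; a
compiler that recognized a modulus of 256 as 2^8 could turn the call into a
byte-wide add.

	unsigned int modplus(unsigned int x, unsigned int y, unsigned int m)
	{
	    return (x + y) % m;       /* e.g.  x = modplus(x, 1, 256); */
	}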
5.	Some intermediate results aren't computable on most machines.
short X, A, B, C, D, E, F, G, H, I;
X = (A * B * C * D * E * F * G * H) / I;
should generate an error message at compile time indicating that
the intermediate result won't fit in the machine. If the
user really wants something like that evaluated (recognizing that,
for most values of the operands, the above expression would
overflow), some casts or coercions will be necessary to tell
the compiler what the user has in mind.
Note that some programs that will compile on some machines
won't compile on others. This is better than getting the wrong
answer.
6.	Function declarations have to be available when calls are
compiled, so the compiler can see what types it is supposed to
send. Ada and Pascal work this way, and C++ moves strongly in
that direction.
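As an illustration (mine), this is what ANSI C prototypes and C++ already
provide: with the declaration in scope, the compiler knows the parameter
types at the call site and can convert or complain, which is what rule 6
asks for.

	short scale(short a, short b, short c);  /* declaration seen before use */

	short use_it(void)
	{
	    return scale(1000, 2000, 100);       /* arguments converted to short */
	}

	short scale(short a, short b, short c)
	{
	    return (short)(((long)a * b) / c);
	}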
7.	There probably shouldn't be a predefined type "int" or "integer"
at all. (I've been thinking of publishing the thinking shown
here under the title "Type integer considered harmful").
There's a general trend toward making integer arithmetic portable in LISP,
where unlimited length integers are often supported. To the Common LISP
programmer, the width of the underlying machine's numeric unit is irrelevant.
The performance penalty for this generality in LISP is high. But we can
achieve equivalent portability in the hard-compiled languages with some
effort.
This discussion probably should move to the C or C++ groups.
John Nagle