[comp.lang.c] 64 bit architectures and C/C++

shap@shasta.Stanford.EDU (shap) (04/27/91)

Several companies have announced or are known to be working on 64 bit
architectures. It seems to me that 64 bit architectures are going to
introduce some nontrivial problems with C and C++ code.

I want to start a discussion going on this topic.  Here are some seed
questions:

1. Do the C/C++ standards need to be extended to cover 64-bit
environments, or are they adequate as-is?

2. If a trade-off has to be made between compliance and ease of
porting, what's the better way to go?

3. If conformance to the standard is important, then the obvious
choices are

	short	16 bits
	int	32 bits
	long	64 bits
	void *	64 bits

How bad is it for sizeof(int) != sizeof(long)?

4. Would it be better not to have a 32-bit data type and to make int
be 64 bits?  If so, how would 32- and 64- bit programs interact?

Looking forward to a lively exchange...

torek@elf.ee.lbl.gov (Chris Torek) (04/27/91)

In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>How bad is it for sizeof(int) != sizeof(long)?

This has been the case on PDP-11s for over 20 years.

It does cause problems---there is always software that makes invalid
assumptions---but typically long-vs-int problems, while rampant, are
also easily fixed.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

sarima@tdatirv.UUCP (Stanley Friesen) (04/28/91)

In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>Several companies have announced or are known to be working on 64 bit
>architectures. It seems to me that 64 bit architectures are going to
>introduce some nontrivial problems with C and C++ code.
 
>1. Do the C/C++ standards need to be extended to cover 64-bit
>environments, or are they adequate as-is?

They are adequate as is.  They make only minimum requirements for a
conforming implementation.  In particular, there is no reason why you
cannot have 64 bit longs (which is what I would do) or even 64 bit ints.
And you could easily have 64 and 128 bit floating point types (either
as 'float' and 'double' or as 'double' and 'long double' - both approaches
are standard conforming).

>2. If a trade-off has to be made between compliance and ease of
>porting, what's the better way to go?

In writing a new compiler from scratch (or even mostly from scratch) there
is no question, full ANSI compliance is absolutely necessary.
[An ANSI compiler may itself be less portable, but it allows the application
programmers to write portable code more easily].

In porting an existing compiler, do whatever seems most practical.

>3. If conformance to the standard is important, then the obvious
>choices are
 
>	short	16 bits
>	int	32 bits
>	long	64 bits
>	void *	64 bits
 
OR:
	short	32 bits
	int	64 bits
	long	64 bits

OR:
	short	32 bits
	int	32 bits
	long	64 bits

Any one of the above may be the most appropriate depending on the
instruction set.  If there are no instructions for 16 bit quantities
then using 16 bit shorts is a big loss.  And if it really is set up
as a hybrid 32/64 bit architecture, even the last may be useful.
[For instance, the Intel 80X86 series chips are hybrid 16/32 bit
architectures, so both 16 and 32 bit ints make sense].

>How bad is it for sizeof(int) != sizeof(long)?

Not particularly.  There are already millions of machines where this is
true - on PC class machines running MS-DOS, sizeof(int) == sizeof(short)
for most existing compilers (that is, the sizes are 16, 16, 32).
[And on Bull mainframes, unless things have changed, all three are the
same size].

>4. Would it be better not to have a 32-bit data type and to make int
>be 64 bits?  If so, how would 32- and 64- bit programs interact?

Programs on different machines should not talk to each other in binary.
[See the long, acrimonious discussions about binary I/O right here].
And as long as you use either ASCII text or XDR representation for data
exchange, there is no problem.

However, I would be more likely to skip the 16 bit type than the 32 bit
type.  (Of course if the machine has a 16 bit add and not a 32 bit one ...).


In short the idea is that C should translate as cleanly as possible into
the most natural data types for the machine in question.  This is what the
ANSI committee had in mind.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

bhoughto@pima.intel.com (Blair P. Houghton) (04/28/91)

In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>It seems to me that 64 bit architectures are going to
>introduce some nontrivial problems with C and C++ code.

Nope.  They're trivial if you didn't assume 32-bit architecture,
which you shouldn't, since many computers still have 36, 16, 8,
etc.-bit architectures.

>I want to start a discussion going on this topic.  Here are some seed
>questions:

Here's some fertilizer (but most of you consider it that way, at
any time :-) ):

>1. Do the C/C++ standards need to be extended to cover 64-bit
>environments, or are they adequate as-is?

The C standard allows all sorts of data widths, and specifies
a scad of constants (#defines, in <limits.h>) to let you use
these machine-specific numbers in your code, anonymously.

>2. If a trade-off has to be made between compliance and ease of
>porting, what's the better way to go?

If you're compliant, you're portable.

>3. If conformance to the standard is important, then the obvious
>choices are
>
>	short	16 bits
>	int	32 bits
>	long	64 bits
>	void *	64 bits

The suggested choices are:

	short	<the shortest integer the user should handle; >= 8 bits> 
	int	<the natural width of integer data on the cpu; >= a short>
	long	<the longest integer the user should handle; >= an int>
	void *  <long enough to specify any location legally addressable>

There's no reason for an int to be less than the full
register-width, and no reason for an address to be limited
to the register width.

An interesting side-effect of using the constants is that
you never need to know the sizes of these things on your
own machine; i.e., use CHAR_BIT (the number of bits in a char)
and sizeof(int) (the number of chars in an int) and you'll
never need to know how many bits an int contains.
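
For instance, a fragment like this (a minimal sketch; only the
standard <limits.h> names are assumed) reports the widths without
hard-coding any of them:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* CHAR_BIT = bits per char; sizeof = chars per object */
		printf("bits in an int here: %d\n",
		       (int)(sizeof(int) * CHAR_BIT));
		printf("largest int here: %d\n", INT_MAX);
		return 0;
	}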

>How bad is it for sizeof(int) != sizeof(long)?

It's only bad if you assume it's not true. (I confess:  I peeked.
I saw Chris' answer, and I'm not going to disagree.)

>4. Would it be better not to have a 32-bit data type and to make int
>be 64 bits?  If so, how would 32- and 64- bit programs interact?

Poorly, if at all.  Data transmission among architectures
with different bus sizes is a hairy issue of much aspirin.
The only portable method is to store and transmit the data
in some width-independent form, like morse code or a text
format (yes, ASCII is 7 or 8 bits wide, but it's a
_common_ form of data-width hack, and if all else fails,
you can hire people to read and type it into your
machine).

>Looking forward to a lively exchange...

				--Blair
				  "Did anyone NOT bring potato salad?"

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (04/29/91)

shap@shasta.Stanford.EDU (shap) writes:

>2. If a trade-off has to be made between compliance and ease of
>porting, what's the better way to go?

User selectable.

>3. If conformance to the standard is important, then the obvious
>choices are

>	short	16 bits
>	int	32 bits
>	long	64 bits
>	void *	64 bits

That depends on the natural address size of the machine.  If the
machine uses 32 bit addresses, then (void *) should be 32 bits.
I would not want my address arrays taking up more memory than is
needed.

Is it really necessary that sizeof(void *) == sizeof(long)?

>How bad is it for sizeof(int) != sizeof(long)?

Would not bother me as long as sizeof(int) <= sizeof(long)

>4. Would it be better not to have a 32-bit data type and to make int
>be 64 bits?  If so, how would 32- and 64- bit programs interact?

Again it would depend on the machine.  If the machine has both 32 bit
and 64 bit operations, then do include them.  If a 32 bit operation
is unnatural to the machine, then don't.  If it has 16 bit operations
then that makes sense for short.
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

jac@gandalf.llnl.gov (James A. Crotinger) (04/29/91)

  Anyone want to comment on experiences with Crays?  I believe the C
compilers make int, short, and long all the same size: 46 or 64 bits,
depending on a compile-time flag (Crays can do 46 bit integer
arithmetic using the vectorizing floating point processors, so that is
the default).

  Binary communication between Crays and other computers is something
I haven't done, mostly because Cray doesn't support IEEE floating point.

  Jim



--
-----------------------------------------------------------------------------
James A. Crotinger     Lawrence Livermore Natl Lab // The above views 
jac@moonshine.llnl.gov P.O. Box 808;  L-630    \\ // are mine and are not 
(415) 422-0259         Livermore CA  94550      \\/ necessarily those of LLNL

campbell@redsox.bsw.com (Larry Campbell) (04/29/91)

In article <1991Apr29.050715.22968@ux1.cso.uiuc.edu> phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:
->3. If conformance to the standard is important, then the obvious
->choices are
-
->	short	16 bits
->	int	32 bits
->	long	64 bits
->	void *	64 bits
-
-That depends on the natural address size of the machine.  If the
-machine uses 32 bit addresses, then (void *) should be 32 bits.
-I would not want my address arrays taking up more memory than is
-needed.
-
-Is it really necessary that sizeof(void *) == sizeof(long)?

Of course not.

We're currently porting a largish (150K lines) program to a machine
on which:

	short	(dunno, not at work now so I can't check)
	int	32 bits
	long	32 bits
	void *	128 bits

Thank *god* it has a fully-compliant ANSI compiler.

For extra credit:  can you guess what machine this is?
-- 
Larry Campbell             The Boston Software Works, Inc., 120 Fulton Street
campbell@redsox.bsw.com    Boston, Massachusetts 02109 (USA)

mvm@jedi.harris-atd.com (Matt Mahoney) (04/29/91)

When I need to specify bits, I'm usually forced to make the 
following assumptions:

	char	8 bits
	short	16 bits
	long	32 bits

since this is true on most machines.  Anything else would probably break
a lot of code.

-------------------------------
Matt Mahoney, mvm@epg.harris.com
#include <disclaimer.h>

robertk@lotatg.lotus.com (Robert Krajewski) (04/29/91)

Actually, there's one thing that people didn't mention -- the
feasibility of bigger lightweight objects. I'd assume that any
processor that advertised a 64-bit architecture would be able to
efficiently move around 64 bits at a time, so it would be very cheap
to move 8-byte (excuse me, octet) objects around by copying.

turk@Apple.COM (Ken "Turk" Turkowski) (04/30/91)

shap@shasta.Stanford.EDU (shap) writes:

>3. If conformance to the standard is important, then the obvious
>choices are

>	short	16 bits
>	int	32 bits
>	long	64 bits
>	void *	64 bits

>4. Would it be better not to have a 32-bit data type and to make int
>be 64 bits?  If so, how would 32- and 64- bit programs interact?

It is necessary to have 8, 16, and 32-bit data types, in order to be able
to read data from files.  I would suggest NOT specifying a size for the int
data type; this is supposed to be the most efficient integral data type
for a particular machine and compiler.

A lot of programs rely on the fact that nearly all C implementations
have a 32-bit long int.

I would suggest:

short	16 bits
long	32 bits
long long	64 bits
int	UNSPECIFIED
void *	UNSPECIFIED

This is patterned after ANSI floating-point extensions to accommodate
an extended format (i.e. "long double").

How about "double long", because it really is two longs?  (Donning flame
retardant suit).

What then would a 128-bit integer be?  long long long?  double double long?
long double long?  quadruple long?  How about the Fortran method: int*8?

Another proposal would be to invent a new word, like "big", "large",
"whopper", "humongous", "giant", "extra", "super", "grand", "huge",
"jumbo", "broad", "vast", "wide", "fat", "hefty", etc.

Whatever choice is made, there should be ready extensions to 128 and 256
bit integers, as well as 128 and 256-bit floating point numbers.

P.S. By the way, is there an analogous word for floating-point numbers,
as "int" is for integers?
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: TURK
UUCP: sun!apple!turk

john@sco.COM (John R. MacMillan) (04/30/91)

shap <shap@shasta.Stanford.EDU> writes:
|Several companies have announced or are known to be working on 64 bit
|architectures. It seems to me that 64 bit architectures are going to
|introduce some nontrivial problems with C and C++ code.

In a past life I did a fair amount of work with C on a 64 bit
architecture, the C/VE compiler on NOS/VE.  C/VE was a pre-ANSI
compiler, but many of the comments I think still apply.

C/VE had 64 bit ints and longs, 32 bit shorts, 8 bit chars, and 48 bit
pointers (as an added bonus, the null pointer was not all bits zero,
but that's another headache entirely; ask me how many times I've
wanted to strangle a programmer who used bzero() to clear structures
that have pointers in them).

|I want to start a discussion going on this topic.  Here are some seed
|questions:
|
|1. Do the C/C++ standards need to be extended to cover 64-bit
|environments, or are they adequate as-is?

The C standard certainly is; it's obvious a lot of effort went into
making sure it would be.

|2. If a trade-off has to be made between compliance and ease of
|porting, what's the better way to go?

I don't think there's any reason to trade off compliance with
standards; if you want to trade off ease of porting versus exploiting
the full power of the architecture that's another question.  The
answer is going to be different for different people.

If you make the only 64 bit data type be ``long long'' or some such,
it will make life much easier on porters, but most of the things you
port then won't take advantage of having 64 bit data types...

|3. If conformance to the standard is important, then the obvious
|choices are
|
|	short	16 bits
|	int	32 bits
|	long	64 bits
|	void *	64 bits

This isn't really a conformance issue.  The idea is to make the
machine's natural types fit the C types well.  However, if the
architecture can support all these sizes easily, then for ease of
porting it would be nice to have all of the common sizes available.
Many (often poorly written) C programs depend on, say, short being 16
bits, and having a 16 bit data type is the easiest way to port such
programs.

One problem with 32 bit ints and 64 bit pointers is that a lot of
(bad) code assumes you can put a pointer into an int, and vice versa.

As an aside, using things like ``typedef int int32'' is not always
the answer, especially if you don't tell the porter (or are
inconsistent about) whether this should be an integer data type that is
_exactly_ 32 bits or _at least_ 32 bits.
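
(A sketch of making the contract explicit; ``int32'' below is the
porter's own convention, not a standard name, and here it means
_at least_ 32 bits:)

	#include <limits.h>

	/* Let the preprocessor pick a type satisfying the contract
	   on each port, instead of hard-wiring "int". */
	#if INT_MAX >= 2147483647
	typedef int int32;
	#else
	typedef long int32;	/* LONG_MAX >= 2147483647 is guaranteed */
	#endif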

|How bad is it for sizeof(int) != sizeof(long)?

The C/VE compiler had sizeof(int) == sizeof(long) so I can't comment
on that one in particular, but...

|4. Would it be better not to have a 32-bit data type and to make int
|be 64 bits?  If so, how would 32- and 64- bit programs interact?

...there is a lot of badly written code out there, and no matter what
you do, you'll break somebody's bogus assumptions.  In particular a
lot of code makes assumptions about pointer sizes, whether they'll fit
in ints, and whether you can treat them like ints.

Expect porting to 64 bit architectures to be work, because it is.

wmm@world.std.com (William M Miller) (04/30/91)

bhoughto@pima.intel.com (Blair P. Houghton) writes:
> The suggested choices are:
>
>         short   <the shortest integer the user should handle; >= 8 bits>

Actually, ANSI requires at least 16 bits for shorts (see SHRT_MIN and
SHRT_MAX in <limits.h>, X3.159-1989 2.2.4.2.1).

-- William M. Miller, Glockenspiel, Ltd.
   wmm@world.std.com

gwyn@smoke.brl.mil (Doug Gwyn) (04/30/91)

In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>1. Do the C/C++ standards need to be extended to cover 64-bit
>environments, or are they adequate as-is?

This question presupposes something that is not true, namely that
64-bit environments differ from current environments.  In fact,
I've been using 64-bit C environments for years, in addition to
16-bit and 32-bit ones, with occasional dabbling in 60-bit environments.

The C standard does not presuppose any particular architecture.

>2. If a trade-off has to be made between compliance and ease of
>porting, what's the better way to go?

There is no excuse for a new C implementation to not conform to the
C standard.

Note that the standard allows the C implementor much flexibility
when it comes to architecturally-determined choices.

>3. If conformance to the standard is important, then the obvious
>choices are
>	short	16 bits
>	int	32 bits
>	long	64 bits
>	void *	64 bits

(You seem to have also assumed that a char is 8 bits.)
There is nothing particularly "obvious" about these choices;
I could readily imagine many other choices that would be both
standard conforming and useful.

>How bad is it for sizeof(int) != sizeof(long)?

There should not be any applications that depend on int and long
having the same size.

>4. Would it be better not to have a 32-bit data type and to make int
>be 64 bits?  If so, how would 32- and 64- bit programs interact?

I don't know what you mean by a "32-bit program".

>Looking forward to a lively exchange...

I don't see what there is to discuss.  The C standard specifies
minimum ranges for the basic types, and anything beyond that is
up to the implementor to decide, taking into account his customers'
needs.

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/01/91)

mvm@jedi.harris-atd.com (Matt Mahoney) writes:

>When I need to specify bits, I'm usually forced to make the 
>following assumptions:

>	char	8 bits
>	short	16 bits
>	long	32 bits

>since this is true on most machines.  Anything else would probably break
>a lot of code.

What would break if you did:

	char	8 bits
	short	16 bits
	int	32 bits
	long	64 bits

where any pointer or pointer difference would fit in 32 bits?
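
(For calibration, the usual breakage looks like this sketch; the
widths in the comments are the ones proposed above:)

	#include <stdio.h>

	int main(void)
	{
		long big = 123456789L * 100L;	/* needs more than 32 bits;
						   fine with a 64 bit long */
		int n = (int)big;		/* silently truncated when
						   int is 32 bits */

		printf("%d\n", big);		/* wrong: %d fetches an int,
						   but a wider long was
						   passed */
		return 0;
	}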
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/01/91)

john@sco.COM (John R. MacMillan) writes:

>One problem with 32 bit ints and 64 bit pointers is that a lot of
>(bad) code assumes you can put a pointer into an int, and vice versa.

>...there is a lot of badly written code out there, and no matter what
>you do, you'll break somebody's bogus assumptions.  In particular a
>lot of code makes assumptions about pointer sizes, whether they'll fit
>in ints, and whether you can treat them like ints.

For how long should we keep porting code, especially BAD CODE?  This sounds
a lot like school systems that keep moving failing students up each year
and we know what that results in.

IMHO, no code older than 8 years should be permitted to be ported and if
it is found to be "bad" code then it must have been written more than 8
years ago.
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

peter@llama.trl.OZ.AU (Peter Richardson - NSSS) (05/01/91)

In article <4068@inews.intel.com>, bhoughto@pima.intel.com (Blair P.
Houghton) writes:
> In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
> >It seems to me that 64 bit architectures are going to
> >introduce some nontrivial problems with C and C++ code.
> 
> Nope.  They're trivial if you didn't assume 32-bit architecture,
> which you shouldn't, since many computers still have 36, 16, 8,
> etc.-bit architectures.
> 
> >I want to start a discussion going on this topic.  Here are some seed
> >questions:

Hmmm. As I understand it, if you want to write truly portable code, you
should never make assumptions about sizeof any integral types. We have
a local header file on each machine type defining Byte, DoubleByte etc.
For example, on sun4:

typedef unsigned char Byte;             // always a single byte
typedef unsigned short DoubleByte;      // always two bytes
typedef unsigned long QuadByte;         // always four bytes

If you want to use an int, use an int. If you want to use a 16 bit
quantity, use a DoubleByte. To port to new machines, just change the
header file. Purists may prefer "Octet" to "Byte".

It is up to the platform/compiler implementation to determine the
appropriate sizeof integral types. It should not be part of the
language.

> 
> Poorly, if at all.  Data transmission among architectures
> with different bus sizes is a hairy issue of much aspirin.
> The only portable method is to store and transmit the data
> in some width-independent form, like morse code or a text
> format (yes, ASCII is 7 or 8 bits wide, but it's a
> _common_ form of data-width hack, and if all else fails,
> you can hire people to read and type it into your
> machine).

There is an international standard for doing this, called Abstract
Syntax Notation One (ASN.1), defined by ISO. It is based on the CCITT
standards X.208 and X.209 (I think). It is more powerful than either of
the proprietary standards XDR or NDR. Compilers are used to translate
ASN.1 data descriptions into C/C++ structures, and produce
encoder/decoders.  

---
Peter Richardson 		         Phone: +61 3 541-6342
Telecom Research Laboratories 	           Fax: +61 3 544-2362
					 Snail: GPO Box 249, Clayton, 3168
                                                Victoria, Australia
Internet: p.richardson@trl.oz.au
X400: g=peter s=richardson ou=trl o=telecom prmd=telecom006 admd=telememo c=au

bhoughto@pima.intel.com (Blair P. Houghton) (05/01/91)

In article <1991Apr30.140217.7065@world.std.com> wmm@world.std.com (William M Miller) writes:
>bhoughto@pima.intel.com (Blair P. Houghton) writes:
>>         short   <the shortest integer the user should handle; >= 8 bits>
>Actually, ANSI requires at least 16 bits for shorts (see SHRT_MIN and
>SHRT_MAX in <limits.h>, X3.159-1989 2.2.4.2.1).

I had my brain in packed-BCD mode that day, apparently :-/...
The minimum sizes for the four integer types are:

	char	8 bits
	short	16
	int	16
	long	32

Other than that, one need only ensure that short, int, and
long are multiples of the size of a char, e.g., 9, 27, 36, 36.
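
(A program that genuinely needs more than those minimums can say so
at compile time; a sketch:)

	#include <limits.h>

	/* A 16-bit int satisfies the standard's minimum, so code
	   that assumes 32-bit ints should check, not hope: */
	#if INT_MAX < 2147483647
	#error "this code assumes ints of at least 32 bits"
	#endif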

				--Blair
				  "Hike!"

dlw@odi.com (Dan Weinreb) (05/01/91)

In article <1991May1.012242.26211@ux1.cso.uiuc.edu> phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:

   From: phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN)
   Date: 1 May 91 01:22:42 GMT
   References: <168@shasta.Stanford.EDU> <1991Apr29.211937.10865@sco.COM>
   Organization: University of Illinois at Urbana

   IMHO, no code older than 8 years should be permitted to be ported 

I see from the Organization field in your mail header that you're from
a university.

jerry@talos.npri.com (Jerry Gitomer) (05/01/91)

mvm@jedi.harris-atd.com (Matt Mahoney) writes:

:When I need to specify bits, I'm usually forced to make the 
:following assumptions:

:	char	8 bits
:	short	16 bits
:	long	32 bits

:since this is true on most machines.  Anything else would probably break
:a lot of code.

	We are caught between the rock (wanting to take *full* advantage of
	the new wider register and memory data path machines 64 bits today,
	128 tomorrow, and 256 the day after tomorrow) and the hard place 
	(wanting to preserve code that was handcrafted to the
	idiosyncrasies of prior generation hardware).  Our choices are
	simple -- throw the performance of the new machines out the window
	or spend the time and money required to fix up the code so that it
	complies with the standard.  (Sure, standards aren't cast in
	concrete, but their life expectancy exceeds that of today's typical
	computer system).

	IMHO (now isn't that an arrogant phrase? :-) ) it is better to fix
	up the offending programs now than to do it later.  I say this
	because I presume that salaries will continue to increase, which
	will make it more expensive to fix things up later, and because
	staff turnover leads to a decrease over time in knowledge of the
	offending programs.

-- 
Jerry Gitomer at National Political Resources Inc, Alexandria, VA USA
I am apolitical, have no resources, and speak only for myself.
Ma Bell (703)683-9090  (UUCP:  ...uunet!uupsi!npri6!jerry )

rli@buster.stafford.tx.us (Buster Irby) (05/02/91)

turk@Apple.COM (Ken "Turk" Turkowski) writes:

>It is necessary to have 8, 16, and 32-bit data types, in order to be able
>to read data from files.  I would suggest NOT specifying a size for the int
>data type; this is supposed to be the most efficient integral data type
>for a particular machine and compiler.

You assume a lot about the data in the file.  Is it stored in a specific
processor format (a la Intel vs Motorola)?  My experience has been that
binary data is not portable anyway.

gwyn@smoke.brl.mil (Doug Gwyn) (05/02/91)

In article <13229@goofy.Apple.COM> turk@Apple.COM (Ken "Turk" Turkowski) writes:
>I would suggest:
>short	16 bits
>long	32 bits
>long long	64 bits
>int	UNSPECIFIED
>void *	UNSPECIFIED

What on Earth do you mean by "UNSPECIFIED"?  An implementation MUST
make a definite choice here.  The C language standard already contains
all the requisite specifications.

Note that a standard-conforming implementation is obliged to diagnose
use of any construct such as "long long".  Therefore that is a stupid
extension.  I guess I shouldn't be surprised, however, given that the
APW C math library functions were declared as returning type "extended"
rather than the type "double" required by the C standard.  It didn't
dawn on them, apparently, that "double" would have best been
implemented as SANE extended format in the first place.

daves@ex.heurikon.com (Dave Scidmore) (05/02/91)

In article <6157@trantor.harris-atd.com> mvm@jedi.UUCP (Matt Mahoney) writes:
>When I need to specify bits, I'm usually forced to make the 
>following assumptions:
>
>	char	8 bits
>	short	16 bits
>	long	32 bits
>
>since this is true on most machines.  Anything else would probably break
>a lot of code.

I'm surprised nobody has mentioned that the real solution to this kind
of portability problem is for the original programmer to use the
definitions in "types.h" that tell you how big chars, shorts, ints,
and longs are.  I know that a lot of existing code does not take advantage
of the ability to use typedefs or #defines to alter the size of key
variables or adjust for numbers of bits for each, but doing so would
help prevent the kinds of portability problems mentioned.  I always urge
people when writing their own code to be aware of size dependent code
and either use the existing "types.h", or make their own and use it to
make such code more portable.  This won't help you when porting someone
else's machine dependent (and dare I say poorly written) code, but the next
guy who has to port your code will have an easier time of it.
--
Dave Scidmore, Heurikon Corp.
dave.scidmore@heurikon.com

daves@ex.heurikon.com (Dave Scidmore) (05/02/91)

turk@Apple.COM (Ken "Turk" Turkowski) writes:

>It is necessary to have 8, 16, and 32-bit data types, in order to be able
>to read data from files.

Bad practice!!!!  This works fine if the one reading the data is always the
same as the one writing it, but if these data sizes are meant to let one
machine read files written by another machine, then storing structures as
binary images can result in severe problems.  Byte ordering is a more
fundamental problem than the size of types when trying to read and write
binary images.

The world of microcomputers is divided into two camps: those that store
the least significant byte of a 16 or 32 bit quantity in the lowest
memory location (as Intel processors do), and those that store the most
significant byte in the lowest memory location (as Motorola processors
do).  Given the value 0x12345678, each camp stores a 32 bit quantity as
follows:

Memory address:	0	1	2	3
		0x78	0x56	0x34	0x12	LSB in lowest address (Intel convention)
		0x12	0x34	0x56	0x78	MSB in lowest address (Motorola convention)

From this you can see that if a big-endian processor writes a 32 bit int
into memory, a little-endian processor will read it back backwards.  The end
result is the need to swap all bytes within 16 and 32 bit quantities.  When
reading structures from a file, this can only be done if you know the size
of each component of the structure and swap it after reading.  In general
this is usually sufficient reason not to store binary images of data in files
unless you can assure that the machine reading the values will always follow
the same size and byte ordering convention.
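
(The swap itself is mechanical; a sketch for 32 bit quantities,
assuming 8 bit chars and an unsigned long of at least 32 bits:)

	/* Reverse the byte order of a 32 bit value,
	   e.g. 0x12345678 <-> 0x78563412. */
	unsigned long swap32(unsigned long x)
	{
		return ((x & 0x000000FFUL) << 24)
		     | ((x & 0x0000FF00UL) <<  8)
		     | ((x & 0x00FF0000UL) >>  8)
		     | ((x & 0xFF000000UL) >> 24);
	}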

>I would suggest NOT specifying a size for the int
>data type; this is supposed to be the most efficient integral data type
>for a particular machine and compiler.

I agree.

>A lot of programs rely on the fact that nearly all C implementations
>have a 32-bit long int.

The precedent for non-32 bit ints predates the microprocessor, and anyone who
writes a supposedly "portable" program assuming long ints are 32 bits is
creating a complex and difficult mess for the person who has to port the
code to untangle.

>I would suggest:
>
>short	16 bits
>long	32 bits
>long long	64 bits
>int	UNSPECIFIED
>void *	UNSPECIFIED

I would suggest not making assumptions about the size of built in types
when writing portable code.
--
Dave Scidmore, Heurikon Corp.
dave.scidmore@heurikon.com

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/02/91)

dlw@odi.com (Dan Weinreb) writes:

>In article <1991May1.012242.26211@ux1.cso.uiuc.edu> phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:

>   From: phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN)
>   Date: 1 May 91 01:22:42 GMT
>   References: <168@shasta.Stanford.EDU> <1991Apr29.211937.10865@sco.COM>
>   Organization: University of Illinois at Urbana

>   IMHO, no code older than 8 years should be permitted to be ported 

>I see from the Organization field in your mail header that you're from
>a university.

I see from the domain in your return address you are from a commercial
organization.

SO............
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

john@sco.COM (John R. MacMillan) (05/02/91)

|>4. Would it be better not to have a 32-bit data type and to make int
|>be 64 bits?  If so, how would 32- and 64- bit programs interact?
|
|It is necessary to have 8, 16, and 32-bit data types, in order to be able
|to read data from files.

It's not necessary, but it does make it easier.

|I would suggest NOT specifying a size for the int
|data type; this is supposed to be the most efficient integral data type
|for a particular machine and compiler.
|
|[...]
|
|short	16 bits
|long	32 bits
|long long	64 bits
|int	UNSPECIFIED
|void *	UNSPECIFIED

Problem with this is that I don't think sizeof(long) is allowed to be
less than sizeof(int), which would constrain your ints to 32 bits.

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/02/91)

jerry@talos.npri.com (Jerry Gitomer) writes:

>	IMHO (now isn't that an arrogant phrase? :-) ) it is better to fix
>	up the offending programs now than to do it later.  I say this
>	because I presume that salaries will continue to increase, which
>	will make it more expensive to fix things up later, and because
>	staff turnover leads to a decrease over time in knowledge of the
>	offending programs.

Also, what about staff MORALE?  I don't know about a lot of other programmers,
but I for one would be much happier at the very least cleaning up old code and
making it work right (or better yet rewriting it from scratch the way it SHOULD
have been done in the first place) than perpetuating bad designs of the past
which translate into inefficiencies of the future.

But if you are interested in getting things converted quickly, then just make
TWO models of the compiler.  You then assign a special flag name to make the
compiler work in such a way that it will avoid breaking old code.  Programs
written AFTER the compiler is ready should be required to compile WITHOUT that
flag.  You could call the flag "-badcode".  I think that might be a fair
compromise between getting all the old bad code to work now under the new
machine, while still promoting better programming practices for the present
and future (and flagging examples of what NOT to do).
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

john@sco.COM (John R. MacMillan) (05/02/91)

|For how long should we keep porting code, especially BAD CODE?  This sounds
|a lot like school systems that keep moving failing students up each year
|and we know what that results in.

Whether or not it's a good idea, people will keep porting bad code as
long as other people are willing to pay for it.

Users are really buying Spiffo 6.3.  They don't care how it's written;
they just like it.  So Monster Hardware, in an effort to boost sales,
wants to be able to sell their boxes as a platform for running Spiffo
6.3.  They don't care how it's written, they just want it.  The core
of Spiffo 6.3 is from Spiffo 1.0, and was written 10 years ago by 5
programmers who are now either VPs or no longer there, and who never
considered it would have to run on an MH1800 where the chars are 11
bits and ints are 33.

It happens.  Honest.  I suspect many of us know a Spiffo or two.

|IMHO, no code older than 8 years should be permitted to be ported and if
|it is found to be "bad" code then it must have been written more than 8
|years ago.

The first part simply won't happen if there's demand, and I'm not sure
I understand the second part.

jfc@athena.mit.edu (John F Carr) (05/02/91)

In article <16023@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>Note that a standard-conforming implementation is obliged to diagnose
>use of any construct such as "long long".  Therefore that is a stupid
>extension.

I disagree.  I want a compiler that supports ANSI features, but I would
rather have "long long" cause the compiler to generate 64 bit code than
cause the compiler to say "error: invalid type".  I think the C standard is
valuable because it is a list of what is valid C, not because it also says
what is not valid C.

--
    John Carr (jfc@athena.mit.edu)

henry@zoo.toronto.edu (Henry Spencer) (05/02/91)

In article <1991May2.033545.15051@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>rather have "long long" cause the compiler to generate 64 bit code than
>cause the compiler to say "error: invalid type".  I think the C standard is
>valuable because it is a list of what is valid C, not because it also says
>what is not valid C.

The C standard says both.  However, why do you assume that the compiler
must complain *or* generate 64-bit code?  ANSI C does not prevent it from
doing both.  The only thing the standard requires is that violations of its
constraints must draw at least one complaint.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

cadsi@ccad.uiowa.edu (CADSI) (05/02/91)

From article <1991May01.172042.5214@buster.stafford.tx.us>, by rli@buster.stafford.tx.us (Buster Irby):
> turk@Apple.COM (Ken "Turk" Turkowski) writes:
> 
>>It is necessary to have 8, 16, and 32-bit data types, in order to be able
>>to read data from files.  I would suggest NOT specifying a size for the int
>>data type; this is supposed to be the most efficient integral data type
>>for a particular machine and compiler.
> 
> You assume a lot about the data in the file.  Is it stored in a specific
> processor format (a la Intel vs Motorola)?  My experience has been that
> binary data is not portable anyway.

Binary isn't in general portable.  However, using proper typedefs in
a class one can move binary read/write classes from box to box.  I think
the solution to the whole issue of sizeof(whatever) is to simply assume
nothing.  Always typedef.  It isn't that difficult, and code I've done this
with runs on things ranging from DOS machines to CRAY's COS (and UNICOS)
without code (barring the typedef header files) changes.

|----------------------------------------------------------------------------|
|Tom Hite					|  The views expressed by me |
|Manager, Product development			|  are mine, not necessarily |
|CADSI (Computer Aided Design Software Inc.	|  the views of CADSI.       |
|----------------------------------------------------------------------------|

steve@taumet.com (Stephen Clamage) (05/02/91)

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:

>But if you are interested in getting things converted quickly, then just make
>TWO models of the compiler.  You then assign a special flag name to make the
>compiler work in such a way that it will avoid breaking old code.

Some compilers already do this.  For example, our compilers (available from
Oregon Software) have a "compatibility" switch which allows compilation of
old-style code, including old-style preprocessing.  In this mode, ANSI
features (including ANSI preprocessing and function prototypes) are still
available, allowing gradual migration of programs from old-style to ANSI C.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

bright@nazgul.UUCP (Walter Bright) (05/03/91)

In article <12563@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
/In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
/>How bad is it for sizeof(int) != sizeof(long)?
/It does cause problems---there is always software that makes invalid
/assumptions---but typically long-vs-int problems, while rampant, are
/also easily fixed.

The most aggravating problem we have is it seems we (Zortech) are the only
compiler for which:
	char
	signed char
	unsigned char
are all distinct types! For example,
	char *p;
	signed char *ps;
	unsigned char *pu;
	p = pu;			/* syntax error */
	p = ps;			/* syntax error */
It seems we are the only compiler that flags these as errors.
A related example is:
	int i;
	short *ps;
	ps = &i;		/* syntax error, for 16 bit compilers too */
I think a lot of people are in for a surprise when they port to 32 bit
compilers... :-)

gwyn@smoke.brl.mil (Doug Gwyn) (05/03/91)

In article <1991May2.033545.15051@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
-In article <16023@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
->Note that a standard-conforming implementation is obliged to diagnose
->use of any construct such as "long long".  Therefore that is a stupid
->extension.
-I disagree.  I want a compiler that supports ANSI features, but I would
-rather have "long long" cause the compiler to generate 64 bit code than
-cause the compiler to say "error: invalid type".  I think the C standard is
-valuable because it is a list of what is valid C, not because it also says
-what is not valid C.

I think you missed the point.  There are numerous CONFORMING ways in
which additional integer types can be added to C.  "long long" is NOT
one of these, and a standard-conforming implementation is OBLIGED to
diagnose the use of "long long", which violates the Constraints of
X3.159-1989 section 3.5.2.  Therefore "long long" is not a wise way
to make such an extension.
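
(For contrast, a sketch of one conforming spelling; the names are
hypothetical, not any particular vendor's:)

	/* Identifiers beginning with two underscores are reserved
	   for the implementation, so a vendor header can expose a
	   wide type without breaking any conforming program.  The
	   "long" here is only so the sketch compiles everywhere;
	   a real compiler would wire it to a 64 bit builtin. */
	typedef long __long64;		/* reserved-namespace name */
	typedef __long64 longlong_t;	/* name for user code      */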

gwyn@smoke.brl.mil (Doug Gwyn) (05/03/91)

In article <1991May01.222112.13130@sco.COM> john@sco.COM (John R. MacMillan) writes:
>|It is necessary to have 8, 16, and 32-bit data types, in order to be able
>|to read data from files.
>It's not necessary, but it does make it easier.

Not even that.  Assuming that for some unknown reason you're faced with
reading a binary file that originated on some other system, there is a
fair chance that it used a "big endian" architecture while your system
is "little endian" or vice-versa.

Binary data transportability is a much thornier issue than most people
realize.

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/03/91)

jfc@athena.mit.edu (John F Carr) writes:

>I disagree.  I want a compiler that supports ANSI features, but I would
>rather have "long long" cause the compiler to generate 64 bit code than
>cause the compiler to say "error: invalid type".  I think the C standard is
>valuable because it is a list of what is valid C, not because it also says
>what is not valid C.

I see nothing wrong with this.  You have ANSI C and you have extensions.

Of course YOUR extensions and MY extensions may not be the same, and may
even be mutually exclusive.  For each of us to ensure our code will compile
on the other's compiler, we can restrict ourselves to ANSI C.

On the other hand if we can get together and make our extensions the same,
we widen the domain in which our non-standard code that takes advantage of
these powerful features can be used.

When I am writing ANSI C, it does help to have something jump in there and
complain when I go beyond the standard.  I believe in GCC this is "-pedantic"
or something like that.
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

jfc@athena.mit.edu (John F Carr) (05/03/91)

In article <1991May2.041911.14489@zoo.toronto.edu>
	henry@zoo.toronto.edu (Henry Spencer) writes:
>However, why do you assume that the compiler
>must complain *or* generate 64-bit code?  ANSI C does not prevent it from
>doing both.  The only thing the standard requires is that violations of its
>constraints must draw at least one complaint.

I know that diagnostics are not required to be fatal errors, but I would be
annoyed to get a warning every time I compiled code that used a nonstandard
extension.  I think the gcc solution is a good one: support ANSI features by
default, but only print warnings for use of extensions when the user asks
for them.

--
    John Carr (jfc@athena.mit.edu)

shap@shasta.Stanford.EDU (shap) (05/03/91)

In article <13229@goofy.Apple.COM> turk@Apple.COM (Ken "Turk" Turkowski) writes:
>I would suggest:
>
>short	16 bits
>long	32 bits
>long long	64 bits
>int	UNSPECIFIED
>void *	UNSPECIFIED
>
>This is patterned after ANSI floating-point extensions to accommodate
>an extended format (i.e. "long double").
>
>Another proposal would be to invent a new word, like "big", "large",
>"whopper", "humongous", "giant", "extra", "super", "grand", "huge",
>"jumbo", "broad", "vast", "wide", "fat", "hefty", etc.
>
>Whatever choice is made, there should be ready extensions to 128 and 256
>bit integers, as well as 128 and 256-bit floating point numbers.

Actually, that's what you did.  The 'long long' data type does not
conform to the ANSI standard.

The advantage of the approach

	short		16
	int		32
	long		32
	long long	64

is that fewer data types change size (this approach leaves only
pointers changing), and the code could conceivably have the same
integer sizes in 32- and 64-bit mode.

But isn't ANSI conformance a requirement?

shap@shasta.Stanford.EDU (shap) (05/03/91)

In article <4068@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>
>>2. If a trade-off has to be made between compliance and ease of
>>porting, what's the better way to go?
>
>If you're compliant, you're portable.

While I happen to agree with this sentiment, there is an argument that X
hundred million lines of C code can't be wrong.  The problem with
theology is that it's not commercially viable.

Reactions?

Jonathan

shap@shasta.Stanford.EDU (shap) (05/03/91)

In article <1991May2.033545.15051@athena.mit.edu> jfc@athena.mit.edu (John F Carr) writes:
>In article <16023@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>>Note that a standard-conforming implementation is obliged to diagnose
>>use of any construct such as "long long"...
>
>I disagree.  I want a compiler that supports ANSI features, but I would
>rather have "long long" cause the compiler to generate 64 bit code than
>cause the compiler to say "error: invalid type".  I think the C standard is
>valuable because it is a list of what is valid C, not because it also says
>what is not valid C.

Fortunately, you aren't the standard.

The standard is very precise.  It does not require that the use of an
extension be an error.  It does REQUIRE that the compiler issue a
diagnostic.  Something like

	file.c: 64: Thanks for using long long!

Would conform.

Credit for the example to Dave Prosser of AT&T.


Jonathan
>
>--
>    John Carr (jfc@athena.mit.edu)

shap@shasta.Stanford.EDU (shap) (05/03/91)

In article <1991May01.172042.5214@buster.stafford.tx.us> rli@buster.stafford.tx.us writes:
>turk@Apple.COM (Ken "Turk" Turkowski) writes:
>
>>It is necessary to have 8, 16, and 32-bit data types, in order to be able
>>to read data from files.  I would suggest NOT specifying a size for the int
>>data type; this is supposed to be the most efficient integral data type
>>for a particular machine and compiler.
>
>You assume a lot about the data in the file.  Is it stored in a specific
>processor format (ala Intel vs Motorolla)?  My experience has been that
>binary data is not portable anyway.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/03/91)

In article <1991May1.023356.8048@trl.oz.au>, peter@llama.trl.OZ.AU (Peter Richardson - NSSS) writes:
> Hmmm. As I understand it, if you want to write truly portable code, you
> should never make assumptions about sizeof any integral types. We have
> a local header file on each machine type defining Byte, DoubleByte etc.
> For example, on sun4:
> 
> typedef unsigned char Byte;             // always a single byte
> typedef unsigned short DoubleByte;      // always two bytes
> typedef unsigned long QuadByte;         // always four bytes
> 
> If you want to use an int, use an int. If you want to use a 16 bit
> quantity, use a DoubleByte. To port to new machines, just change the
> header file. Purists may prefer "Octet" to "Byte".

Sorry.  You have just made a non-portable assumption, namely that there
*is* an integral type which holds an octet and that there *is* an
integral type which holds two octets, and so on.  If you want
"at least 8 bits", then use {,{,un}signed} char, and if you want
"at least 16 bits", then use {,unsigned} short.  The ANSI standard
guarantees those.  There is no need to introduce your own private
names for them.  If you want "exactly 8 bits" or "exactly 16 bits",
you have no reason to expect that such types will exist.
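
(If code truly needs "exactly 8 bits", the honest move is to test
for it at compile time rather than typedef it into existence; a
minimal sketch:)

	#include <limits.h>

	/* On a 9-bit-char machine this fails loudly instead of
	   silently misbehaving. */
	#if UCHAR_MAX == 255
	typedef unsigned char octet;
	#else
	#error "no 8-bit type on this machine"
	#endif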

I am greatly disappointed that C++, having added so much to C, has not
added something like int(Low,High) to the language, which would stand
for the "most efficient" available integral type in which both Low and
High were representable.  The ANSI C committee were right not to add
such a construct to C, because their charter was to standardise, not
innovate.

An anecdote which may be of value to people designing a C compiler for
64-bit machines:  there was a UK company who built their own micro-coded
machine, and wanted to put UNIX on it.  Their C compiler initially had
char=8, short=16, int=32, long=64 bits, sizeof (int) == sizeof (char*).
They changed their compiler in a hurry, so that long=32 bits; it was
less effort to do that than to fix all the BSD sources.  It also turned
out to have market value in that many of their customers had been just
as sloppy with VAX code.

sizeof (char) is fixed at 1.  However, it should be quite easy to set up
a compiler so that the user can specify (whether in an environment variable
or in the command line) what sizes to use for short, int, long, and (if you
want to imitate GCC) long long.  Something like
	setenv CINTSIZES="16,32,32,64"	# short,int,long,long long.
The system header files would have to use the default types (call them
__int, __short, and so on) so that only one set of system libraries would
be needed, and this means that using CINTSIZES to set the sizes to something
other than the defaults would make the compiler non-conforming.
Make the defaults the best you can, but if you let people override the
defaults then the task of porting sloppy code will be eased.  Other vendors
have found the hard way that customers have sloppy code.

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

shankar@hpcupt3.cup.hp.com (Shankar Unni) (05/03/91)

In comp.lang.c, torek@elf.ee.lbl.gov (Chris Torek) writes:

> In article <168@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
> >How bad is it for sizeof(int) != sizeof(long)?

> This has been the case on PDP-11s for over 20 years.

> It does cause problems---there is always software that makes invalid
> assumptions---but typically long-vs-int problems, while rampant, are
> also easily fixed.

Well, lint goes a long way towards pointing out int-long mismatches, etc.
But in my (humble?) opinion, much trouble would be headed off if C compilers
on 64-bit architectures would simply dispense with the 16-bit type and make
sizes of int == long == void * == 64 bits, and short == 32 bits.  Why is it
so terribly important to have a 16-bit data type?

In any case, memory is getting cheaper these days...
-----
Shankar Unni                                   E-Mail:
HP India Software Operation, Bangalore       Internet: shankar@india.hp.com
Phone : +91-812-261254 x417                      UUCP: ...!hplabs!hpda!shankar

rli@buster.stafford.tx.us (Buster Irby) (05/03/91)

cadsi@ccad.uiowa.edu (CADSI) writes:

>From article <1991May01.172042.5214@buster.stafford.tx.us>, by rli@buster.stafford.tx.us (Buster Irby):
>> turk@Apple.COM (Ken "Turk" Turkowski) writes:
>> 
>>>It is necessary to have 8, 16, and 32-bit data types, in order to be able
>>>to read data from files.  I would suggest NOT specifying a size for the int
>> 
>> You assume a lot about the data in the file.  Is it stored in a specific
>> processor format (a la Intel vs Motorola)?  My experience has been that
>> binary data is not portable anyway.

>Binary isn't in general portable.  However, using proper typedefs in
>a class one can move binary read/write classes from box to box.  I think
>the solution to the whole issue of sizeof(whatever) is to simply assume
>nothing.  Always typedef.  It isn't that difficult, and code I've done this
>with runs on things ranging from DOS machines to CRAY's COS (and UNICOS)
>code (barring the typedef header files) changes.

What kind of typedef would you use to swap the high and low bytes
in a 16 bit value?  An Intel or little-endian machine stores the
bytes in reverse order (low byte in the lowest address), while a
Motorola or big-endian machine stores the bytes in normal order
(high to low).  There is
no way to fix this short of reading the file one byte at a time
and stuffing them into the right place.  The point I was trying
to make is that reading and writing a data file has absolutely
nothing to do with data types.  As we have already seen, there
are a lot of different machine types that support C, and as far
as I know, all of them are capable of reading binary files,
independent of data type differences.

The only sane way to deal with this issue is to never assume
anything about the SIZE or the ORDERING of data types, which is
basically what the C standard says.  It tells you that a long >=
int >= short >= char.  It says nothing about actual size or byte
ordering within a data type.  
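
(Reading one byte at a time and assembling the value arithmetically
sidesteps both assumptions; a sketch, assuming a file format that
stores the low byte first:)

	#include <stdio.h>

	/* Read a 16 bit quantity stored low byte first, regardless
	   of the host's own byte order or the width of int. */
	long get_u16_le(FILE *fp)
	{
		int lo = getc(fp);
		int hi = getc(fp);
		if (lo == EOF || hi == EOF)
			return -1L;	/* premature end of file */
		return (long)lo | ((long)hi << 8);
	}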

Another trap I ran across recently is the ordering of bit
fields.  On AT&T 3B2 machines the first bit defined is the high
order bit, but on Intel 386 machines the first bit defined is the
low order bit.  This means that anyone who attempts to write this
data to a file and transport it to another platform is in for a
surprise, they are not compatible.  Again, the C standard says
nothing about bit ordering, and in fact cautions you against
making such assumptions.

tmb@ai.mit.edu (Thomas M. Breuel) (05/04/91)

   You have just made a non-portable assumption, namely that there
   *is* an integral type which holds an octet and that there *is* an
   integral type which holds two octets, and so on.  If you want
   "at least 8 bits", then use {,{,un}signed} char, and if you want
   "at least 16 bits", then use {,unsigned} short.  The ANSI standard
   guarantees those.  There is no need to introduce your own private
   names for them.  If you want "exactly 8 bits" or "exactly 16 bits",
   you have no reason to expect that such types will exist.

   I am greatly disappointed that C++, having added so much to C, has not
   added something like int(Low,High) to the language, which would stand
   for the "most efficient" available integral type in which both Low and
   High were representable.  The ANSI C committee were right not to add
   such a construct to C, because their charter was to standardise, not
   innovate.

I think allowing the programmer to specify arbitrary precision
integers is equally bad, since it adds too much complexity to
the language and to compilers.

A good compromise would be to provide a set of precisions that can be
supported on current machines and extend the language standard
as newer, more powerful machines become available.

In essence, this is actually what the C standard does, if one
continues to use the current data types with roughly their current
meaning: "short" is close to, and at least, 16 bits, and "long" is
close to, and at least, 32 bits. The emphasis here is on "close to", and
this should probably be made explicit in the standard, since
programmers pragmatically do, and need to, rely on it to be able to
estimate what the space requirements of their programs will be.

When machines capable of handling larger integer data types become
available, new names for the new data types should be introduced.
Perhaps a more consistent naming scheme would be good: int8 (>= 8-bit
integer), ..., int128 (>= 128-bit integer), etc.
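
(On any one port the scheme reduces to a handful of typedefs; a
sketch for a hypothetical 32-bit machine, where every mapping is a
per-machine assumption:)

	/* ">= N bits" names per the proposal above; the mappings
	   do not travel from port to port. */
	typedef signed char	int8;	/* at least  8 bits */
	typedef short		int16;	/* at least 16 bits */
	typedef long		int32;	/* at least 32 bits */
	/* int64, int128: no conforming mapping on this port */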

					Thomas.

rockwell@socrates.umd.edu (Raul Rockwell) (05/04/91)

Richard A. O'Keefe:
   An anecdote which may be of value to people designing a C compiler for
   64-bit machines:  there was a UK company who built their own micro-coded
   machine, and wanted to put UNIX on it.  Their C compiler initially had
   char=8, short=16, int=32, long=64 bits, sizeof (int) == sizeof (char*).
   They changed their compiler in a hurry, so that long=32 bits; it was
   less effort to do that than to fix all the BSD sources. ...

eh??

any reason they couldn't have compiled with -Dlong=int ?  (Or, if you
wanna be fancy, you could 
#define long _long
typedef int  _long;

seems rather silly to break the compiler just because of old code...

Raul Rockwell

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/05/91)

What would be the best way to do this:

I want to pass around integer numbers that I know will require more than
32 bits but not more than 63 bits.  An example of such a number is the
number of microseconds in the century.  The uses include passing them to
functions as arguments and receiving them back as return values.

I want to specify it sufficiently that a reasonable implementation on a 64 bit
machine will in fact use the 64 bit integer instructions.  Whatever way it
is to be specified should work on all such 64 bit machines.

If I were to use an array of smaller integers, I'd have to code specific
macros or functions for operations on these values that would preferably
be simple arithmetic operations.  But the big deal is that
a 64-bit machine would not get to use its 64-bit capabilities.  It does no
good to get a 64-bit machine if it is just going to be doing 32-bit data
operations all the time.
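
For illustration, here is a minimal sketch of the kind of fallback I
mean, assuming some 32-bit unsigned type is available (the names and
the choice of unsigned halves are invented for the example):

	typedef unsigned long uint32;             /* assumed to hold 32 bits */
	typedef struct { uint32 hi, lo; } int64;  /* two-word representation */

	/* 64-bit add done by hand in two 32-bit halves */
	int64 add64(int64 a, int64 b)
	{
	    int64 r;

	    r.lo = (a.lo + b.lo) & 0xFFFFFFFFUL;
	    r.hi = (a.hi + b.hi + (r.lo < a.lo)) & 0xFFFFFFFFUL;
	    return r;
	}

Every operation becomes a function or macro call like that, instead of
a plain "+".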

But of course I want to do it portably within the scope of 64-bit machines.

Shouldn't "long" always represent at least the longest natural operation
width on the given architecture, so that it is at least POSSIBLE to code
applications that need that architecture?

(I am speaking in terms of current and future directions in C, not in
compatibility of old code, which is a separate issue)
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/05/91)

rockwell@socrates.umd.edu (Raul Rockwell) writes:

>seems rather silly to break the compiler just because of old code...

And it seems rather silly to prevent NEW code from being GOOD code just
because of old code...

New compilers can be made to have different modes, one for old traditional
code, and one for new modern portable standard and possibly extended code.

The only time you'd need that is when you are porting old code to a new
machine and are not expecting to get the full benefit of the new machine
(such as using only 32 bit operations on a 64 bit machine... ick).
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

cadsi@ccad.uiowa.edu (CADSI) (05/06/91)

From article <1991May03.120455.158@buster.stafford.tx.us>, by rli@buster.stafford.tx.us (Buster Irby):
> cadsi@ccad.uiowa.edu (CADSI) writes:
> 
>>Binary isn't in general portable.  However, using proper typedefs in
>>a class one can move binary read/write classes from box to box.  I think
>>the solution to the whole issue of sizeof(whatever) is to simply assume
>>nothing.  Always typedef.  It isn't that difficult, and code I've done this
>>way runs on things ranging from DOS machines to CRAY's COS (and UNICOS) without
>>code (barring the typedef header files) changes.
> 
> What kind of typedef would you use to swap the high and low bytes
> in a 16 bit value?  An Intel or BIG_ENDIAN machine stores the
> bytes in reverse order, while a Motorolla or LITTLE_ENDIAN
> machine stores the bytes in normal order (High to low).  There is
> no way to fix this short of reading the file one byte at a time
> and stuffing them into the right place.  The point I was trying
> to make is that reading and writing a data file has absolutely
> nothing to do with data types.  As we have already seen, there
> are a lot of different machine types that support C, and as far
> as I know, all of them are capable of reading binary files,
> independent of data type differences.

The big/little endian problem is handled via swab calls.
AND, how do we know when to do this????  We just store
the needed info in a header record.
This header is read in block fashion and typedef'ed to the structure we need.
From there, that's all we need to continue.
The typedefs have to do with internal structures, NOT simple int, char
and those type things, except for the BYTE type.
Last but not least, you'll inevitably ask how we portably read that header.
Well, we store a 'magic number' and mess with things till the numbers are
read correctly.  Incidentally, that magic number also gives indications
of code revision level and therefore what will and won't be possible.
C'mon, this is not that difficult to comprehend.  You want portable files???
Make 'em yourself.  'C' gives you all the toys you need to do this.
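
For the curious, the byte-order check is about this much code (the
magic value and the protocol around it are made up for the example,
and it assumes a 32-bit unsigned long, which is exactly the sort of
assumption this thread is about):

	#include <stdio.h>

	/* returns 0 if the file matches our byte order, 1 if it is
	   byte-reversed and every field must be swabbed after reading,
	   -1 if the header is unrecognized */
	int check_magic(FILE *fp)
	{
	    unsigned long magic;

	    if (fread(&magic, sizeof magic, 1, fp) != 1)
	        return -1;
	    if (magic == 0x01020304UL)
	        return 0;
	    if (magic == 0x04030201UL)
	        return 1;
	    return -1;
	}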

[other stuff deleted - reference above]

|----------------------------------------------------------------------------|
|Tom Hite					|  The views expressed by me |
|Manager, Product development			|  are mine, not necessarily |
|CADSI (Computer Aided Design Software Inc.	|  the views of CADSI.       |
|----------------------------------------------------------------------------|

boyne@hplvec.LVLD.HP.COM (Art Boyne) (05/06/91)

In comp.lang.c, bhoughto@pima.intel.com (Blair P. Houghton) writes:

    There's no reason for an int to be less than the full
    register-width, and no reason for an address to be limited
    to the register width.

Wrong!  There is a *good* reason.  On processors whose data bus width
is less than the register width (e.g., 68000/8/10), the performance penalty
for the extra data fetches may be significant.  And since these processors
have only 16x16 multiplies and 32x16 divides, a 16-bit "int" type may make
a lot more sense than a 32-bit "int".

Typical applications also should have an impact on the choice.  If the
compiler is intended to support general-purpose applications running on
a family of processors (e.g., 680x0), then perhaps it should be tailored
to somewhere mid- to high-range.  On the other hand, one intended to
support embedded applications only (like the instrument controllers I
work with), had better look at the low end *very* carefully.  68000's
as instrument controllers are common here.  68020's are almost unheard-of.
32-bit ints are a detriment for typical instrument control applications,
in terms of RAM usage, ROM size, *and* performance.

For such CPUs and applications, it would be *really* helpful for the
compiler to support a 16- or 32-bit "int" switch.  I wish the compiler
we use did.

Art Boyne, boyne@hplvla.hp.com

shap@shasta.Stanford.EDU (shap) (05/07/91)

In article <5535@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>
>sizeof (char) is fixed at 1.  However, it should be quite easy to set up
>a compiler so that the user can specify (whether in an environment variable
>or in the command line) what sizes to use for short, int, long, and (if you
>want to imitate GCC) long long.  Something like
>	setenv CINTSIZES="16,32,32,64"	# short,int,long,long long.
>The system header files would have to use the default types (call them
>__int, __short, and so on) so that only one set of system libraries would
>be needed, and this means that using CINTSIZES to set the sizes to something
>other than the defaults would make the compiler non-conforming.

In practice (having tried it once for other reasons), this doesn't work
as well as you might like.  The problem comes from the fact that the
vendor doesn't control the independent software vendors.  For their
part, the ISVs want portability, so it's real hard to convince them of
the merits of converting their header files.


It also becomes an ongoing support and update nightmare.

Jonathan

turk@Apple.COM (Ken Turkowski) (05/07/91)

rli@buster.stafford.tx.us (Buster Irby) writes:
>An Intel or BIG_ENDIAN machine stores the
>bytes in reverse order, while a Motorolla or LITTLE_ENDIAN
>machine stores the bytes in normal order (High to low).

You've got this perfectly reversed.  Motorola is a BIG_ENDIAN machine,
and Intel is a LITTLE_ENDIAN machine.  Additionally, there is no
such thing as "normal".

-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: TURK
UUCP: sun!apple!turk

turk@Apple.COM (Ken Turkowski) (05/07/91)

cadsi@ccad.uiowa.edu (CADSI) writes:
>The big/little endian problem is handled via swab calls.
>AND, how do we know when to do this????  We just store
>the needed info in a header record.
>This header is read in block fashion and typedef'ed to the structure we need.

What type of header do you suggest? This should be able to record
the ordering of shorts, longs, floats, and doubles, and might need
to specify floating-point format.
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: TURK
UUCP: sun!apple!turk

gwyn@smoke.brl.mil (Doug Gwyn) (05/07/91)

In article <1991May4.202438.14664@ux1.cso.uiuc.edu> phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) writes:
>I want to pass around integer numbers that I know will require more than
>32 bits but not more than 63 bits.
>I want to specify it sufficiently that a reasonable implementation on a 64 bit
>machine will in fact use the 64 bit integer instructions.  Whatever way it
>is to be specified should work on all such 64 bit machines.

The question is, should it also work on non-64 bit architectures?
If not, just use "long".  If so, you'll need some fairly obvious type
definitions, macros, etc.  To automatically configure your code to
accommodate both types of architecture, you can make the definitions
conditional on some arithmetic property in the preprocessor that will
produce different results in the two environments; for example you
could test for sign extension of the 32nd bit.
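
For instance, a minimal sketch (the typedef name is arbitrary):

	#include <limits.h>

	#if LONG_MAX > 2147483647      /* long is wider than 32 bits here */
	typedef long wide_t;           /* use the native 64-bit instructions */
	#else
	typedef struct { unsigned long hi, lo; } wide_t;  /* multiword fallback */
	#endif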

gwyn@smoke.brl.mil (Doug Gwyn) (05/07/91)

In article <TMB.91May3225038@volterra.ai.mit.edu> tmb@ai.mit.edu (Thomas M. Breuel) writes:
>In essence, this is actually what the C standard does, if one
>continues to use the current data types with roughly their current
>meaning: "short" is close to, and at least 16 bits, and "long" is
>close to, and at least 32 bits. The emphasis here is on "close to", and
>this should probably be made explicit in the standard, since
>programmers pragmatically do, and need to, rely on it to be able to
>estimate what the space requirements of their programs will be.

It was not the intention of the C standard to require the "close to"
attribute as you describe it.  Some architectures are such as to make
that an impractical implementation, and on such architectures a char
might even be 128 bits.  (However, most implementations will make an
exception for "char" and pack them fairly tightly into a word, even
though it does slow down operations on char type considerably.)

john@sco.COM (John R. MacMillan) (05/07/91)

|>|It is necessary to have 8, 16, and 32-bit data types, in order to be able
|>|to read data from files.
|>It's not necessary, but it does make it easier.
|
|Not even that.  Assuming that for some unknown reason you're faced with
|reading a binary file that originated on some other system, there is a
|fair chance that it used a "big endian" architecture while your system
|is "little endian" or vice-versa.

I certainly didn't mean to imply that this would help something be
universally portable; rather that if you're _lucky_ and the endianness
is the same, having the right size data types available might make
that single port easier.

|Binary data transportability is a much thornier issue than most people
|realize.

Yes, I've seen many ``portable'' binary formats that simply weren't.

msb@sq.sq.com (Mark Brader) (05/07/91)

> There are numerous CONFORMING ways in
> which additional integer types can be added to C.  "long long" is NOT
> one of these, and a standard-conforming implementation is OBLIGED to
> diagnose the use of "long long", which violates the Constraints of
> X3.159-1989 section 3.5.2.  Therefore "long long" is not a wise way
> to make such an extension.

I disagree.  I think "long long" is a preferable approach.

The Standard does not guarantee that there exists, in a C
implementation, any integral type wider than 32 bits.  A programmer
wishing to do arithmetic on integer values exceeding what can be
stored in 32 bits has three options:

	(a) use floating point;
	(b) represent each such value using more than one object of some
	    existing integral type, e.g. using a "bignums package"; or
	(c) use an integral type known to provide the required number of
	    bits, and never port the program to machines where no such type
	    exists.

Option (a) may not be feasible for a number of reasons, not least of
which is that significance may be lost using floating point -- the
Standard does not guarantee that any floating point type in a C
implementation can hold as many significant digits as a 32-bit integer
can.  Option (b) also has significant costs, so the programmer with
access to a suitable machine may choose to accept the portability loss
and choose (c).

Now, what would we like to happen if a program that assumed 64-bit
integers existed was ported to a machine where they didn't?  We would
like the compilation to fail, that's what!  Suppose that the implementation
defines long to be 64 bits; then, to force such a failure, the programmer
would have to take some explicit action, like

	assert (LONG_MAX >= 0777777777777777777777);

On the other hand, suppose that the implementation defines a separate
"long long" type for 64-bit integers.  Then when the user compiles the
program on the 64-bit machine, they get:

	cc: warning: "long long" is an extension and not portable

and, assuming a reasonable quality of implementation, they can eliminate
this message with a cc option if desired.  And if they do try to port,
they get a fatal error in compilation.

This behavior seems exactly right to me.

Now, I am *not* saying that an implementation should necessarily make
"long long" its *only* 64-bit integral type.  It'd be wholly reasonable
to define *both* "long" and "long long" as 64 bits.  Just as a programmer
uses "long" whenever more than 16 bits are *required*, although "int" may
be the same as "long", "long long" could be used whenever more than 32
bits are required, although "long" might be the same as "long long".

-- 
Mark Brader            \  "He's suffering from Politicians' Logic."
SoftQuad Inc., Toronto  \ "Something must be done, this is something, therefore
utzoo!sq!msb, msb@sq.com \ we must do it."   -- Lynn & Jay: YES, PRIME MINISTER

This article is in the public domain.

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/08/91)

msb@sq.sq.com (Mark Brader) writes:

>Now, what would we like to happen if a program that assumed 64-bit
>integers existed was ported to a machine where they didn't?  We would

How about coding it so that if a symbol such as "LONGLONG64" is not
defined conditional compilation will fall back to code that invokes
a bignum package for 64 bit ints.  Then simply -DLONGLONG64 will get
the good code.

I just picked LONGLONG64 not knowing if there is a better thing to use.
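
The shape of the code would be something like this (LONGLONG64, big_t,
and big_add are all placeholders; the fallback declaration is meant to
come from whatever bignum package you link against):

	#ifdef LONGLONG64
	typedef long long big_t;                 /* compiler extension */
	#define big_add(a, b) ((a) + (b))        /* native 64-bit add */
	#else
	typedef struct { unsigned long hi, lo; } big_t;
	big_t big_add(big_t a, big_t b);         /* from the bignum package */
	#endif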
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

daves@ex.heurikon.com (Dave Scidmore) (05/08/91)

>rli@buster.stafford.tx.us (Buster Irby) writes:
>>An Intel or BIG_ENDIAN machine stores the
>>bytes in reverse order, while a Motorolla or LITTLE_ENDIAN
>>machine stores the bytes in normal order (High to low).

In article <13357@goofy.Apple.COM> turk@Apple.COM (Ken Turkowski) writes:
>You've got this perfectly reversed.  Motorola is a BIG_ENDIAN machine,
>and Intel is a LITTLE_ENDIAN machine.  Additionally, there is no
>such thing as "normal".

Exactly.  Both conventions have good points and bad points.  The "normal"
Motorola convention starts to look a little less normal when dynamic
bus sizing is required, in which case all byte data comes over the most
significant data bus lines.  In addition, big-endian machines have values
that expand in size from the most significant location (i.e., two-byte
values at location X have their least significant bytes in a different
location than four-byte values at the same location).  On the other hand,
the little-endian convention looks odd when you do a dump of memory and
try to find long words within a series of bytes.

In the end you can make either convention look "normal" by how you draw
the picture of it. For example which of these is normal for storing the
value 0x12345678 ?

Motorola:	Location	0	1	2	3
		Bytes		0x12	0x34	0x56	0x78

Intel:		Location	3	2	1	0
		Bytes		0x12	0x34	0x56	0x78
--
Dave Scidmore, Heurikon Corp.
dave.scidmore@heurikon.com

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/09/91)

shankar@hpcupt3.cup.hp.com (Shankar Unni) writes:

>Well, lint goes a long way towards pointing out int-long mismatches, etc.
>But in my (humble?) opinion, much trouble would be headed off if C compilers
>on 64-bit architectures would simply dispense with the 16-bit type and make
>sizes of int == long == void * == 64 bits, and short == 32 bits.  Why is it
>so terribly important to have a 16-bit data type?

That DOUBLES the size of a program that is loading and working with
digitized audio samples that are more than 8 bits wide.

>In any case, memory is getting cheaper these days..

And software is getting bigger just as fast to fill it up.
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/09/91)

In article <470@heurikon.heurikon.com>, daves@ex.heurikon.com (Dave Scidmore) writes:
> I'm surprised nobody has mentioned that the real solution to this kind
> of portability problem is for the original programmer to use the
> definitions in "types.h" that tell you how big chars, shorts, ints,
> and longs are.

Why be surprised?  I'm using an Encore Multimax running 4.3BSD, and
on this machine there _isn't_ any types.h file.  We have a copy of GCC,
so we _have_ access to the ANSI file, but that's <limits.h>, not "types.h".

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (05/09/91)

In article <TMB.91May3225038@volterra.ai.mit.edu>, tmb@ai.mit.edu (Thomas M. Breuel) writes:
>    I am greatly disappointed that C++, having added so much to C, has not
>    added something like int(Low,High) to the language, which would stand
>    for the "most efficient" available integral type in which both Low and
>    High were representable.  The ANSI C committee were right not to add
>    such a construct to C, because their charter was to standardise, not
>    innovate.
> 
> I think allowing the programmer to specify arbitrary precision
> integers is equally bad, since it adds too much complexity to
> the language and to compilers.

But nowhere did I suggest "allowing the programmer to specify arbitrary
precision integers".  Read what I wrote: the compiler would select "the
most efficient AVAILABLE integral type in which both Low and High were
representable".  I'm not talking about increasing the stock of integral
types built into the compiler, simply talking about a way of selecting
one of those types.  For any given copy of <limits.h> I can define an
M4 macro
	selected_integral_type(Low,High)
which expands to char,short,int,long, &c.  Indeed, I believe I posted
such a macro to this group last year.  My point is that this could have
been built into the compiler TRIVIALLY.  I really do mean "trivially";
we are talking about adding one 10-line function to a C compiler, plus
a couple of productions in a Yacc grammar.
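
In preprocessor terms, the effect for any one fixed range looks like
this (the range and the typedef name are just examples):

	#include <limits.h>

	/* cheapest standard type that holds -40000..40000 */
	#if INT_MAX >= 40000 && INT_MIN <= -40000
	typedef int  sel_t;    /* int is big enough: use it */
	#else
	typedef long sel_t;    /* long must reach +-2147483647 */
	#endif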

What I was asking for is _less_ than what Pascal provides!

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

clive@x.co.uk (Clive Feather) (05/09/91)

In article <1991May6.232116.11401@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
> I disagree.  I think "long long" is a preferable approach.
[...]
> A programmer wishing
> to do arithmetic on integer values exceeding what can be stored in
> 32 bits has three options:
[...]
>   (c) use an integral type known to provide the required number of bits,
>       and never port the program to machines where no such type exists.
[...]
> Now, what would we like to happen if a program that assumed 64-bit
> integers existed was ported to a machine where they didn't?  We would
> like the compilation to fail, that's what!  Suppose that the implementation
> defines long to be 64 bits; then, to force such a failure, the programmer
> would have to take some explicit action, like
>
>	assert (LONG_MAX >= 0777777777777777777777);
>
> On the other hand, suppose that the implementation defines a separate
> "long long" type for 64-bit integers.  Then when the user compiles the
> program on the 64-bit machine, they get:
>
>	cc: warning: "long long" is an extension and not portable
>
> and, assuming a reasonable quality of implementation, they can eliminate
> this message with a cc option if desired.  And if they do try to port,
> they get a fatal error in compilation.
>
> This behavior seems exactly right to me.

If you want the compilation to fail, then what's wrong with the
following ?

    #if LONG_MAX < 0xFFFFffffFFFFffff
    ??=error Long type not big enough for use.
    #endif

This causes the compilation to fail only when long is not big enough,
does not require any new types in the implementation, and generates *no*
messages on a 64-bit-long system. 

Notes: the use of a hex, rather than octal, constant with mixed case
makes it easier to count the number of digits, and the explicit trigraph
is used to choke (non-ANSI) implementations which don't have #error, and
which might object to it even when the condition of the #if is false.
-- 
Clive D.W. Feather     | IXI Limited         | If you lie to the compiler,
clive@x.co.uk          | 62-74 Burleigh St.  | it will get its revenge.
Phone: +44 223 462 131 | Cambridge   CB1 1OJ |   - Henry Spencer
(USA: 1 800 XDESK 57)  | United Kingdom      |

conor@lion.inmos.co.uk (Conor O'Neill) (05/09/91)

In article <179@shasta.Stanford.EDU> shap@shasta.Stanford.EDU (shap) writes:
>While I happen to agree with this sentiment, there is an argument that X
>hundred million lines of C code can't be wrong.  The problem with
>theology is that it's not commercially viable.


Or did you mean "C hundred million lines of X code"...

(Apparently X even has such nasties buried inside it as expecting
that successive calls to malloc have higher addresses,
forcing the heap to grow upwards.) (So I'm informed)
---
Conor O'Neill, Software Group, INMOS Ltd., UK.
UK: conor@inmos.co.uk		US: conor@inmos.com
"It's state-of-the-art" "But it doesn't work!" "That is the state-of-the-art".

det@nightowl.MN.ORG (Derek E. Terveer) (05/10/91)

msb@sq.sq.com (Mark Brader) writes:


>> There are numerous CONFORMING ways in
>> which additional integer types can be added to C.  "long long" is NOT
>> one of these, and a standard-conforming implementation is OBLIGED to
>> diagnose the use of "long long", which violates the Constraints of
>> X3.159-1989 section 3.5.2.  Therefore "long long" is not a wise way
>> to make such an extension.

>I disagree.  I think "long long" is a preferable approach.

>The Standard does not guarantee that there exists, in a C implement-
>ation, any integral type wider than 32 bits.  [...]

But the standard also does not guarantee (as far as I know) that there
doesn't exist a type with >32 bits.

What is wrong with simply implementing the following in a compiler?

	char	=	 8 bits
	short	=	16 bits
	int	=	32 bits
	long	=	64 bits
-- 
det@nightowl.mn.org

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/11/91)

det@nightowl.MN.ORG (Derek E. Terveer) writes:

>What is wrong with simply implementing the following in a compiler?

>	char	=	 8 bits
>	short	=	16 bits
>	int	=	32 bits
>	long	=	64 bits

There is apparently (as some people complain about) code out there that
depends upon the MAX size of the type.  In other words, if "long" is
longer than 32 bits, it breaks.

But porting such code to a 64-bit machine *AND* writing good standardized
code for the same machine are in mutual conflict because of this.

The only way out I can see is for the compiler to default to what is the
most reasonable for NEW AND GOOD code to be developed, and have some sort
of flag or flags to allow it to be customized to better handle the cases
of porting old code.  I'd also suspect that if you can find code where the
long depends on being exactly 32 bits, you could well find code where the
int depends on being exactly 16 bits.  So there probably is not one single
ideal solution to the problem.  So perhaps a system of flags like:

  -SHORTnn
  -INTnn
  -LONGnn

Which actually change the sizes of the primitive types, giving a warning
if the "short <= int <= long" constraint is violated (but do the compile
as specified anyway).

It would be an extension, not standard C.  But when porting old code, we
aren't addressing standards, are we?
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

gwyn@smoke.brl.mil (Doug Gwyn) (05/11/91)

In article <45690005@hpcupt3.cup.hp.com> shankar@hpcupt3.cup.hp.com (Shankar Unni) writes:
>But in my (humble?) opinion, much trouble would be headed off if C compilers
>on 64-bit architectures would simply dispense with the 16-bit type and make
>sizes of int == long == void * == 64 bits, and short == 32 bits.  Why is it
>so terribly important to have a 16-bit data type?

It isn't important, except perhaps to people who are trying to port
poorly-implemented code that managed to depend on such a system-dependent
feature.  Such code undoubtedly has far worse problems than that anyway.

By the way, there is NO NEED to say what data size choices a 64-bit
implementation "should" make.  It "should" not matter to any sensible
application.

david@ap542.uucp (05/15/91)

-cadsi@ccad.uiowa.edu (CADSI) writes:
->by rli@buster.stafford.tx.us (Buster Irby):
->> cadsi@ccad.uiowa.edu (CADSI) writes:
->> 
->>>Binary isn't in general portable.  However, using proper typedefs in
->>>a class one can move binary read/write classes from box to box.  
->> 
->> What kind of typedef would you use to swap the high and low bytes
->> in a 16 bit value?  An Intel or BIG_ENDIAN machine stores the
->> bytes in reverse order, while a Motorolla or LITTLE_ENDIAN
->> machine stores the bytes in normal order (High to low).  There is
->> no way to fix this short of reading the file one byte at a time
->> and stuffing them into the right place.  
->
->The big/little endian problem is handled via swab calls.
->AND, how do we know when to do this????  We just store
->the needed info in a header record.

NO, NO, NO!  The way to do this is to use XDR.
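
The usual XDR idiom is roughly this (a sketch, with error handling
mostly omitted):

	#include <stdio.h>
	#include <rpc/rpc.h>

	/* write one long in XDR's machine-independent representation */
	void put_long(FILE *fp, long val)
	{
	    XDR xdrs;

	    xdrstdio_create(&xdrs, fp, XDR_ENCODE);
	    if (!xdr_long(&xdrs, &val))
	        fprintf(stderr, "xdr_long failed\n");
	    xdr_destroy(&xdrs);
	}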

**========================================================================**
David E. Smyth
david%ap542@ztivax.siemens.com	<- I can get your mail, but our mailer
				   is broken.  But I can post.  You figure...
**========================================================================**

ray@philmtl.philips.ca (Ray Dunn) (05/16/91)

In referenced article, bhoughto@pima.intel.com (Blair P. Houghton) writes:
>>2. If a trade-off has to be made between compliance and ease of
>>porting, what's the better way to go?
>
>If you're compliant, you're portable.

This is like saying that if the syntax is correct then the semantics
must be too, although, indeed, there is no need to trade compliance for
portability.

There are dangers clearly visible though.  All you can say is that your
compliant program has an excellent chance of compiling on another system
running a compliant compiler, not that it will necessarily work correctly
with the new parameters plugged in.

You only know a program is portable *after* you've tested it on another
system.  How many do that during the initial development cycle?

Porting is not an issue that goes away by writing compliant code - it may
in fact *hide* some of the problems.

If I wanted to be controversial, I might say that 'C's supposed
"portability" is a loaded cannon.  Bugs caused by transferring a piece of
software to another system will continue to exist, even in compliant
software.  Prior to "portable" 'C', porting problems were *expected*,
visible, and handled accordingly.

Will developers still assume these bugs to be likely, and handle
verification accordingly, or will they be lulled by "it compiled first
time" into thinking that the portability issue has been taken into account
up front, and treat it with less attention than it deserves?
-- 
Ray Dunn.                    | UUCP: ray@philmtl.philips.ca
Philips Electronics Ltd.     |       ..!{uunet|philapd|philabs}!philmtl!ray
600 Dr Frederik Philips Blvd | TEL : (514) 744-8987  (Phonemail)
St Laurent. Quebec.  H4M 2S9 | FAX : (514) 744-9550  TLX: 05-824090

jeh@cmkrnl.uucp (05/16/91)

In article <521@heurikon.heurikon.com>, daves@ex.heurikon.com (Dave Scidmore)
 writes:
> [...]
> On the other hand the little
> endian convention looks odd when you do a dump of memory and try to find long
> words within a series of bytes.

Yup.  But this can be solved.  The VAX is a little-endian machine, and VMS
utilities address [ahem] this problem by always showing hex contents with
increasing addresses going from right to left across the page.  Since
significance of the bytes (actually the nibbles) increases with increasing
addresses, this looks perfectly correct... the most significant nibble goes on
the left, just the way you'd "naturally" write it.  For example, the value 1
stored in 32 bits gets displayed as 00000001.  

If you get a hex-plus-Ascii dump, such as is produced by DUMP (for files) or
ANALYZE/SYSTEM (lets you look at live memory), the hex goes from right to left,
and the ascii from left to right, like this: 

SDA> ex 200;60
0130011A 0120011B 0130011E 0110011F  ......0... ...0.     00000200
01200107 02300510 04310216 04210218  ..!...1...0... .     00000210
01100103 01100104 01200105 01200106  .. ... .........     00000220
44412107 01100100 01100101 01100102  .............!AD     00000230
4B202020 20444121 44412106 42582321  !#XB.!AD!AD    K     00000240
00524553 55525055 53434558 454E5245  ERNEXECSUPRUSER.     00000250

In the last row, the string "EXEC" is at address 253, and the last byte on the
line, 25F, contains hex 00.  In the first row, the word (16 bits) at location
204 contains hex value 11E; if you address the same location as a longword, you
get the value 0130011E.  

This looks completely bizarre at first, but once you get used to it (a few
minutes or so for most folks) it makes perfect sense.  
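
A toy version of that display convention fits in a few lines (just a
sketch, not the real DUMP utility):

	#include <stdio.h>

	/* print 16 bytes with addresses increasing right to left, so on a
	   little-endian machine each longword reads naturally, msb first */
	void dump16(unsigned char *p, unsigned long addr)
	{
	    int i;

	    for (i = 15; i >= 0; i--)
	        printf("%02X%s", p[i], i % 4 == 0 ? " " : "");
	    printf("    %08lX\n", addr);
	}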

The VAX is consistent in bit numbering too:  The least significant bit of a 
byte is called bit 0, and when you draw bit maps of bytes or larger items,
you always put the lsb on the right.  

	--- Jamie Hanrahan, Kernel Mode Consulting, San Diego CA
Chair, VMS Internals Working Group, U.S. DECUS VAX Systems SIG 
Internet:  jeh@dcs.simpact.com, hanrahan@eisner.decus.org, or jeh@crash.cts.com
Uucp:  ...{crash,scubed,decwrl}!simpact!cmkrnl!jeh

msb@sq.sq.com (Mark Brader) (05/18/91)

> > I think "long long" is a preferable approach. ... A programmer wishing
> > to do arithmetic on integer values exceeding what can be stored in
> > 32 bits has three options:
> [...]
> >   (c) use an integral type known to provide the required number of bits,
> >       and never port the program to machines where no such type exists.
> [...]
> > Now, what would we like to happen if a program that assumed 64-bit
> > integers existed was ported to a machine where they didn't?  We would
> > like the compilation to fail, that's what!  Suppose that the implementation
> > defines long to be 64 bits; then, to force such a failure, the programmer
> > would have to take some explicit action, like
> >	assert (LONG_MAX >= 0777777777777777777777);

> If you want the compilation to fail, then what's wrong with the
> following ?
> 
>     #if LONG_MAX < 0xFFFFffffFFFFffff		/* wrong, actually */
>     ??=error Long type not big enough for use.
>     #endif

I would take that to be "something like" my assert() example, and don't
have a strong preference between one and the other.  In ANSI C the use
of an integer constant larger than ULONG_MAX is a constraint violation
(3.1.3) anyway, so it really suffices to say

	01777777777777777777777;

and this has a certain charm to it.  But the compiler might issue only
a warning, rather than aborting the compilation.

> This causes the compilation to fail only when long is not big enough,
> does not require any new types in the implementation, and generates *no*
> messages on a 64-bit-long system. 

But whichever of these the programmer uses, *it has to be coded explicitly*.
My feeling is that enough of the 64-bit people [i.e. those worth $8 :-)]
will carelessly omit to do so, once 64-bit machines [i.e. those worth $8!
:-) :-)] become more common, as to create portability problems.  It will,
I fear, be exactly the situation that we've already seen where there's
much too much code around that assumes 32-bit ints.

> Notes: the use of a hex, rather than octal, constant with mixed case
> makes it easier to count the number of digits ...

The octal was for fun, since it was a "bad example" anyway.  However, if
one *is* going to amend it, it would be as well if the amended version
retained the correct value of the constant.  (It was LONG_MAX, not
ULONG_MAX, in that example.)


} But the standard also does not guarantee (as far as I know) that there
} doesn't exist a type with >32 bits.
} 
} What is wrong with simply implementing the following in a compiler?
} 	char	=	 8 bits
} 	short	=	16 bits
} 	int	=	32 bits
} 	long	=	64 bits

Nothing -- unless, as I explained above, it leads to a community of users
who *expect* long to have 64 bits.  My own preference would in fact be
to have a long long type, but for *both* long and long long to be 64 bits.

(The long long type would also imply such things as LL suffixes on
constants, %lld printf formats, appropriate type conversion rules,
and so on.  I haven't examined the standard exhaustively to see whether
there's anything where the appropriate extension is non-obvious, but
certainly for most things it is obvious.)
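
Under such an extension, the obvious pieces would fit together like
this (assuming a compiler that accepts it at all):

	#include <stdio.h>

	int main(void)
	{
	    long long big = 123456789012345LL;  /* LL suffix on the constant */

	    printf("%lld\n", big);              /* extended printf format */
	    return 0;
	}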

I would like to see "long long" established enough in common practice
that, in the *next* C standard, the section that now itemizes among other
things the following minimum values:

	SHRT_MAX		+32767
	INT_MAX			+32767
	LONG_MAX		+2147483647

will, *if* 64-bit machines are sufficiently common by then, leave those
values unchanged and add:

	LLONG_MAX		+9223372036854775807

-- 
Mark Brader		    "'A matter of opinion'[?]  I have to say you are
SoftQuad Inc., Toronto	      right.  There['s] your opinion, which is wrong,
utzoo!sq!msb, msb@sq.com      and mine, which is right."  -- Gene Ward Smith

This article is in the public domain.

bhoughto@pima.intel.com (Blair P. Houghton) (05/18/91)

In article <1991May15.190016.21817@philmtl.philips.ca> ray@philmtl.philips.ca (Ray Dunn) writes:
>In referenced article, bhoughto@pima.intel.com (Blair P. Houghton) writes:
>>If you're compliant, you're portable.
>
>You only know a program is portable *after* you've tested it on another
>system.  How many do that during the initial development cycle?

Well, I do, several times, to the point of working for a while on
one platform, moving to another, testing, working there for a while,
moving to a third, and so on.  It helps to aim for three targets.

>If I wanted to be controversial, I might say that 'C's supposed
>"portability" is a loaded cannon.  Bugs caused by transferring a piece of
>software to another system will continue to exist, even in compliant
>software.  Prior to "portable" 'C', porting problems were *expected*,
>visible, and handled accordingly.

Now they're bugs in the compiler, not just "issues of
implementation."

If you're using any C construct that produces different
behavior on disparate, conforming implementations, then
either one of those implementations is not conforming or
you are not using ANSI C, but rather have used some sort of
extension or relied on some sort of unspecified behavior,
and therefore your program is not compliant.

>Will developers still assume these bugs to be likely, and handle
>verification accordingly, or will they be lulled by "it compiled first
>time" into thinking that the portability issue has been taken into account
>up front, and treat it with less attention than it deserves?

Your question is all but naive.  I still get valid
enhancement requests on code that's several years old,
which means that I failed to design it to suit the needs of
my customer, which means it's buggy.  Routines that have
been compiled and/or run thousands of times in real-world
situations come up wanting.  Nobody sane assumes that
anything is right the first time (though one may determine
that the probability of failure is low enough to make an
immediate release feasible).

				--Blair
				  "I'm going to put all of this
				   on video and hawk it on cable
				   teevee in the middle of the
				   night while wearing pilled
				   polyester and smiling a lot."

bret@orac.UUCP (Bret Indrelee) (05/20/91)

In article <1991May9.192156.19291@nightowl.MN.ORG> det@nightowl.MN.ORG (Derek E. Terveer) writes:
>
>What is wrong with simply implementing the following in a compiler?
>
>	char	=	 8 bits
>	short	=	16 bits
>	int	=	32 bits
>	long	=	64 bits

I agree, this is the best of many choices.  The problems with it are:
  1) You will use more data space because longs are twice as large.  On
     a 64-bit architecture, this means problems in swapping.  There is
     enough VA space; you just wouldn't be using it as efficiently as if
     long were 32 bits.
  2) You break programs that assume int is going to match the size of
     anything.  Translation: you break programs that already can not
     be ported between available 32 bit machines that make a different
     choice (sizeof int == sizeof short) || (sizeof int == sizeof long)

     Fix the programs.

  3) You break programs that don't use void pointers when they need
     a generic pointer.  See #2 above.

  4) You may find new bugs in programs, where an overflow that you never
     knew existed on a 32bit machine now makes your integer math come
     out different.

Most of these come down to problems with programs that already don't
work on existing 32 machines.

Start using typedef and INT_MAX, people.  Your replacements will thank
you rather than curse you.

-Bret

-- 
------------------------------------------------------------------------------
Bret Indrelee		|	<This space left intentionally blink>
bret@orac.edgar.mn.org	|					;^)

bret@orac.UUCP (Bret Indrelee) (05/20/91)

In article <16103@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>
>By the way, there is NO NEED to say what data size choices a 64-bit
>implementation "should" make.  It "should" not matter to any sensible
>application.



Except maybe the guy writing a device driver, where the person needs
to exactly match the size of a data type to the size of a hardware
register.  Yet another reason to spread the sizes around.

-Bret
-- 
------------------------------------------------------------------------------
Bret Indrelee		|	<This space left intentionally blink>
bret@orac.edgar.mn.org	|					;^)

clive@x.co.uk (Clive Feather) (05/21/91)

In article <1991May18.011520.8330@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
[>>> is msb, >> is myself]
>>> Suppose that the implementation
>>> defines long to be 64 bits; then, to force such a failure, the programmer
>>> would have to take some explicit action, like
>>>	assert (LONG_MAX >= 0777777777777777777777);
>> If you want the compilation to fail, then what's wrong with the
>> following ?
>>     #if LONG_MAX < 0xFFFFffffFFFFffff		/* wrong, actually */
>>     ??=error Long type not big enough for use.
>>     #endif

> I would take that to be "something like" my assert() example, and don't
> have a strong preference between one and the other.

True. The only advantage I claim for it is that it is a compile time
test, rather than a run-time test.

> But whichever of these the programmer uses, *it has to be coded explicitly*.
> My feeling is that enough of the 64-bit people [i.e. those worth $8 :-)]
> will carelessly omit to do so, once 64-bit machines [i.e. those worth $8!
> :-) :-)] become more common, as to create portability problems.  It will,
> I fear, be exactly the situation that we've already seen where there's
> much too much code around that assumes 32-bit ints.

But no-one suggests forcing ints to be 32 bits just to solve this
problem. In general, there is no sympathy for people who fail to code
for portability.  Why should we make an exception for this one case?  You
might equally well say that people used to 36-bit machines shouldn't
have to write code like:

    #if INT_MAX >= 0x1FFFF && INT_MIN <= -0x20000
    typedef signed   int  native_int_like_type;
    #else
    typedef signed   long native_int_like_type;
    #endif
    #if UINT_MAX >= 0x3FFFF
    typedef unsigned int  native_uint_like_type;
    #else
    typedef unsigned long native_uint_like_type;
    #endif

if they want to continue to think in 18-bit ints.

> However, if one *is* going to amend it, it would be as well if the
> amended version retained the correct value of the constant.

Mea culpa.
-- 
Clive D.W. Feather     | IXI Limited         | If you lie to the compiler,
clive@x.co.uk          | 62-74 Burleigh St.  | it will get its revenge.
Phone: +44 223 462 131 | Cambridge   CB1 1OJ |   - Henry Spencer
(USA: 1 800 XDESK 57)  | United Kingdom      |

ray@philmtl.philips.ca (Ray Dunn) (05/21/91)

In referenced article, bhoughto@pima.intel.com (Blair P. Houghton) writes:
>In referenced article, ray@philmtl.philips.ca (Ray Dunn) writes:
>>Prior to "portable" 'C', porting problems were *expected*,
>>visible, and handled accordingly.
>
>Now they're bugs in the compiler, not just "issues of
>implementation."

No - now they're "issues of *system dependencies*".

>>Will developers still assume these bugs to be likely, and handle
>>verification accordingly, or will they be lulled by "it compiled first
>>time" into thinking that the portability issue has been taken into account
>>up front, and treat it with less attention than it deserves?
>
>Your question is all but naive.

Only if you ignore the fact, which you seem to do, that many of the issues
of portability in the real world are created by differences in system
hardware, operating systems and file management facilities.  This is true,
for example, of nearly all software which has a tightly coupled user
interface, or which is forced to process system-specific non-ascii-stream
data files, or to interface with multi-tasking facilities.  Even
differences in floating point handling can create major pains-in-the-neck.

There's more to portability than 'C' conformity.
-- 
Ray Dunn.                    | UUCP: ray@philmtl.philips.ca
Philips Electronics Ltd.     |       ..!{uunet|philapd|philabs}!philmtl!ray
600 Dr Frederik Philips Blvd | TEL : (514) 744-8987  (Phonemail)
St Laurent. Quebec.  H4M 2S9 | FAX : (514) 744-9550  TLX: 05-824090

timr@gssc.UUCP (Tim Roberts) (05/23/91)

In article <313@orac.UUCP> bret@orac.UUCP (Bret Indrelee) writes:
>In article <1991May9.192156.19291@nightowl.MN.ORG> det@nightowl.MN.ORG (Derek E. Terveer) writes:
>>
>>What is wrong with simply implementing the following in a compiler?
>>
>>	char	=	 8 bits
>>	short	=	16 bits
>>	int	=	32 bits
>>	long	=	64 bits
>
>I agree, this is the best of many choices.  

Wrong!  This is NOT necessarily the best of many choices.  THINK about this
for a minute!  We've had a lot of entirely useless philosophical discussion
on this issue.

What if your 64 bit architecture doesn't have any instructions to deal with
16 bit units?  You certainly aren't going to include something as a fundamental
type when your architecture can't easily deal with it, are you?  What if, going
further, you can't manipulate 32 bit objects either?  On such a machine, you
would probably create short=int=long=64 bits.

The point is this:  C data types are intended to map into the fundamental
operating units of the underlying hardware.  Discussing the correctness of
C data type sizing on 64-bit machines in the general case is a pointless waste 
of network bandwidth.

-- 
timr@gssc.gss.com	Tim N Roberts, CCP	Graphic Software Systems
						Beaverton, OR

This is a very long palindrome. .emordnilap gnol yrev a si sihT

henry@zoo.toronto.edu (Henry Spencer) (05/23/91)

In article <6659@gssc.UUCP> timr@gssc.UUCP (Tim Roberts) writes:
>What if your 64 bit architecture doesn't have any instructions to deal with
>16 bit units?  ...

Then it's going to be in big trouble trying to implement TCP/IP...!

>The point is this:  C data types are intended to map into the fundamental
>operating units of the underlying hardware.  Discussing the correctness of
>C data type sizing on 64-bit machines in the general case is a pointless waste 
>of network bandwidth.

Not really.  There are some really sticky questions even for well-designed
64-bit machines, where there is no strong a priori preference for one
scheme or the other.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

dhoward@ready.eng.ready.com (David Howard) (05/23/91)

In article <6659@gssc.UUCP> timr@gssc.UUCP (Tim Roberts) writes:
>...
>What if your 64 bit architecture doesn't have any instructions to deal with
>16 bit units?  You certainly aren't going to include something as a fundamental
>type when your architecture can't easily deal with it, are you?  What if, going
>further, you can't manipulate 32 bit objects either?  On such a machine, you
>would probably create short=int=long=64 bits.

C compilers for the 80x86 abortchitecture have long=32 and pointer=32,
neither of which is easily supported or natural on that chip. 

The question as to whether C types should map to the architecture or
to what is easiest on the programmer is an interesting one.

bhoughto@pima.intel.com (Blair P. Houghton) (05/23/91)

In article <314@orac.UUCP> bret@orac.UUCP (Bret Indrelee) writes:
>In article <16103@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>>It "should" not matter to any sensible application.
>
>Except maybe the guy writing a device driver, where the person needs
>to exactly match the size of a data type to the size of a hardware
>register.  Yet another reason to spread the sizes around.

Picayune semantics: "sensible applications" and "device
drivers" are two entirely different laws of physics.

More to the point:  the driver developer is going to be
doing many things more heinous than bit-fields; e.g., casting
integer types to pointer types in order to reach memory
mappings (even tricks with indexing "all of memory" require
placing the base of the "all-of-memory array" at some
defined point).

ANSI C is specifically not designed for that sort of work.

Such things are often better done in assembler, anyway
(regardless of ease-of-maintenance).

				--Blair
				  "The janitorial service industry
				   is 7000 years old, and still
				   nobody thinks there's any dirty
				   work left to be done..."

steve@taumet.com (Stephen Clamage) (05/23/91)

dhoward@ready.eng.ready.com (David Howard) writes:

>C compilers for the 80x86 abortchitecture have long=32 and pointer=32,
>neither of which is easily supported or natural on that chip. 

Type long is required to be at least 32 bits.  This is reasonable on
386/486.  If it is not convenient on 8086/286, that is irrelevant,
since the type cannot be smaller.  No one would implement long as, say,
36 bits on these machines, since that would in fact be unnatural.

Pointers on 8086/286 (or 386 in "real" mode) are either 16 or 32 bits,
depending on whether they are "near" or "far", and consequently are easily
supported and natural.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

phil@ux1.cso.uiuc.edu (Phil Howard KA9WGN) (05/24/91)

timr@gssc.UUCP (Tim Roberts) writes:

>What if your 64 bit architecture doesn't have any instructions to deal with
>16 bit units?  You certainly aren't going to include something as a fundamental
>type when your architecture can't easily deal with it, are you?  What if, going
>further, you can't manipulate 32 bit objects either?  On such a machine, you
>would probably create short=int=long=64 bits.

I believe the discussion centered around machines that indeed could
manipulate quantities in all the sizes.  But you do have a valid point.
The concern I have in the matter is whether or not the capability to
manipulate quantities in 64-bit sizes is left out of the standardized
part of C.

>The point is this:  C data types are intended to map into the fundamental
>operating units of the underlying hardware.  Discussing the correctness of
>C data type sizing on 64-bit machines in the general case is a pointless waste 
>of network bandwidth.

I believe C requires:  short <= int <= long

But it is also suggested that the fundamental unit be defined as "int",
not as "long".  Which way would you go?
-- 
 /***************************************************************************\
/ Phil Howard -- KA9WGN -- phil@ux1.cso.uiuc.edu   |  Guns don't aim guns at  \
\ Lietuva laisva -- Brivu Latviju -- Eesti vabaks  |  people; CRIMINALS do!!  /
 \***************************************************************************/

henry@zoo.toronto.edu (Henry Spencer) (05/24/91)

In article <4383@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>More to the point:  the driver developer is going to be
>doing many things more heinous than bit-fields...
>ANSI C is specifically not designed for that sort of work.

Au contraire; C was designed for that sort of work from the beginning,
since that was its first major application, and ANSI C did not break this.
One needs to be a bit careful nowadays about using things like "volatile",
since modern C compilers are much more aggressive than the DMR original
that was used to rewrite the Unix kernel in C, but that's a detail.

>Such things are often better done in assembler, anyway
>(regardless of ease-of-maintenance).

They are *almost* always better done in C.  Given a good compiler, it's
rare for something to be doable in assembler but not in C.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry