[net.unix-wizards] Information on Unix/Vax peculiarities

CHUQUI@MIT-MC@sri-unix (11/28/82)

From: Charles F. Von Rospach <CHUQUI at MIT-MC>
Date: 27 November 1982 00:18-EST

Here is a (probably) very obvious question. I hope there is an equally
obvious answer:

Is there some place where the obvious differences between the published
information for BSD4.1 and how things really are on a Vax machine can
be found? I am especially interested in machine differences between the
Vax and the PDP-11, which is what most of the material (including the
white book) is written for.

What I am most interested in are things like:

	The width of the various variable definitions (long, int, short, etc)
	on the Vax.

	What variable types can be used for register variables, and the 
	number of register variables available for use at any one time.

	Any internal storage changes between PDP11 C and Vax C.

	Any other machine compatibility problems that there are.

What brings this up is an interesting problem: I have a program which seems
to have been written for the PDP11. In bringing it up on the Vax, I have
been getting funny results. It seems that whoever wrote the program is doing
some funny things in a large integer array, reading and writing into it with
a character pointer. Since I assume the program did work (its problems are
much too obvious to have been let through), I am assuming that there is some
problem with how the thing handles internal storage. It seems as though
it is assuming that an integer is 16 bits, stored in two bytes in a low/high
byte setup. If I remember my PDP11, that is how it stores things. I am
rather new to the Vax, but I believe it stores things high/low. I am
right now wondering if there are any other time bombs out there that I 
(and the rest of the net) need to be aware of.
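
A short test program (just an illustration, not the program in question)
answers both the size question and the byte-order question at once:

	#include <stdio.h>

	main()
	{
		int word;
		char *cp;

		word = 1;		/* set only the low-order end */
		cp = (char *) &word;
		printf("an int is %d bytes\n", (int) sizeof(int));
		if (*cp == 1)
			printf("low byte stored first\n");
		else
			printf("high byte stored first\n");
	}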

As an aside, the program flows through lint without a single mutter. I 
have not taken the program apart as yet, but a cursory look at the code
shows me that it seems to be doing some real strange things with pointers.
I would have hoped that lint would have caught this, but evidently not.

If you know of a general source for the differences between machines (either
Vax/PDP11 or any of the Unix machines in general), please let me know. If 
you have a particular problem, pass it along and I can summarize for the 
net.

chuck (chuqui at mit-mc)

zrm (11/28/82)

Well, it would take too long to list all the machine dependencies
possible with C compiler instantiations, but perhaps some general
philosophy of C might help.

o In general, you want to be able to cast *anything* to int. This means
  an int ought to be the same size as a pointer. Berkeley and Whitesmiths
  do this correctly in their compilers; Alcyon does it wrong.

o Muddling about inside larger objects should be done with unions (a short
  sketch appears after this list). However, in the program CHUQUI mentioned,
  the author seems to have been ingenious enough to scuttle all portability --
  he probably assumes ints hold exactly two bytes. Note that using unions
  also ensures that byte ordering within a larger word won't matter from
  machine to machine (assuming the compiler isn't broken). There is also a
  rather rude hack you can use where you declare a structure

	typedef struct
	{
		char hibyte;
		char lobyte;
	} WORD;

  and use the members of that structure to reference parts of a two-byte
  object with the syntax

	WORD foo;		/* WORD is a defined type, another
				   very useful tool */
	high = foo.hibyte;	/* BLETCH! */

  but this is really horrid style and I would rather use RPGII than actually
  maintain code like that.

o A fix that might work for CHUQUI is using defined types. But you have
  to be very careful on several counts.

  > If you substitute, say, WORD for int, on a wholesale basis, you have
    to make sure no pointers are being put in ints.

  > All parameters passed to functions are of size sizeof(int). Treat
    parameters essentially as you would registers.
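
Here is the kind of thing I mean by a union (the names are made up); all
byte access to the int goes through one declared overlay instead of through
character pointers cast all over the program:

	union chunk
	{
		int	word;			/* the object itself */
		char	byte[sizeof(int)];	/* its bytes, in machine order */
	};

	example()
	{
		union chunk c;

		c.word = 0x0102;
		/* on a pdp11, c.byte[0] is 2 and c.byte[1] is 1;	*/
		/* a high-byte-first machine gives the reverse		*/
	}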

All this is making me queasy. There are all sorts of other nasty little
things you can trip over on your way from one machine (or compiler)
to the next and the only way to verify portability is to port something
and fix it when it breaks.

Indigestion!
Zig

ark (11/29/82)

It is not true that pointers must be the same size as ints.
It IS true that a pointer should be capable of being cast
without loss of information to a sufficiently long int.
The key words are "sufficiently long".  There are machines
out there with 16 bit integers and 32 bit pointers.

zrm (11/29/82)

You really do want ints to be as big as pointers, and, in fact, only as
big as pointers. The main reason I can think of off the top of my head
is that you want to be able to subscript a pointer over the whole
address space that that pointer is in. On 32 bit machines this means 2^32
bytes. It would be very difficult to rejig pointer arithmetic and its
interface to normal arithmetic so that different size objects could be
used in each. The one compiler that I have worked with that has 16 bit
ints and 32 bit pointers has grotesquely broken pointer arithmetic.

Also, C has enough different datatypes to accommodate both the
"natural" sizes for ints and pointers across different machines and let
the programmer ensure that objects that have to be of a certain size are
that size.

I use defined types to maintain portability from pdp11s to 68000s.
Specifically, if I need a 32 bit quantity I define a type LONGWORD. On
the 11 and the 68000 it looks like this:

typedef long LONGWORD;

But if I need a 16 bit quantity, and need to use it in portable code,
the typedefs go like this:

#ifdef PDP11

typedef int WORD;

#else
#ifdef mc68000		/* or whatever symbol your compiler predefines */

typedef short WORD;

#endif
#endif

This also localizes the program's machine dependence. And speaking of C
compilers, does anyone know of a good compiler for the PDP10?

Cheers,
Zig

gwyn@Brl@sri-unix (12/01/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     29 Nov 82 13:37:50-EST (Mon)
I have used machines on which a (char *) cannot possibly fit into an (int).

I am curious whether anyone has ever found a case in which (char *) is not
the "fattest" pointer type.  I generally assume it is, so I would like to
hear if this isn't always true.  Thanks...

fred.umcp-cs@UDel-Relay@sri-unix (12/04/82)

From:     Fred Blonder <fred.umcp-cs@UDel-Relay>
Date:     30 Nov 82 18:41:58 EST  (Tue)
From: Charles F. Von Rospach <CHUQUI at MIT-MC>

	I have a program which seems to have been written for the PDP11.
	. . . It seems that whoever wrote the program is doing some funny
	things in a large integer array, reading and writing into it with
	a character pointer.  . . . It seems as though it is assuming that
	an integer is 16 bits . . .

	chuck (chuqui at mit-mc)

This may not work in your case, but when transporting PDP-11 programs to
a VAX I have found the cc command line argument ``-Dint=short'' useful.
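
For example (the file name is arbitrary):

	cc -Dint=short -o prog prog.c

The substitution is purely textual -- every token ``int'' becomes
``short'' before the compiler proper sees it -- so it only helps when the
program never needs a full-width int, and declarations like ``short int''
will break.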

mo@Lbl-Unix@sri-unix (12/04/82)

From: mo at Lbl-Unix (Mike O'Dell [system])
Date: 30 Nov 1982 23:22:18-PST
In general, C assumes (char *) is the fattest pointer, or rather,
that (char *) is a type to which other pointers (and some 
useful range of ints) can always be cast without information
loss.  Even on very strange machines this is true.
By very strange, I mean word-addressed wonders with odd numbers
of chars/word, or machines with 16 or 18-bit int pointers and
19-bit char pointers.  There are C compilers for both of these
machines which work quite nicely with this restriction.

On the other hand, larger data objects (like doubles) may require
larger alignment, so (char *) is usually the "most resolved" pointer
(it takes the largest number of bits to distinguish different objects)
and (double *) is likely the most boundary-aligned.  These are not
hard and fast rules, but are true in all the C compilers I have seen.

DMR, please correct any misspeaks.

	-Mike

zrm.mit-ccc@Mit-Mc@sri-unix (12/05/82)

Date: 1 Dec 1982 00:58:20-EST
I would like to know on what sort of machine you can't put a pointer
into an int. Not being able to do that raises all sorts of problems such
as not being able to subscript for large arrays or offsets, having the
size of function arguments and return values NOT be the sizeof(int),
having strange things happen in registers (ints should also have the
same size as the registers you'll most often put them in), especially if
your int suddenly becomes a long in a register, or if a (char *) gets
truncated because it was run through an integer register.

I guess what it all boils down to is that all pointers, plus ints, should
be of the same size, since these are the objects you are going to want
to put into interchangeable slots the most often. Since you have the
datatypes long and short, you can also provide the programmer convenient
ways of accessing longer or shorter types of int-like objects. But if
someone has done this some other way, I'd like to hear about it.

Cheers,
Zig

mark (12/06/82)

re:
	I would like to know on what sort of machine you can't put a
	pointer into an int. Not being able to do that raises all sorts
	of problems such as not being able to subscript for large
	arrays or offsets, having the size of function arguments and
	return values NOT be the sizeof(int), having strange things
	happen in registers (ints should also have the same size as the
	registers you'll most often put them in),
I quote from page 34 of the C book:
	int will normally reflect the most "natural" size for a particular
	machine.  ...  About all you should count on is that short is no
	longer than long.
It doesn't make any promises that pointers fit in ints.  While I recognize
that there are programs out there that assume you can (probably the most
famous is the execl(2) system call, which uses an integer zero for
termination instead of a null character pointer), these should probably be
viewed as unportabilities in the specific programs.  In answer to your
questions:

The 68000 has 32 bit pointers, and many people feel that the most natural
size for an int is 16 bits.  (There are other people who feel that there
is so much software that assumes pointers and ints are the same size that
it's worth making ints be 32 bits - both flavors of compiler seem to exist.)

I fail to see the problems with the constructs you raise:
	not being able to subscript for large arrays or offsets,
p[i] is defined as *(p+i).  A pointer plus an int yields a pointer,
that is, 32 bits + 16 bits yields 32 bits.  Where's the problem?
	having the size of function arguments and return values
	NOT be the sizeof(int),
That's true for longs and doubles and structures now.  If you're
expecting a pointer you should declare that fact in your code.
If you're reimplementing printf you should use <varargs.h>.
	having strange things happen in registers
Nothing really strange happens when you put a short or char into a
long register on a VAX (except for sign extension when you didn't want
it), so why should a 16 bit int be any different?  In fact, you're
only going to store the least significant 16 bits from the register,
so it shouldn't matter that more bits were calculated and discarded.
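
As for the execl termination mentioned above, the portable form spells
out the null character pointer (the program and its arguments here are
only an example):

	execl("/bin/echo", "echo", "hello", (char *) 0);	/* not a bare 0 */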

dan@Bbn-Unix@sri-unix (12/09/82)

From: Dan Franklin <dan@Bbn-Unix>
Date:  3 Dec 1982 14:56:38 EST (Friday)
Why do you want to be able to cast *anything* to int? I've
written several good-sized portable programs (for PDP-11s,
VAXs, and C70s) without feeling a real need to do this.
It certainly isn't portable, since an arbitrary architecture
might well make it more efficient to choose ints to be smaller
than pointers.

zrm.mit-ccc@Mit-Mc@sri-unix (12/09/82)

Date: 5 Dec 1982 14:58:45-EST
Why should ints be as big as pointers? Not to write portable code. To do
that the most commonly used hack is to use defined types to ensure that
certain variables have sufficient resolution. The reason you want
anything to fit into an int is that the compiler wants to minimise the
contortions it would otherwise need to do to pass pointers to functions.
On a pdp11 you have to go through some amount of hair to get longs and
doubles in and out of functions. You would not want that to happen for
every object larger than an int, especially pointers. See how simple
things are on a VAX where ints, pointers, and longs are all the same
size. In short, ints should be the same size as pointers not so much for
the programmer's sake, but for the compiler implementer's. That C
compilers do not try to "overcome" machine peculiarities is one of the
advantages of C.

There is an artifact in Unix that I do not know exactly how to
interpret, but it seems to indicate that, at one time, C could not pass
objects larger than int to a function, or return them, or both. The
function calls that deal with the date and time take a pointer to a
long, rather than a long directly as an argument. I have not been
hacking Unix for long enough -- is there someone out there who has been
hacking it since v5 or so who might know why this peculiarity exists?
Was it C in general or just Unix system calls that lose (or lost) in
this way?

Cheers,
Zig

Michael.Young@CMU-CS-A@sri-unix (12/09/82)

From: Michael Wayne Young <Michael.Young@CMU-CS-A>
Date:  4 December 1982 1926-EST (Saturday)
I fully disagree that "pointers, plus ints, should be of the same size"...
at least if you mean we should be able to assume so for writing
machine-independent code.  I'd like the C language to NOT define
the size of pointers, ints, or anything else.  About the only
restrictions I'd like to have placed are that a
	long is bigger than a normal int (or the same size),
	short is shorter (or the same size),
	and an int is at least capable of holding a character.

Mixing int's and pointers is just plain machine-dependent (and 
probably not lint-free without some clever casting).

[I'd still be interested in an answer to Doug Gwyn's question:
are there architectures for which pointers to characters aren't
the fattest pointers?  Also, I would be interested in seeing
the architecture for which a pointer is larger than an int,
but I'm not about to claim (or base my code on the fact) that
no such machine exists.]

Let's not go about making unnecessary (or at best marginally
valuable) assumptions about our machine architectures while
we code...

			Michael

mo@Lbl-Unix@sri-unix (12/10/82)

From: mo at Lbl-Unix (Mike O'Dell [system])
Date: 6 Dec 1982 10:46:49-PST
Assuming ints and pointers are the same size produces all the
cruft which was so carefully removed going from v6 to v7.  Keep
in mind that on some machines, particularly word-addressed ones,
pointers to different things CAN BE DIFFERENT SIZES!  A pointer
to an int might be a natural address, while a pointer to a char
might need 1 to 4 additional bits to resolve a char within a word.
You CANNOT assume anything about sizes; that's what unions are for.
I completely agree that universal types are useful, and (char *)
can almost always be used as one for pointers.  On MOST machines,
a union of a (char *) and a (double) would be the biggest
"natural" object, while a union of (char *) and (int) or (long)
would probably do most of the time.
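
Something like this (the name is invented) is what I mean:

	union anyval {
		char	*cp;		/* the most resolved pointer */
		long	lg;
		double	db;		/* usually forces the strictest alignment */
	};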

	-Mike

perry.gatech@Udel-Relay@sri-unix (12/10/82)

From:     Perry Flinn <perry.gatech@Udel-Relay>
Date:     6 Dec 82 10:58:42-EST (Mon)
The following is quoted (without permission) from Kernighan and Ritchie
(sec. 14.4, p. 210):

	A pointer may be converted to any of the integral types large
	enough to hold it.  Whether an int or long is required is
	machine dependent, but is intended to be unsurprising to those
	who know the addressing structure of the machine. ...

	An object of integral type may be explicitly converted to a
	pointer.  The mapping always carries an integer converted from
	a pointer back to the same pointer, but is otherwise machine
	dependent.

	A pointer to one type may be converted to a pointer to another
	type.  The resulting pointer may cause addressing exceptions
	upon use if the subject pointer does not refer to an object
	suitably aligned in storage.  It is guaranteed that a pointer
	to an object of a given size may be converted to a pointer to
	an object of smaller size and back again without change.

--Perry

jim (12/10/82)

I haven't seen this posted here yet, and I'm surprised no one has
mentioned it.

On the Intel 8086 (as in the IBM PC) an int is 16 bits, and pointers are 32
bits.  The 32 bits are broken down into a 16 bit offset and a 16 bit segment,
and the total address space is only 20 bits, but you need to use two
registers to store a pointer.  The main problem is that offsets wrap around,
so incrementing a pointer (using the hardware instructions to do so) does
not always give you the address of the next memory location.
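
To make the arithmetic concrete (a sketch of what the hardware does, not
compiler output), the 20 bit physical address is built from the two 16 bit
halves, and ordinary pointer increments only touch the offset half:

	long
	physaddr(seg, off)		/* segment:offset -> 20 bit address */
	unsigned seg, off;
	{
		return ((long) seg << 4) + (long) off;
	}

	/* the offset can wrap from 0xffff back to 0 while the segment	*/
	/* stays put, so stepping a pointer with the hardware add does	*/
	/* not always land on the next physical location		*/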

In practice I haven't had much trouble with this in porting C code to the
8086 (although there are lots of other things that do cause trouble).

gwyn@Brl@sri-unix (12/13/82)

From:     Doug Gwyn <gwyn@Brl>
Date:     9 Dec 82 13:11:11-EST (Thu)
"long" ints were added shortly after 6th Edition UNIX.  To get a
long int before then, one had to have an array of 2 ints.  The
6th Ed. kernel was full of this artifice.

zrm (12/13/82)

Aw c'mon, people. Ints and pointers should be of the same size to keep
the compiler implementer sane, and this has relatively little to do with
code portability. Most machines, even machines that split their arithmetic
and address registers (like the 68000) have one uniform size for
registers. Because most C compilers bring things into registers and bash
on them there, one would like the most commonplace objects to fit nicely
into these registers. Ints and pointers should be of a "natural" size.
There are enough datatypes to go around so all the other sizes can be
covered.

Code portability is achieved by guaranteeing sizes, as best as one can
in C. The most common way to do this is with defined types. In order to
bring up a program originally written for a pdp11 on a machine with 32 bit
ints, but 16 bit shorts, you would change one line in the code from

	typedef int WORD;
to
	typedef short WORD;

and all places where size really matters will come out the same. Instead
of flaming, how about an example where it might actually be useful for
there to be 16 bit ints and 32 bit pointers?
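
For instance (the record and its fields are invented), a declaration
written in terms of the defined types comes out with the same field
sizes on the 11 and the 68000:

	struct record
	{
		WORD		count;		/* 16 bits on both machines */
		LONGWORD	offset;		/* 32 bits on both machines */
	};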

Cheers,
Zig

barmar (12/14/82)

The Honeywell 68/DPS and DPS/8 computers, which run the Multics
time-sharing system (no, the name is NOT a hack on UNIX, but
in fact the reverse is true!), use double-words for pointers
when running segmented (most of the time), so their
pointers are 72 bits long.  The natural length for int would
be 18 bits because the accumulator is 36 bits wide (it is possible
to do double-word arithmetic using the combination of the
Accumulator and Quotient registers, called the AQ, but it is
not as "natural").  These pointers are also more precise than
(char *): you can address arbitrary bit boundaries.  Note that all
the actual addressing information fits in less than 36 bits;
these 72-bit pointers contain additional information, such as 
ring-number and fault tags (there are nine bits eventually left over
that Multics Maclisp uses to implement typed-pointers).
An 18-bit int would be a fine subscript, though, as 18 bits is
enough to address any word in a segment (20 bits are necessary
if you want to address any character).
By the way, there are places in Multics that pass around pointers
(the 36-bit "packed pointer" type I alluded to earlier) as PL/I
type "fixed bin (35)", i.e. 35-bits plus sign.  This is generally
done so that the program is callable from languages that
do not support pointers, such as Fortran and COBOL.

sdyer@Bbn-Unix@sri-unix (12/15/82)

From: Steve Dyer <sdyer@Bbn-Unix>
Date:  9 Dec 1982 16:59:52 EST (Thursday)
V5 and early V6 (pre-Phototypesetter distribution) C compilers
had no concept of a 'long' integer.  Rather, most programs used
a set of library routines, and passed int[2] objects to them.
The kernel was full of these, particularly in the manipulation
of variables representing UNIX time.

The unusual calling sequence for time(), i.e.

	long tvec;
	time(&tvec);

reflects the V6 construct

	int tvec[2];
	time(tvec);

which was kept for compatibility with earlier C programs.  In V7,
they also redeclared it as returning a 'long' quantity, so that new
programs could use the more natural convention.
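
So under V7 either form works (a small example; the variable name is
arbitrary):

	long now;

	now = time((long *) 0);		/* new style: use the return value */
	time(&now);			/* old style: store through the pointer */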

In the same sense, one could say that the calling sequence for
the newer V7 call, ftime(struct timeb *), reflects the very late
addition of structures as legal return values to the C language.
That is, they had the option of declaring ftime() to return a timeb
structure, but they didn't (probably for backward compatibility).

/Steve Dyer