CHUQUI@MIT-MC@sri-unix (11/28/82)
From: Charles F. Von Rospach <CHUQUI at MIT-MC> Date: 27 November 1982 00:18-EST

Here is a (probably) very obvious question; I hope there is an equally obvious answer. Is there some place where the differences between the published information for BSD4.1 and how things really are on a Vax can be found? I am especially interested in machine differences between the Vax and the PDP-11, which is what most of the material (including the white book) is written for. What I am most interested in are things like:

	The width of the various variable types (long, int, short, etc.) on the Vax.
	Which variable types can be used for register variables, and how many register variables are available for use at any one time.
	Any internal storage changes between PDP-11 C and Vax C.
	Any other machine compatibility problems that there are.

What brings this up is an interesting problem. I have a program which seems to have been written for the PDP-11. In bringing it up on the Vax, I have been getting funny results. It seems that whoever wrote the program is doing some funny things in a large integer array, reading and writing into it with a character pointer. Since I assume the program did work (its problems are much too obvious to have been let through), I am assuming that there is some problem with how the thing handles internal storage. It seems to assume that an integer is 16 bits, stored in two bytes in a low/high byte arrangement. If I remember my PDP-11, that is how it stores things. I am rather new to the Vax, but I believe it stores things high/low. I am right now wondering if there are any other time bombs out there that I (and the rest of the net) need to be aware of.

As an aside, the program flows through lint without a single mutter. I have not taken the program apart yet, but a cursory look at the code shows me that it seems to be doing some really strange things with pointers. I would have hoped that lint would catch this, but evidently not.
If you know of a general source for the differences between machines (either Vax/PDP-11 or any of the Unix machines in general), please let me know. If you have a particular problem, pass it along and I will summarize for the net. chuck (chuqui at mit-mc)
zrm (11/28/82)
Well, it would take too long to list all the possible machine dependencies of C compiler instantiations, but perhaps some general philosophy of C might help.

o In general, you want to be able to cast *anything* to int. This means an int ought to be the same size as a pointer. Berkeley and Whitesmiths do this correctly in their compilers; Alcyon does it wrong.

o Muddling about inside larger objects should be done with unions. However, in the program CHUQUI mentioned, the author seems to have been ingenious enough to scuttle all portability -- he probably assumes ints hold exactly two bytes. Note that using unions also ensures that byte ordering within a larger word won't matter from machine to machine (assuming the compiler isn't broken). There is also a rather rude hack where you declare a structure

	struct { char hibyte; char lobyte; }

and use the members of that structure to reference parts of a two-byte object with the syntax

	WORD foo;		/* WORD is a defined type, another very useful tool */
	high = foo.hibyte;	/* BLETCH! */

but this is really horrid style, and I would rather use RPG II than actually maintain code like that.

o A fix that might work for CHUQUI is using defined types. But you have to be very careful on several counts.

  > If you substitute, say, WORD for int on a wholesale basis, you have to make sure no pointers are being put in ints.
  > All parameters passed to functions are of size sizeof(int). Treat parameters essentially as you would registers.

All this is making me queasy. There are all sorts of other nasty little things you can trip over on your way from one machine (or compiler) to the next, and the only way to verify portability is to port something and fix it when it breaks. Indigestion!

Zig
ark (11/29/82)
It is not true that pointers must be the same size as ints. It IS true that a pointer should be capable of being cast, without loss of information, to a sufficiently long int. The key words are "sufficiently long". There are machines out there with 16-bit integers and 32-bit pointers.
zrm (11/29/82)
You really do want ints to be as big as pointers and, in fact, only as big as pointers. The main reason I can think of off the top of my head is that you want to be able to subscript a pointer over the whole address space that pointer is in; on 32-bit machines this means 2^32 bytes. It would be very difficult to rejig pointer arithmetic and its interface to normal arithmetic so that different size objects could be used in each. The one compiler I have worked with that has 16-bit ints and 32-bit pointers has grotesquely broken pointer arithmetic.

Also, C has enough different datatypes to accommodate both the "natural" sizes for ints and pointers across different machines and to let the programmer ensure that objects that have to be of a certain size are that size. I use defined types to maintain portability from pdp11s to 68000s. Specifically, if I need a 32-bit quantity I define a type LONGWORD. On the 11 and the 68000 it looks like this:

	typedef long LONGWORD;

But if I need a 16-bit quantity, and need to use it in portable code, the typedefs go like this:

	#ifdef PDP11
	typedef int WORD;
	#else
	#ifdef M68000
	typedef short WORD;
	#endif
	#endif

This also localizes the program's machine dependence.

And speaking of C compilers, does anyone know of a good compiler for the PDP10? Cheers, Zig
gwyn@Brl@sri-unix (12/01/82)
From: Doug Gwyn <gwyn@Brl> Date: 29 Nov 82 13:37:50-EST (Mon) I have used machines on which a (char *) cannot possibly fit into an (int). I am curious whether anyone has ever found a case in which (char *) is not the "fattest" pointer type. I generally assume it is, so I would like to hear if this isn't always true. Thanks...
fred.umcp-cs@UDel-Relay@sri-unix (12/04/82)
From: Fred Blonder <fred.umcp-cs@UDel-Relay> Date: 30 Nov 82 18:41:58 EST (Tue)

	From: Charles F. Von Rospach <CHUQUI at MIT-MC>
	I have a program which seems to have been written for the PDP11. . . . It seems that whoever wrote the program is doing some funny things in a large integer array, reading and writing into it with a character pointer. . . . It seems as though it is assuming that an integer is 16 bits . . . chuck (chuqui at mit-mc)

This may not work in your case, but when transporting PDP-11 programs to a VAX I've found the cc command-line argument ``-Dint=short'' useful.
mo@Lbl-Unix@sri-unix (12/04/82)
From: mo at Lbl-Unix (Mike O'Dell [system]) Date: 30 Nov 1982 23:22:18-PST

In general, C assumes (char *) is the fattest pointer, or rather, that (char *) is a type to which other pointers (and some useful range of ints) can always be cast without information loss. Even on very strange machines this is true. By very strange, I mean word-addressed wonders with odd numbers of chars per word, or machines with 16- or 18-bit int pointers and 19-bit char pointers. There are C compilers for both of these machines which work quite nicely under this restriction.

On the other hand, larger data objects (like doubles) may require stricter alignment, so (char *) is usually the "most resolved" pointer (it takes the most bits to distinguish different objects) and (double *) is likely the most boundary-aligned. These are not hard and fast rules, but they hold in all the C compilers I have seen. DMR, please correct any misspeaks. -Mike
zrm.mit-ccc@Mit-Mc@sri-unix (12/05/82)
Date: 1 Dec 1982 00:58:20-EST

I would like to know on what sort of machine you can't put a pointer into an int. Not being able to do that raises all sorts of problems: not being able to subscript large arrays or offsets; having the size of function arguments and return values NOT be sizeof(int); having strange things happen in registers (ints should also have the same size as the registers you'll most often put them in), especially if your int suddenly becomes a long in a register, or if a (char *) gets truncated because it was run through an integer register.

I guess what it all boils down to is that all pointers, plus ints, should be of the same size, since these are the objects you are going to want to put into interchangeable slots most often. Since you have the datatypes long and short, you can also provide the programmer convenient ways of accessing longer or shorter int-like objects. But if someone has done this some other way, I'd like to hear about it. Cheers, Zig
mark (12/06/82)
re:

	I would like to know on what sort of machine you can't put a pointer into an int. Not being able to do that raises all sorts of problems such as not being able to subscript for large arrays or offsets, having the size of function arguments and return values NOT be the sizeof(int), having strange things happen in registers (ints should also have the same size as the registers you'll most often put them in),

I quote from page 34 of the C book:

	int will normally reflect the most "natural" size for a particular machine. ... About all you should count on is that short is no longer than long.

It doesn't make any promises that pointers fit in ints. While I recognize that there are programs out there that assume you can (probably the most famous is the execl(2) system call, which uses an integer zero for termination instead of a null character pointer), these should probably be viewed as unportabilities in the specific programs.

In answer to your questions: the 68000 has 32-bit pointers, and many people feel that the most natural size for an int is 16 bits. (There are other people who feel that there is so much software assuming pointers and ints are the same size that it's worth making ints 32 bits -- both flavors of compiler seem to exist.) I fail to see the problems with the constructs you raise:

	not being able to subscript for large arrays or offsets,

p[i] is defined as *(p+i). A pointer plus an int yields a pointer; that is, 32 bits + 16 bits yields 32 bits. Where's the problem?

	having the size of function arguments and return values NOT be the sizeof(int),

That's true for longs and doubles and structures now. If you're expecting a pointer you should declare that fact in your code. If you're reimplementing printf you should use <varargs.h>.
	having strange things happen in registers

Nothing really strange happens when you put a short or char into a long register on a VAX (except for sign extension when you didn't want it), so why should a 16-bit int be any different? In fact, you're only going to store the least significant 16 bits from the register, so it shouldn't matter that more bits were calculated and discarded.
dan@Bbn-Unix@sri-unix (12/09/82)
From: Dan Franklin <dan@Bbn-Unix> Date: 3 Dec 1982 14:56:38 EST (Friday) Why do you want to be able to cast *anything* to int? I've written several good-sized portable programs (for PDP-11s, VAXs, and C70s) without feeling a real need to do this. It certainly isn't portable, since an arbitrary architecture might well make it more efficient to choose ints to be smaller than pointers.
zrm.mit-ccc@Mit-Mc@sri-unix (12/09/82)
Date: 5 Dec 1982 14:58:45-EST

Why should ints be as big as pointers? Not to write portable code. For that, the most commonly used hack is defined types, which ensure that certain variables have sufficient resolution. The reason you want anything to fit into an int is that the compiler wants to minimise the contortions it would otherwise need to pass pointers to functions. On a pdp11 you have to go through some amount of hair to get longs and doubles in and out of functions. You would not want that to happen for every object larger than an int, especially pointers. See how simple things are on a VAX, where ints, pointers, and longs are all the same size. In short, ints should be the same size as pointers not so much for the programmer's sake as for the compiler implementer's. That C compilers do not try to "overcome" machine peculiarities is one of the advantages of C.

There is an artifact in Unix that I do not know exactly how to interpret, but it seems to indicate that, at one time, C could not pass objects larger than int to a function, or return them, or both: the function calls that deal with the date and time take a pointer to a long, rather than a long directly as an argument. I have not been hacking Unix for long enough -- is there someone out there who has been hacking it since v5 or so who might know why this peculiarity exists? Was it C in general or just Unix system calls that lose (or lost) in this way? Cheers, Zig
Michael.Young@CMU-CS-A@sri-unix (12/09/82)
From: Michael Wayne Young <Michael.Young@CMU-CS-A> Date: 4 December 1982 1926-EST (Saturday)

I fully disagree that "pointers, plus ints, should be of the same size"... at least if you mean we should be able to assume so when writing machine-independent code. I'd like the C language NOT to define the size of pointers, ints, or anything else. About the only restrictions I'd like to see are that a long is bigger than a normal int (or the same size), a short is shorter (or the same size), and an int is at least capable of holding a character. Mixing ints and pointers is just plain machine-dependent (and probably not lint-free without some clever casting).

[I'd still be interested in an answer to Doug Gwyn's question: are there architectures for which pointers to characters aren't the smallest pointers? Also, I would be interested in seeing the architecture for which a pointer is larger than an int, but I'm not about to claim (or base my code on the fact) that no such machine exists.]

Let's not go about making unnecessary (or at best marginally valuable) assumptions about our machine architectures while we code... Michael
mo@Lbl-Unix@sri-unix (12/10/82)
From: mo at Lbl-Unix (Mike O'Dell [system]) Date: 6 Dec 1982 10:46:49-PST

Assuming ints and pointers are the same size produces all the cruft which was so carefully removed going from v6 to v7. Keep in mind that on some machines, particularly word-addressed ones, pointers to different things CAN BE DIFFERENT SIZES! A pointer to an int might be a natural address, while a pointer to a char might need 1 to 4 additional bits to resolve a char within a word. You CANNOT assume anything about sizes; that's what unions are for. I completely agree that universal types are useful; that is why (char *) can almost always be used for pointers. On MOST machines, a union of a (char *) and a (double) would be the biggest "natural" object, while a union of (char *) and (int) or (long) would probably do most of the time. -Mike
perry.gatech@Udel-Relay@sri-unix (12/10/82)
From: Perry Flinn <perry.gatech@Udel-Relay> Date: 6 Dec 82 10:58:42-EST (Mon)

The following is quoted (without permission) from Kernighan and Ritchie (sec. 14.4, p. 210):

	A pointer may be converted to any of the integral types large enough to hold it. Whether an int or long is required is machine dependent, but is intended to be unsurprising to those who know the addressing structure of the machine. ... An object of integral type may be explicitly converted to a pointer. The mapping always carries an integer converted from a pointer back to the same pointer, but is otherwise machine dependent. A pointer to one type may be converted to a pointer to another type. The resulting pointer may cause addressing exceptions upon use if the subject pointer does not refer to an object suitably aligned in storage. It is guaranteed that a pointer to an object of a given size may be converted to a pointer to an object of smaller size and back again without change.

--Perry
jim (12/10/82)
I haven't seen this posted here yet, although I'm surprised no one has mentioned it. On the Intel 8086 (as in the IBM PC) an int is 16 bits, and pointers are 32 bits. The 32 bits are broken down into a 16-bit offset and a 16-bit segment, and the total address space is only 20 bits, but you need two registers to store a pointer. The main problem is that offsets wrap around, so incrementing a pointer (using the hardware instructions to do so) does not always give you the address of the next memory location. In practice I haven't had much trouble with this in porting C code to the 8086 (although there are lots of other things that do cause trouble).
gwyn@Brl@sri-unix (12/13/82)
From: Doug Gwyn <gwyn@Brl> Date: 9 Dec 82 13:11:11-EST (Thu) "long" ints were added shortly after 6th Edition UNIX. To get a long int before then, one had to have an array of 2 ints. The 6th Ed. kernel was full of this artifice.
zrm (12/13/82)
Aw, c'mon, people. Ints and pointers should be the same size to keep the compiler implementer sane; it has relatively little to do with code portability. Most machines, even machines that split their arithmetic and address registers (like the 68000), have one uniform size for registers. Because most C compilers bring things into registers and bash on them there, one would like the most commonplace objects to fit nicely into those registers. Ints and pointers should be of a "natural" size; there are enough datatypes to go around so that all the other sizes can be covered.

Code portability is achieved by guaranteeing sizes, as best one can in C. The most common way to do this is with defined types. In order to bring up a program, originally on a pdp11, on a machine with 32-bit ints but 16-bit shorts, you would change one line in the code from

	typedef int WORD;

to

	typedef short WORD;

and all the places where size really matters will come out the same. Instead of flaming, how about an example where it might actually be useful to have 16-bit ints and 32-bit pointers? Cheers, Zig
barmar (12/14/82)
The Honeywell 68/DPS and DPS/8 computers, which run the Multics time-sharing system (no, the name is NOT a hack on UNIX; in fact the reverse is true!), use double-words for pointers when running segmented (most of the time), so their pointers are 72 bits long. The natural length for an int would be 18 bits, because the accumulator is 36 bits wide (it is possible to do double-word arithmetic using the combination of the Accumulator and Quotient registers called the AQ, but it is not as "natural"). These pointers are also more precise than (char *): you can address arbitrary bit boundaries. Note that all the actual addressing information fits in less than 36 bits; the 72-bit pointers contain additional information, such as ring number and fault tags (there are nine bits eventually left over that Multics Maclisp uses to implement typed pointers). An 18-bit int would be a fine subscript, though, as 18 bits is enough to address any word in a segment (20 bits are necessary if you want to address any character). By the way, there are places in Multics that pass around pointers (the 36-bit "packed pointer" type I alluded to earlier) as PL/I type "fixed bin (35)", i.e. 35 bits plus sign. This is generally done so that the program is callable from languages that do not support pointers, such as Fortran and COBOL.
sdyer@Bbn-Unix@sri-unix (12/15/82)
From: Steve Dyer <sdyer@Bbn-Unix> Date: 9 Dec 1982 16:59:52 EST (Thursday)

V5 and early V6 (pre-Phototypesetter distribution) C compilers had no concept of a 'long' integer. Rather, most programs used a set of library routines and passed int[2] objects to them. The kernel was full of these, particularly in the manipulation of variables representing UNIX time. The unusual calling sequence for time(), i.e.

	long tvec;
	time(&tvec);

reflects the V6 construct

	int tvec[2];
	time(tvec);

which was kept for compatibility with earlier C programs. In V7 they also redeclared it as returning a 'long' quantity, so that new programs could use the more natural convention. In the same sense, one could say that the calling sequence for the newer V7 call, ftime(struct timeb *), reflects the very late addition of structures as legal return values to the C language. That is, they had the option of declaring ftime() to return a timeb structure, but they didn't (probably for backward compatibility). /Steve Dyer