jdb@mordor.UUCP (John Bruner) (01/10/85)
Here at the S-1 Project at LLNL we are porting UNIX to our own machine, the S-1 Mark IIA. The hardware is capable of operating upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have defined the following types:

	char  =  9 bits (S-1 quarterword)
	short = 18 bits (S-1 halfword)
	int   = 36 bits (S-1 singleword)
	long  = 72 bits (S-1 doubleword)

So far, so good. Well, not quite. There is a lot of confusion in UNIX source code about the types of integers which are passed as arguments to system calls or "stdio" routines. Anyone who has tried to port a program written for a VAX where long==int to a machine like the PDP-11 is familiar with the problem. Worse yet, the descriptions of the system calls in chapter 2 of the UPM reflect this: in V7 "lseek" is defined as

	long lseek(fildes, offset, whence)
	long offset;

whereas in the 4.2BSD manual it is

	pos = lseek(d, offset, whence)
	int pos;
	int d, offset, whence;

I consider the 4.2BSD definition to be wrong. My question is: should I consider the V7 definition to be correct?

We can define our system calls to use "int" and "long" integers as V7 does, but this means that we'll have to use 72-bit integers when a 36-bit integer would nearly always suffice. This seems ugly to me. In addition, it imposes a size and speed penalty. An alternate definition might be:

	daddr_t lseek(fildes, offset, whence)
	daddr_t offset;

where "daddr_t", defined in <sys/types.h>, is machine-dependent.

Does System V define system calls using derived types? Will the C environment standard define "stdio" routines using derived types? If so, I'd like to follow those standards.

One final recourse for us would be to admit defeat, change "long" to 36 bits, and hack in a "long long" type for 72-bit integers. I don't want to do this, because it means that while the definition of integer types is machine dependent, the machine that they depend upon is the PDP-11 or the VAX.
--
John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
UUCP: ...!ucbvax!dual!mordor!jdb  ...!decvax!decwrl!mordor!jdb
chris@umcp-cs.UUCP (Chris Torek) (01/11/85)
The 4.2BSD manual entry that claims that lseek returns an integer and takes an integer argument for its offset is *wrong*! The system call itself (in the kernel) takes the arguments

	int fd;
	off_t off;
	int sbase;

(off_t is typedef'd to int in <sys/types.h>, but this is a system-dependent file.)
--
(This line accidently left nonblank.)
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
henry@utzoo.UUCP (Henry Spencer) (01/13/85)
> An alternate definition might be:
>
>	daddr_t lseek(fildes, offset, whence)
>	daddr_t offset;
>
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.

Actually, there already is a type specifically for offsets into files: off_t. Unfortunately, it's not nearly as widely used as it should be. You have a choice of "doing it right" and having a fair bit of work to do on old programs, or giving in to practicality and using "long". Lamentably, the current draft of the ANSI C standard uses "long" for fseek() and ftell().
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
ken@turtlevax.UUCP (Ken Turkowski) (01/13/85)
In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>Here at the S-1 Project at LLNL we are porting UNIX to our own
>machine, the S-1 Mark IIA. The hardware is capable of operating
>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>defined the following types:
>
>	char  =  9 bits (S-1 quarterword)
>	short = 18 bits (S-1 halfword)
>	int   = 36 bits (S-1 singleword)
>	long  = 72 bits (S-1 doubleword)
	...
>We can define our system calls to use "int" and "long" integers as
>V7 does, but this means that we'll have to use 72-bit integers when
>a 36-bit integer would nearly always suffice. This seems ugly to me.
	...
>One final recourse for us would be to admit defeat, change "long"
>to 36-bits, and hack in a "long long" type for 72-bit integers. I
>don't want to do this, because it means that while the definition of
>integer types is machine dependent, the machine that they depend upon
>is the PDP-11 or the VAX.

Chars have always been 8 bits, shorts always 16, and longs always 32. I would suggest that you keep as close to this as possible. Int has varied between 16 and 32 bits; hell, why not make it 64? :-) viz,

	char  =  9 bits (S-1 quarterword)
	short = 18 bits (S-1 halfword)
	long  = 36 bits (S-1 singleword)
	int   = 72 bits (S-1 doubleword)
--
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA
jdb@mordor.UUCP (John Bruner) (01/14/85)
I'd like to thank everyone who has responded to my inquiry about integer types, both by posting to these newsgroups and by private mail.

As I reread my original posting, I realized that I had been a little obscure about my real intent. As a former V6 & V7 PDP-11 user/system maintainer, I'm familiar with the history of the lseek() call and the reasons why its second argument is not an "int". (I could tell you about some of the BSD programs I ported to V7 and the "int"=="long" assumptions that I had to weed out, but that's another story.) What I really was trying to ask about was the future direction of system call (and library routine) interface specifications: were the machine-specific types "short", "int", and "long" being supplanted in these cases by derived types such as "time_t" and "off_t"? The derived types would allow a system implementer to choose the most efficient, sufficiently-large integer size for each quantity rather than being saddled forever with decisions made for a PDP-11.

[Several people commented that the correct type for a file offset is "off_t", not "daddr_t". Thanks -- it was a case of putting my keyboard into gear without engaging my brain.]

There seem to be two major points of consensus:

1) The use of derived types is preferable, and their use in new UNIX versions is increasing. (I didn't receive any replies that specifically mentioned the ANSI C environment standard effort, but I assume that the issue is under consideration there as well.) However, there is too much code out there which explicitly uses "int" and "long" to change things quickly.

2) The qualifiers "short" and "long" are now pretty generally considered to refer to quantities that are roughly 16 and 32 bits long, respectively. They can be defined to be larger, but this usually just results in a space and execution-time penalty.
I'd like to solve our problem with the S-1 Mark IIA by defining the appropriate derived types as "int"s and making all of our C programs use the derived types. Unfortunately, the required conversion effort would be phenomenal, and we'd have to check all imported programs for explicit "long" declarations.

This leaves us with two choices. We will either use the old definitions and suffer the performance penalty for 72-bit integers where 36-bit ones would suffice (something I'm not very happy to accept), or we'll redefine "long" to be the same as "int" and we'll introduce "long long" (something our C compiler person is reluctant to do).

[BTW, "long long" would have been very useful on the VAX. It's a pain to write a cross-assembler for a 36-bit machine when the largest available integer type is 32 bits wide.]

In any event, I appreciate the help that everyone has offered.
--
John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
UUCP: ...!ucbvax!dual!mordor!jdb  ...!decvax!decwrl!mordor!jdb
ed@mtxinu.UUCP (Ed Gould) (01/18/85)
> Here at the S-1 Project at LLNL we are porting UNIX to our own
> machine, the S-1 Mark IIA. The hardware is capable of operating
> upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
> defined the following types:
>
>	char  =  9 bits (S-1 quarterword)
>	short = 18 bits (S-1 halfword)
>	int   = 36 bits (S-1 singleword)
>	long  = 72 bits (S-1 doubleword)
>
> So far, so good.
>
> ...
>
> One final recourse for us would be to admit defeat, change "long"
> to 36-bits, and hack in a "long long" type for 72-bit integers. I
> don't want to do this, because it means that while the definition of
> integer types is machine dependent, the machine that they depend upon
> is the PDP-11 or the VAX.

Please DO! The definition of C calls for *two* lengths of integers: "short int" and "long int". "int" alone may be defined as one or the other. Actually, the other approach you could take is "short short int" for the 18-bit ones, using "short int" for 36 and "long int" for 72. This is what the folks at Amdahl finally did with their UTS port. I tried porting stuff from the VAX to their pre-release that had 16-bit shorts, 32-bit ints, and 64-bit longs and wound up #defining long to int, but that didn't work either.

> in V7 "lseek" is defined as
>
>	long lseek(fildes, offset, whence)
>	long offset;
>
> whereas in the 4.2BSD manual it is
>
>	pos = lseek(d, offset, whence)
>	int pos;
>	int d, offset, whence;
>
> I consider the 4.2BSD definition to be wrong. My question is: should
> I consider the V7 definition to be correct?
>
> We can define our system calls to use "int" and "long" integers as
> V7 does, but this means that we'll have to use 72-bit integers when
> a 36-bit integer would nearly always suffice. This seems ugly to me.
> In addition, it imposes a size and speed penalty.
>
> An alternate definition might be:
>
>	daddr_t lseek(fildes, offset, whence)
>	daddr_t offset;
>
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.
The issue of how to name system-dependent types (e.g., daddr_t) is separate from the types defined by the compiler. You're right that 4.2 defines them wrong - they should have used daddr_t and off_t. It's usually clear what the right width is; it should be named in the most portable way possible. Usually this means using either long or short, never int. Int is supposed to be the "natural" length for the machine, and presumably it's the fastest form. It should be used when the width of the item isn't too important and where, for portability, 16 bits is *always* enough.

(Actually, I think the definition should be

	off_t lseek(...)
	off_t offset;

using offsets, not disk addresses.)
--
Ed Gould
mt Xinu, 739 Allston Way, Berkeley, CA 94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146
guy@rlgvax.UUCP (Guy Harris) (01/21/85)
> > One final recourse for us would be to admit defeat, change "long"
> > to 36-bits, and hack in a "long long" type for 72-bit integers.
>
> Please DO! The definition of C calls for *two* lengths of integers:
> "short int" and "long int". "int" alone may be defined as one or the
> other.

Actually, the C reference manual calls for up to *three* lengths of integers: "short int", "int", and "long int"; any of them may be equivalent to another as long as sizeof(short int) <= sizeof(int) <= sizeof(long int). In practice, it may cause problems if "int" isn't the same as either of the other two, but that's because of historical practice, not because of the "specification" of the language.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
seifert@mako.UUCP (Snoopy) (01/30/85)
In article <631@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>>Here at the S-1 Project at LLNL we are porting UNIX to our own
>>machine, the S-1 Mark IIA. The hardware is capable of operating
>>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>>defined the following types:
>>
>>	char  =  9 bits (S-1 quarterword)
>>	short = 18 bits (S-1 halfword)
>>	int   = 36 bits (S-1 singleword)
>>	long  = 72 bits (S-1 doubleword)
> ...
>>We can define our system calls to use "int" and "long" integers as
>>V7 does, but this means that we'll have to use 72-bit integers when
>>a 36-bit integer would nearly always suffice. This seems ugly to me.
> ...
>
>Chars have always been 8 bits, shorts always 16, and longs always 32.

I wouldn't bet my life on it. Look in K&R. (page 182 in my copy)

>I would suggest that you keep as close to this as possible. Int has
>varied between 16 and 32 bits; hell, why not make it 64? :-)
>viz,
>
>	char  =  9 bits (S-1 quarterword)
>	short = 18 bits (S-1 halfword)
>	long  = 36 bits (S-1 singleword)
>	int   = 72 bits (S-1 doubleword)

"int" longer than "long"? You are hereby sentenced to program in FORTRASH for six months!

----------- new (?) idea begins here --------------------

OK, here's my suggestion, which may not help John (Hi John!) port existing code, but might help in the future.

Why not figure out how many bits each variable *needs*, and then declare them accordingly:

	int8  foo;
	int16 bar;
	int12 baz;
	int9  buff[BUFFSIZ];
	int18 blat;

Then when the mess gets compiled, the various size ints get changed to the smallest possible machine entity size, on the machine it's getting compiled for. So then when you're porting some spiffo program developed on, say, a pdp11 to, say, a pdp8, with 12-bit words, "baz" fits nicely. You don't have to *assume* that it really might need 16 bits.
"blat", which is just a tad too big for a 16-bit int, would fit in a "short" on the S-1 Mark IIA. If it had been declared "long", it would get 72 bits! (are we talking overkill, or what?) (I *refuse* to use the CDC with its 6-bit characters and no lower case as an example, so there!)

If speed is more important than storage, one could use a "fast" suffix:

	int14f foobar;

This would clue the compiler/preprocessor/sed-script/whatever to use the fastest size (if it fits), even if it fits in something smaller. Or, there's always:

	register int14 foobar;

That's it. Pretty simple. (therefore it might work, but no one will like it :-) )

Again, yes I realise this doesn't help with *existing* code, but it would help to use it in new stuff, no? And it doesn't require an extension to the language!

 _____
|___|		the Bavarian Beagle
_|___|_		   Snoopy
\_____/		tektronix!mako!seifert
 \___/
ndiamond@watdaisy.UUCP (Norman Diamond) (01/30/85)
> ----------- new (?) idea begins here --------------------
>
> OK, here's my suggestion, which may not help John (Hi John!)
> port existing code, but might help in the future.
>
> Why not figure out how many bits each variable *needs*, and then
> declare them accordingly:
>	int8  foo;
>	int16 bar;
>	int12 baz;
>	int9  buff[BUFFSIZ];
>	int18 blat;
>
> the Bavarian Beagle   Snoopy   tektronix!mako!seifert

That's as new as PL/I is, anyway. One of the things Pascal did right was define subranges using lower and upper actual bounds, instead of number of bits ... or in other words, the bounds didn't have to be (some power of 2, minus 1).
--
Norman Diamond
UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa
"Opinions are those of the keyboard, and do not reflect on me or higher-ups."
jdb@mordor.UUCP (John Bruner) (01/30/85)
Things have quieted down quite a bit since I asked my initial question, and I should be smart enough to leave things alone, but I guess I'm not. We gave up and implemented sizeof(int) == sizeof(long) with integers 36 bits wide, basically because we didn't want to have to convert the overwhelming mass of existing programs.

Snoopy raises a point which I'd like to expand upon -- the idea of defining derived types "int8", "int9", "int16" which can be redefined when a program is moved from one machine to another. I had been doing some thinking about writing programs for maximum portability and how the language might be changed to encourage more portable programs. Here are some of my thoughts on this issue.

By way of introduction, I am not a C novice. I learned C back in 1977 on a PDP-11/70 V6 UNIX system. I have used it to program on PDP-11's, VAXes, various Motorola 68K systems, and now our local machine (the S-1 Mark IIA). The programs have included user- and kernel-mode UNIX code, among other things.

Most C users are blessed with a machine architecture that resembles a PDP-11 in several important ways: (1) it is a two's complement machine, (2) it is byte-addressable (where larger data types are some power-of-two number of bytes long), (3) it has an 8-bit byte, (4) it operates most conveniently on primitive data types which are 16 and 32 bits long, (5) it is not a tagged architecture, (6) memory is not segmented, but is allocated in one contiguous block (or perhaps two or three if you count text/data-bss/stack). Another characteristic which rears its head from time to time (although less often than the others, thanks to the popularity of the MC68XXX) is (7) bytes are ordered in a "little endian" fashion.

Writing truly portable code in C does not come naturally. As we have discovered here in our efforts to port C and UNIX to the S-1, a lot of programs break when the machine that they run on does not satisfy one of the assumptions I noted above.
For a mild example, consider the byte-ordering problem and how it shows up in programs such as "talk" (to name one example at random).

Here at the S-1 Project we have two operating systems projects underway. The other operating system, Amber, is written in a language called Pastel (a "colorful" Pascal). Pastel has been significantly extended relative to standard Pascal, so that it supports separate compilation (by "modules", each of which may contain public and private parts), pointer manipulation, flexible argument passing to procedures and functions (i.e. varying number and type of arguments), good access to low-level machine instructions (MUCH better than the kludgey "asm" in C), and it produces excellent code. From time to time I have occasion to program in Pastel. While I prefer C, and I often find the Pascal-based syntax a little clumsy, I definitely miss a few of Pastel's features when I program in C. (I'll come back to a specific example below.)

C is used to achieve two different ends. It is used to code machine-dependent routines (e.g. device drivers), and it is used to write machine-independent programs. Unfortunately, I fear that too much of its machine-dependent flavor carries over into programs that are supposed to be machine independent. The assumptions that I listed above are continually invoked, so that the resulting program won't go (at least, not easily) to another machine. Anyone who has tried to port programs written for the VAX (with implicit "int" == "long" assumptions) to machines like the PDP-11 knows what I mean.

Having laid forth all of this philosophy, let me give one specific case and expand upon Snoopy's suggestion. I believe that C should provide some means of defining integer data types in terms of the range of values that the type represents, rather than the machine-dependent size of the storage cell that the type will occupy. The compiler can pick the correct storage size.
Then "short" and "long" would be reserved for machine-dependent cases, and machines with larger word sizes can be easily accommodated. Why should the programmer have to worry about whether his value can fit in a "short" or whether a "long" will be necessary? I'm not familiar with Concurrent Euclid (perhaps I should look it up), but subrange types are an important central concept in Pascal, Modula, and Ada.

Please note that I am not proposing any new features for the ANSI standardization effort. I'm expressing thoughts about future directions for C. (I don't recall seeing subranges in the C++ paper in the BSTJ [oops, BLTJ].) I'm not proposing to turn C into Pascal. Contrary to some of the sentiments expressed in this group, however, I do feel that C can benefit from an examination of languages like Pascal.

Finally, let me hedge my way back toward the conservative camp and pose a question that should be asked in parallel with "what features does C need?" How can we raise the standards of C programmers (possibly without *any* language changes) so that the programs they write will be more portable? If we don't have explicit subranges, how do we encourage programmers to define and use things like "int8"? Other portability considerations should include standardized derived types, libraries, an understanding of pointers and integers (and why (int)0 is not the same thing as (int *)0), and other implications of the variety of machine architectures that C runs on.

[BTW, a VAX Pastel compiler is available through the ARPA/MILNET by the anonymous account "ftp", file "pastel.bintape". This file is in "tar" format. If you don't have ARPANET access, you can contact

	Christine Ghinazzi
	S-1 Project, Lawrence Livermore National Laboratory
	PO Box 5503
	Livermore, CA 94550

for information on obtaining a tape copy. There is no charge.]
--
John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
UUCP: ...!ucbvax!dual!mordor!jdb  ...!decvax!decwrl!mordor!jdb
friesen@psivax.UUCP (Stanley Friesen) (02/01/85)
Whitesmiths has come up with a series of defined types which can be used to increase portability if they are used regularly. They are defined in an include file "std.h" which can be adjusted to each machine's oddities. The types, and intended meanings, are:

	LONG	a 32-bit signed integer (quantity)
	ULONG	a 32-bit unsigned quantity
	LBITS	a 32-bit set (i.e. to be used as a set of individual bits)
	COUNT	a 16-bit signed quantity
	UCOUNT	a 16-bit unsigned quantity
	BITS	16 individual bits
	TEXT	a byte for holding actual characters
	TINY	an 8-bit signed quantity
	UTINY	an 8-bit unsigned quantity
	TBITS	8 individual bits
	METACH	an extended byte to hold augmented characters
	ARGINT	an indefinite integer type for parameters
	VOID	generic function type
	BYTES	an unsigned quantity sufficient to hold a 'sizeof' result
	FILE	a quantity to hold file numbers
	FIO	a structure for buffered I/O calls

The only problem with this is the incompatibility with stdio caused by the last items, especially since the FIO type is not conformable with the stdio FILE type. I have therefore modified the std.h that I use to make FIO the equivalent of stdio FILE, thus allowing me to use that package.
--
Sarima (Stanley Friesen)
{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or quad1!psivax!friesen
mwm@ucbtopaz.CC.Berkeley.ARPA (02/08/85)
In article <294@psivax.UUCP> friesen@psivax.UUCP (Stanley Friesen) writes:
> Whitesmiths has come up with a series of defined types
>which can be used to increase portability if they are used regularly.
>They are defined in an include file "std.h" which can be adjusted
>to each machine's oddities. The types, and intended meanings are:
>[list deleted - mwm]

Stanley, this list is similar (in intent, anyway) to /usr/include/sys/types.h. types.h has the advantage that it doesn't tie you to some specific architecture. DRI pushes similar things with their compilers, which define "BYTE" (8 bits), "WORD" (2 bytes) and "LONG" (4 bytes).

My basic reaction to all of these is the same: YUCH. Not that these things are bad, just that they aren't sufficient if you're really worried about it (for instance, how do the 7-bit chars used for TOPS-10 ASCII fit? Better yet, how about their 6-bit chars used in some places?).

The correct solution to making sure that you have enough space for all your ints has already been suggested: a machine-specific series of typedefs of the form:

	typedef <integer type> sintX;
	typedef <unsigned integer type> uintX;

where X is the number of bits of magnitude that you need, sintX is a smallest signed integer type with that many bits of magnitude or more, and uintX is the analogous unsigned integer type (no, we will *not* worry about trinary machines and other such oddities. Yet.) Note that for an 8-bit architecture where the compiler makes "char" signed and does the least surprising thing with "unsigned char", "typedef unsigned char uint8;" works, but you have to have "typedef short sint8;" to get 8 bits of magnitude out of the thing.

Trouble is, one system (4BSD, or AT&T even) introducing that include file doesn't help much. Old programs would still not use it, and new programs would require you to create the include file of typedefs.
Having been too lazy to do that, I use the following rules: Most of the time, if you're not worried about how big integer types are, use int. God help you if your ints are only 8 bits long (I don't think anyone has done that). If you need more than 16 bits, use "long". Never use char as something to hold an integer - use short for very small objects where you're worried about space.

While not perfect, this doesn't create problems on screwy machines the way the Whitesmiths/DRI plan does, and comes close to working almost everywhere. Now, if only I could convince the compiler to do runtime overflow checking.

	<mike
bsa@ncoast.UUCP (Brandon Allbery) (02/11/85)
> Article <22@mordor.UUCP>, from jdb@mordor.UUCP (John Bruner)
+----------------
| [BTW, a VAX Pastel compiler is available through the ARPA/MILNET by
| the anonymous account "ftp", file "pastel.bintape". This file is
| in "tar" format. If you don't have ARPANET access, you can contact
|
| ...
|
| for information on obtaining a tape copy. There is no charge.]

Speaking of portability... I would be crazy to expect a VAX compiler to run on my machine, that's not the flame. What *is*, is that so many programs are offered for free or fairly cheap, *on tape*. But not all Unix boxes, especially 68000-based ones, have tape drives. And I don't look to Radio Shack offering a standard tape drive in the future. How do *we* get the PD stuff like MH, which was touted as runnable under Xenix? (This is only my third flame so far; if I keep it up, maybe someone'll answer.) $40 is NO problem; tape distribution *is*.

Brandon (bsa@ncoast.UUCP)
--
Brandon Allbery, decvax!cwruecmp!ncoast!bsa, "ncoast!bsa"@case.csnet (etc.)
6504 Chestnut Road, Independence, Ohio 44131  +1 216 524 1416 (or what have you)