jdb@mordor.UUCP (John Bruner) (01/10/85)
Here at the S-1 Project at LLNL we are porting UNIX to our own
machine, the S-1 Mark IIA.  The hardware is capable of operating
upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
defined the following types:
	char	= 9 bits	(S-1 quarterword)
	short	= 18 bits	(S-1 halfword)
	int	= 36 bits	(S-1 singleword)
	long	= 72 bits	(S-1 doubleword)
So far, so good.  Well, not quite.  There is a lot of confusion in
UNIX source code about the types of integers which are passed as
arguments to system calls or "stdio" routines.  Anyone who has tried
to port a program written for a VAX where long==int to a machine like
the PDP-11 is familiar with the problem.
Worse yet, the descriptions of the system calls in chapter 2 of the
UPM reflect this: in V7 "lseek" is defined as
	long lseek(fildes, offset, whence)
	long offset;
whereas in the 4.2BSD manual it is
	pos = lseek(d, offset, whence)
	int pos;
	int d, offset, whence;
I consider the 4.2BSD definition to be wrong.  My question is: should
I consider the V7 definition to be correct?
We can define our system calls to use "int" and "long" integers as
V7 does, but this means that we'll have to use 72-bit integers when
a 36-bit integer would nearly always suffice.  This seems ugly to me.
In addition, it imposes a size and speed penalty.
An alternate definition might be:
	daddr_t lseek(fildes, offset, whence)
	daddr_t offset;
where "daddr_t", defined in <sys/types.h>, is machine-dependent.
Does System V define system calls using derived types?  Will the C
environment standard define "stdio" routines using derived types?
If so, I'd like to follow those standards.
One final recourse for us would be to admit defeat, change "long"
to 36-bits, and hack in a "long long" type for 72-bit integers.  I
don't want to do this, because it means that while the definition of
integer types is machine dependent, the machine that they depend upon
is the PDP-11 or the VAX.
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb
chris@umcp-cs.UUCP (Chris Torek) (01/11/85)
The 4.2BSD manual entry that claims that lseek returns an
integer and takes an integer argument for its offset is *wrong*!
The system call itself (in the kernel) takes the arguments
	int	fd;
	off_t	off;
	int	sbase;
(off_t is typedef'd to int in <sys/types.h>, but this is a system-
dependent file.)
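Here is a minimal sketch of Torek's point in present-day POSIX terms. It assumes a POSIX system where lseek is declared in <unistd.h> and returns off_t. The caller keeps every file offset in off_t and never narrows it to int or long. The helper names file_size and scratch_file are hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose mkstemp under strict C modes */
#include <assert.h>
#include <stdlib.h>      /* mkstemp */
#include <string.h>      /* strlen */
#include <sys/types.h>   /* off_t, ssize_t */
#include <unistd.h>      /* lseek, write, unlink, close, SEEK_* */

/* Size of an open file.  Every offset stays in off_t, the type the
   kernel actually uses; the caller's current offset is restored. */
off_t file_size(int fd)
{
    assert(fd >= 0);
    off_t here = lseek(fd, 0, SEEK_CUR);
    if (here == (off_t)-1)
        return (off_t)-1;
    off_t end = lseek(fd, 0, SEEK_END);
    (void)lseek(fd, here, SEEK_SET);
    return end;
}

/* Hypothetical test helper: a scratch file holding data, unlinked at
   once so it vanishes when closed.  Returns the descriptor, or -1. */
int scratch_file(const char *data)
{
    char name[] = "/tmp/offtXXXXXX";
    int fd = mkstemp(name);
    if (fd < 0)
        return -1;
    (void)unlink(name);
    size_t n = strlen(data);
    if (write(fd, data, n) != (ssize_t)n) {
        (void)close(fd);
        return -1;
    }
    return fd;
}
```

No code between the open and the close ever holds an offset in anything narrower than off_t. That is the property that lets a system like the S-1 choose its own width for off_t.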
-- 
(This line accidently left nonblank.)
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/11/85)
Definitely there are several incorrect system call and C library
interface descriptions in the 4.2BSD manual.  In those cases where
there is a corresponding UNIX System V, /usr/group, or ANSI X3J11
specification you should use it instead.
The only "variable type" declarations used in the ANSI C library
specifications are a few instances apiece of:
	jmp_buf
	va_list
	size_t
	onexit_ptr
	time_t
On 4.2BSD time_t is necessarily a long (must be an integral type!)
and size_t is an unsigned (type of "sizeof" operator).
For the case you give as an example, the answer is:
	long lseek( int fildes, long offset, int whence );
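Gwyn's prototype form also bears on the PDP-11 problem Bruner mentions. When a prototype is in scope, an int argument is converted to long automatically. With an old-style call, an int could be passed where a long was expected. In the sketch below, fake_lseek is a hypothetical stand-in with the same parameter types. It performs no I/O and simply reports the offset it received.

```c
#include <assert.h>

/* Hypothetical stand-in with the parameter types of
   long lseek( int fildes, long offset, int whence );
   it does no I/O and just returns the offset it was given. */
long fake_lseek(int fildes, long offset, int whence)
{
    assert(fildes >= 0);
    (void)whence;
    return offset;
}

/* The caller passes a plain int.  Because the prototype above is in
   scope, the compiler converts it to long before the call; without a
   prototype (pre-ANSI C) the programmer had to write 0L, (long)n, etc. */
long seek_to(int fd, int where)
{
    return fake_lseek(fd, where, 0);
}
```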
Charles L. Athey III <athey@lll-crg.ARPA> (01/11/85)
I believe that the derived types are much cleaner and overall a much
better solution.  This is the solution that was taken with the NLTSS,
Network Livermore Time Sharing System, for the CRAYs [Please no flames
about time sharing a CRAY].
(p.s. Even though at LLNL I am not connected with the S-1 project -
	there are over 8000 employees and who knows how many projects
	here)
    Chuck Athey [Intelligent Terminal (Workstation) Support Group, LLNL]
    MILNET: athey@lll-crg.arpa		(415) 422-7211
    UUCP:   ucbvax!dual!lll-crg![athey | itsg!athey]
BostonU SysMgr <root%bostonu.csnet@csnet-relay.arpa> (01/12/85)
> From: John Bruner <mordor!jdb>
>
> Here at the S-1 Project at LLNL we are porting UNIX to our own
> machine, the S-1 Mark IIA.  The hardware is capable of operating
> upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
> defined the following types:
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	int	= 36 bits	(S-1 singleword)
>	long	= 72 bits	(S-1 doubleword)
[questions about whether this is the best way to assign these and
portability problems]
There are really two issues here.  The first is recompiling old code
reasonably; the second is how to write new code that best accesses
the hardware.
It seems pretty clear that old code will be served well if long means
36 bits, as it likely assumes at most 32 bits.  The machine was built
on the assumption that sometimes people need more than 36 bits and
are willing to pay whatever storage/speed penalties that entails.  I
presume that means either they are using integers outside of a 2**36
range or bit-packing data (same thing; probably not the problem in
practice).  It would be a lie to even imply that such code is at all
portable.
My suggestion, therefore, is that long==int==36b for backwards and
likely forward portability in general, and that a new type be created
for the 72b.  (I realize there would still be portability problems,
but you do the best you can.)
I personally would rather see you code it as built-in functions to
your compiler (a la FORTRAN).  This would give at least a glimpse of
opportunity to port a program from the S-1 by building a simulator.
For example, define routines:
	store72(result,value)
	add72(left,right)
	sub72(left,right)
and write expressions sort of like lispisms:
	store72(lv,add72(v1,v2));
etc.  Declaration would be by a special name; long72 comes to mind.
Your compiler could know these routines and produce the correct
instructions (or look at the sed scripts 4.2 uses).  Then, before
recompiling your code, I would probably typedef long72 to be an array
of two or three longs on my VAX.  This would then pass pointers on my
machine.  Note that store72 could easily be a macro for 'lv = ....'.
This all reminds me of the old V6 compiler, which used things like
fmod() [flt pt modulo] and generated subroutine calls for 32-bit
longs on a PDP11.
If you don't care about forward portability, then it doesn't really
matter, no?
	-Barry Shein, Boston University
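Below is a sketch of the VAX side of Shein's suggestion. It assumes a 72-bit value is emulated with three 32-bit limbs, of which only 8 bits of the top limb are used. The names long72, make72, add72, sub72 and store72 follow the post, but the representation and the code are illustrative only. They are not anything the S-1 compiler provided. One deliberate deviation: long72 here is a struct rather than a bare array, so it can be passed and returned by value instead of through pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical long72: three 32-bit limbs, least significant first.
   Only the low 8 bits of w[2] are used: 32 + 32 + 8 = 72 bits. */
typedef struct { uint32_t w[3]; } long72;

#define TOP72_MASK 0xFFu

/* Build a long72 from its low 64 bits and its top 8 bits. */
long72 make72(uint64_t low64, uint32_t high8)
{
    long72 r;
    assert(high8 <= TOP72_MASK);
    r.w[0] = (uint32_t)low64;
    r.w[1] = (uint32_t)(low64 >> 32);
    r.w[2] = high8;
    return r;
}

/* Limb-by-limb addition with carry; the result wraps modulo 2^72. */
long72 add72(long72 a, long72 b)
{
    long72 r;
    uint64_t carry = 0;
    for (int i = 0; i < 3; i++) {
        uint64_t s = (uint64_t)a.w[i] + b.w[i] + carry;
        r.w[i] = (uint32_t)s;
        carry = s >> 32;
    }
    r.w[2] &= TOP72_MASK;
    return r;
}

/* a - b computed as a + (~b + 1), i.e. 72-bit two's complement. */
long72 sub72(long72 a, long72 b)
{
    long72 nb;
    for (int i = 0; i < 3; i++)
        nb.w[i] = ~b.w[i];
    nb.w[2] &= TOP72_MASK;
    return add72(a, add72(nb, make72(1, 0)));
}

/* store72 as plain assignment, as the post suggests. */
#define store72(lv, value) ((lv) = (value))
```

With these definitions, store72(lv, add72(v1, v2)); reads much like the lispism in the post. A compiler that knew the routines could replace each call with a single doubleword instruction.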
donn@utah-cs.UUCP (Donn Seeley) (01/13/85)
[Gee, the last time this came up was only in September!  The netnews
recapitulation time continues to shrink...]
The lseek.2 manual entry in 4.2 BSD is wrong.  It has been changed
(here and at Berkeley) to describe the 'offset' argument and return
value as being type 'off_t', consistent with their use in the kernel
(as Chris points out).  The lint library entry has been changed
accordingly.
Why don't we re-post everything since September,
Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn
henry@utzoo.UUCP (Henry Spencer) (01/13/85)
> An alternate definition might be:
>
>	daddr_t lseek(fildes, offset, whence)
>	daddr_t offset;
>
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.
Actually, there already is a type specifically for offsets into
files: off_t.  Unfortunately, it's not nearly as widely used as it
should be.  You have a choice of "doing it right" and having a fair
bit of work to do on old programs, or giving in to practicality and
using "long".  Lamentably, the current draft of the ANSI C standard
uses "long" for fseek() and ftell().
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
ken@turtlevax.UUCP (Ken Turkowski) (01/13/85)
In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>Here at the S-1 Project at LLNL we are porting UNIX to our own
>machine, the S-1 Mark IIA.  The hardware is capable of operating
>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>defined the following types:
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	int	= 36 bits	(S-1 singleword)
>	long	= 72 bits	(S-1 doubleword)
...
>We can define our system calls to use "int" and "long" integers as
>V7 does, but this means that we'll have to use 72-bit integers when
>a 36-bit integer would nearly always suffice.  This seems ugly to me.
...
>One final recourse for us would be to admit defeat, change "long"
>to 36-bits, and hack in a "long long" type for 72-bit integers.  I
>don't want to do this, because it means that while the definition of
>integer types is machine dependent, the machine that they depend upon
>is the PDP-11 or the VAX.
Chars have always been 8 bits, shorts always 16, and longs always 32.
I would suggest that you keep as close to this as possible.  Int has
varied between 16 and 32 bits; hell, why not make it 64? :-)
viz,
	char	= 9 bits	(S-1 quarterword)
	short	= 18 bits	(S-1 halfword)
	long	= 36 bits	(S-1 singleword)
	int	= 72 bits	(S-1 doubleword)
-- 
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA
jdb@mordor.UUCP (John Bruner) (01/14/85)
I'd like to thank everyone who has responded to my inquiry about
integer types, both by posting to these newsgroups and by private
mail.
As I reread my original posting, I realized that I had been a little
obscure about my real intent.  As a former V6 & V7 PDP-11 user/system
maintainer, I'm familiar with the history of the lseek() call and
the reasons why its second argument is not an "int".  (I could tell
you about some of the BSD programs I ported to V7 and the "int"=="long"
assumptions that I had to weed out, but that's another story.)
What I really was trying to ask about was the future direction of
system call (and library routine) interface specifications: were the
machine-specific types of "short", "int", and "long" being supplanted
in these cases by derived types such as "time_t" and "off_t"?  The
derived types would allow a system implementer to choose the most
efficient, sufficiently-large, integer size for each quantity rather
than being saddled forever with decisions made for a PDP-11.
[Several people commented that the correct type for a file offset
is "off_t", not "daddr_t".  Thanks -- it was a case of putting my
keyboard into gear without engaging my brain.]
There seem to be two major points of consensus:
    1)	The use of derived types is preferable, and their use in
	new UNIX versions is increasing.  (I didn't receive any
	replies that specifically mentioned the ANSI C environment
	standard effort, but I assume that the issue is under
	consideration there as well.)  However, there is too much
	code out there which explicitly uses "int" and "long" to
	change things quickly.
    2)	The qualifiers "short" and "long" are now pretty generally
	considered to refer to quantities that are roughly 16 and
	32 bits long, respectively.  They can be defined to be
	larger, but this usually just results in a space and
	execution-time penalty.
I'd like to solve our problem with the S-1 Mark IIA by defining the
appropriate derived types as "int"s and making all of our C programs
use the derived types.  Unfortunately, the required conversion effort
would be phenomenal, and we'd have to check all imported programs for
explicit "long" declarations.  This leaves us with two choices.
We will either use the old definitions and suffer the performance
penalty for 72-bit integers where 36-bit ones would suffice
(something I'm not very happy to accept), or we'll redefine "long"
to be the same as "int" and we'll introduce "long long" (something
our C compiler person is reluctant to do).
[BTW, "long long" would have been very useful on the VAX.  It's
a pain to write a cross-assembler for a 36-bit machine when the
largest available integer type is 32 bits wide.]
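The following sketch shows what a wider host type buys in the cross-assembler case. It assumes a C99 (or later) compiler, where unsigned long long has at least 64 bits. A 36-bit target word then fits with room to spare, and truncation, sign extension, and quarterword packing become simple shifts and masks. The word36 type and the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical: one 36-bit target word in the low bits of a host
   unsigned long long (at least 64 bits in C99). */
#define WORD36_BITS 36
#define WORD36_MASK ((1ULL << WORD36_BITS) - 1)

typedef unsigned long long word36;

/* Truncate a host value to 36 bits, as the target hardware would. */
word36 to_word36(long long v)
{
    return (word36)v & WORD36_MASK;
}

/* Reinterpret a 36-bit two's-complement word as a signed host value. */
long long from_word36(word36 w)
{
    w &= WORD36_MASK;
    if (w & (1ULL << (WORD36_BITS - 1)))
        return (long long)w - (long long)(1ULL << WORD36_BITS);
    return (long long)w;
}

/* Pack four 9-bit quarterwords (q[0] most significant) into one word. */
word36 pack_quarters(const unsigned q[4])
{
    word36 w = 0;
    for (int i = 0; i < 4; i++) {
        assert(q[i] < 512);
        w = (w << 9) | q[i];
    }
    return w;
}
```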
In any event, I appreciate the help that everyone has offered.
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb
jon@cit-vax (Jonathan P. Leech) (01/14/85)
In article <631@turtlevax.UUCP>, Ken Turkowski <turtlevax!ken> writes:
> Chars have always been 8 bits, shorts always 16, and longs always 32.
> I would suggest that you keep as close to this as possible.  Int has
> varied between 16 and 32 bits; hell, why not make it 64? :-)
> viz,
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	long	= 36 bits	(S-1 singleword)
>	int	= 72 bits	(S-1 doubleword)
>
> -- 
> Ken Turkowski @ CADLINC, Menlo Park, CA
Why not?  Perhaps Appendix A, section 4 (p. 182) of K&R:
	"Up to three sizes of integer, declared short int, int, and
	long int, are available.  Longer integers provide no less
	storage than shorter ones."
Also, if you apply the type conversion rules in section 6.6, an
operation involving a (36 bit) long and a (72 bit) int will have
result type of long, losing precision.
Does anyone know what the ANSI standard says about this?
	Jon Leech
	jon@cit-vax.arpa
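The conversion rule Leech cites can be observed on a present-day compiler. In the usual arithmetic conversions, a signed int combined with a long yields a long because of the type involved, not because of the widths. That is why a 72-bit int combined with a 36-bit long would produce a 36-bit result. The sketch assumes a C11 compiler, since it relies on _Generic, which did not exist in 1985. The enum, macro, and function names are hypothetical.

```c
#include <assert.h>

enum int_kind { K_SHORT = 1, K_INT, K_LONG, K_LLONG, K_OTHER };

/* Classify the type of an expression after the usual conversions. */
#define INT_KIND(x) _Generic((x),      \
    short: K_SHORT, int: K_INT,        \
    long: K_LONG, long long: K_LLONG,  \
    default: K_OTHER)

/* int + long is always computed in long, whatever the widths are. */
int kind_of_int_plus_long(int i, long l)
{
    return INT_KIND(i + l);
}
```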
Ron Natalie <ron@BRL-TGR> (01/15/85)
Yes, we had the same problem with the Denelcor HEP.  It has an innate
64-bit word size which you'd like to be an int.  Long is also equal to
int, since the machine doesn't bother with anything greater than 64
bits of integer (and since long can't be smaller than int).  We picked
short to be 16 bits for compatibility and defined (gack) a mystery type
called _int32 for people who really want the 32-bit things.  _int32 is
frequently typedef'd to "medium."  I wanted to call them short longs.
With eight-byte ints you can make the optimization that
	uid = username ^ 'root\0\0\0\0'
Eh?
-Ron
Speaking of ANSI standard sizes... their Fortran compiler had to be
reworked because, according to the spec, DOUBLE must occupy EXACTLY
twice as much space as REAL.  There is also a hard relationship
between INTEGER and REAL.  Yuch.
-Ron
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/15/85)
> _int32 is frequently typedef'd to "medium."  I wanted to call the[m] short longs.
Not long shorts, or better yet, Bermuda shorts?  Clamdiggers?
ed@mtxinu.UUCP (Ed Gould) (01/18/85)
> Here at the S-1 Project at LLNL we are porting UNIX to our own
> machine, the S-1 Mark IIA.  The hardware is capable of operating
> upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
> defined the following types:
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	int	= 36 bits	(S-1 singleword)
>	long	= 72 bits	(S-1 doubleword)
>
> So far, so good.
>
> ...
>
> One final recourse for us would be to admit defeat, change "long"
> to 36-bits, and hack in a "long long" type for 72-bit integers.  I
> don't want to do this, because it means that while the definition of
> integer types is machine dependent, the machine that they depend upon
> is the PDP-11 or the VAX.
Please DO!  The definition of C calls for *two* lengths of integers:
"short int" and "long int".  "int" alone may be defined as one or the
other.
Actually, the other approach you could take is "short short int" for
the 18-bit ones, using "short int" for 36 and "long int" for 72.  This
is what the folks at Amdahl finally did with their UTS port.  I tried
porting stuff from the VAX to their pre-release that had 16-bit
shorts, 32-bit ints, and 64-bit longs and wound up #defining long to
int, but that didn't work either.
> in V7
> "lseek" is defined as
>
>	long lseek(fildes, offset, whence)
>	long offset;
>
> whereas in the 4.2BSD manual it is
>
>	pos = lseek(d, offset, whence)
>	int pos;
>	int d, offset, whence;
>
> I consider the 4.2BSD definition to be wrong.  My question is: should
> I consider the V7 definition to be correct?
>
> We can define our system calls to use "int" and "long" integers as
> V7 does, but this means that we'll have to use 72-bit integers when
> a 36-bit integer would nearly always suffice.  This seems ugly to me.
> In addition, it imposes a size and speed penalty.
>
> An alternate definition might be:
>
>	daddr_t lseek(fildes, offset, whence)
>	daddr_t offset;
>
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.
The issue of how to name system-dependent types (e.g., daddr_t) is
separate from the types defined by the compiler.  You're right that
4.2 defines them wrong; they should have used daddr_t and off_t.  It's
usually clear what the right width is; it should be named in the most
portable way possible.  Usually this means using either long or short,
never int.  Int is supposed to be the "natural" length for the
machine, and presumably it's the fastest form.  It should be used when
the width of the item isn't too important and where, for portability,
16 bits is *always* enough.
(Actually, I think the definition should be
	off_t lseek(...)
	off_t offset;
using offsets, not disk addresses.)
-- 
Ed Gould		mt Xinu, 739 Allston Way, Berkeley, CA 94710 USA
{ucbvax,decvax}!mtxinu!ed	+1 415 644 0146
guy@rlgvax.UUCP (Guy Harris) (01/21/85)
> > One final recourse for us would be to admit defeat, change "long"
> > to 36-bits, and hack in a "long long" type for 72-bit integers.
>
> Please DO!  The definition of C calls for *two* lengths of integers:
> "short int" and "long int".  "int" alone may be defined as one or the
> other.
Actually, the C reference manual calls for up to *three* lengths of
integers: "short int", "int", and "long int"; any of them may be
equivalent to another, as long as
	sizeof(short int) <= sizeof(int) <= sizeof(long int).
In practice, it may cause problems if "int" isn't the same as either
of the other two, but that's because of historical practice, not
because of the "specification" of the language.
	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
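Harris's ordering constraint can be checked mechanically. The sketch below assumes only <limits.h>. The two typedefs fail to compile if the sizeof ordering is violated; this uses the negative-array-size trick, which C11's _Static_assert later made unnecessary. The function ranges_nest checks the related guarantee in terms of value ranges. All names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time check: an array type with size -1 is a compile error,
   so these typedefs only compile if the ordering holds. */
typedef char short_fits_int[(sizeof(short) <= sizeof(int)) ? 1 : -1];
typedef char int_fits_long[(sizeof(int) <= sizeof(long)) ? 1 : -1];

/* Run-time form of the same idea in terms of value ranges:
   every short value is an int value, every int value a long value. */
int ranges_nest(void)
{
    return SHRT_MIN >= INT_MIN && SHRT_MAX <= INT_MAX
        && INT_MIN >= LONG_MIN && INT_MAX <= LONG_MAX;
}
```

Ken Turkowski's proposed layout (36-bit long, 72-bit int) would be rejected at compile time by the second typedef.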
seifert@mako.UUCP (Snoopy) (01/30/85)
In article <631@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>>Here at the S-1 Project at LLNL we are porting UNIX to our own
>>machine, the S-1 Mark IIA.  The hardware is capable of operating
>>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>>defined the following types:
>>
>>	char	= 9 bits	(S-1 quarterword)
>>	short	= 18 bits	(S-1 halfword)
>>	int	= 36 bits	(S-1 singleword)
>>	long	= 72 bits	(S-1 doubleword)
> ...
>>We can define our system calls to use "int" and "long" integers as
>>V7 does, but this means that we'll have to use 72-bit integers when
>>a 36-bit integer would nearly always suffice.  This seems ugly to me.
> ...
>
>Chars have always been 8 bits, shorts always 16, and longs always 32.
I wouldn't bet my life on it.  Look in K&R (page 182 in my copy).
>I would suggest that you keep as close to this as possible.  Int has
>varied between 16 and 32 bits; hell, why not make it 64? :-)
>viz,
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	long	= 36 bits	(S-1 singleword)
>	int	= 72 bits	(S-1 doubleword)
"int" longer than "long"?  You are hereby sentenced to program in
FORTRASH for six months!
----------- new (?) idea begins here --------------------
OK, here's my suggestion, which may not help John (Hi John!) port
existing code, but might help in the future.
Why not figure out how many bits each variable *needs*, and then
declare them accordingly:
	int8	foo;
	int16	bar;
	int12	baz;
	int9	buff[BUFFSIZ];
	int18	blat;
Then when the mess gets compiled, the various size ints get changed
to the smallest possible machine entity size on the machine it's
getting compiled for.  So when you're porting some spiffo program
developed on, say, a pdp11 to, say, a pdp8 with 12-bit words, "baz"
fits nicely.  You don't have to *assume* that it really might need
16 bits.  "blat", which is just a tad too big for a 16-bit int, would
fit in a "short" on the S-1 Mark IIA.  If it had been declared "long",
it would get 72 bits!  (Are we talking overkill, or what?)
(I *refuse* to use the CDC with its 6-bit characters and no lower
case as an example, so there!)
If speed is more important than storage, one could use a "fast"
suffix:
	int14f	foobar;
This would clue the compiler/preprocessor/sed-script/whatever to use
the fastest size (if it fits), even if it fits in something smaller.
Or, there's always:
	register int14	foobar;
That's it.  Pretty simple.  (Therefore it might work, but no one will
like it :-) )
Again, yes, I realise this doesn't help with *existing* code, but it
would help to use it in new stuff, no?  And it doesn't require an
extension to the language!
 _____
 |___|		the Bavarian Beagle
_|___|_			Snoopy
\_____/		tektronix!mako!seifert
 \___/
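C99's <stdint.h> later standardized something close to this idea, but only for widths 8, 16, 32 and 64. The type int_leastN_t is the smallest type with at least N bits, and int_fastN_t is the fastest such type. The sketch below maps the post's declarations onto those types. The odd widths (int12, int18, int14f) have no standard counterpart and must be rounded up to the next standard width by hand.

```c
#include <assert.h>
#include <stdint.h>

/* The post's declarations, rounded up to the C99 widths. */
int_least8_t  foo;      /* int8                                     */
int_least16_t bar;      /* int16                                    */
int_least16_t baz;      /* int12: no 12-bit type, so at least 16    */
int_least32_t blat;     /* int18: a tad over 16, so at least 32     */
int_fast16_t  foobar;   /* int14f: fastest type of at least 16 bits */
```

On the S-1, an implementation could make int_least32_t an 18-bit-aligned 36-bit singleword. The "least" types promise a minimum width, not an exact one.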
ndiamond@watdaisy.UUCP (Norman Diamond) (01/30/85)
> ----------- new (?) idea begins here --------------------
>
> OK, here's my suggestion, which may not help John (Hi John!)
> port existing code, but might help in the future.
>
> Why not figure out how many bits each variable *needs*, and then
> declare them accordingly:
>	int8 foo;
>	int16 bar;
>	int12 baz;
>	int9 buff[BUFFSIZ];
>	int18 blat;
>	the Bavarian Beagle
>	Snoopy
>	tektronix!mako!seifert
That's as new as PL/I is, anyway.  One of the things Pascal did right
was define subranges using lower and upper actual bounds, instead of
number of bits ... or in other words, the bounds didn't have to be
(some power of 2, minus 1).
-- 
Norman Diamond
UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa
"Opinions are those of the keyboard, and do not reflect on me or higher-ups."
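The sketch below works through the arithmetic behind Diamond's point, assuming two's-complement storage. Given Pascal-style bounds lo..hi, it computes the fewest bits that hold every value in the range. The bounds need not be one less than a power of two, but the storage still rounds up to whole bits, and a compiler would usually round further, to a machine unit. The function name is hypothetical. It does not exploit a positive lower bound by storing an offset, which a packing compiler might do.

```c
#include <assert.h>

/* Fewest bits that can hold every value of the subrange lo..hi:
   unsigned if lo >= 0, otherwise two's complement.  Assumes lo <= hi. */
int bits_for_range(long long lo, long long hi)
{
    assert(lo <= hi);
    int bits = 1;
    if (lo >= 0) {
        /* need 2^bits - 1 >= hi; cap at 63 to keep the shift defined */
        while (bits < 63 && (1LL << bits) - 1 < hi)
            bits++;
    } else {
        /* need -2^(bits-1) <= lo and 2^(bits-1) - 1 >= hi */
        while (bits < 64 && (-(1LL << (bits - 1)) > lo
                             || (1LL << (bits - 1)) - 1 < hi))
            bits++;
    }
    return bits;
}
```

For example, the Pascal subrange 1..10 needs 4 bits and -100..100 needs 8. Neither bound is of the form 2^n - 1.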