[net.unix-wizards] integer types, sys calls, and stdio

jdb@mordor.UUCP (John Bruner) (01/10/85)

Here at the S-1 Project at LLNL we are porting UNIX to our own
machine, the S-1 Mark IIA.  The hardware is capable of operating
upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
defined the following types:

	char	= 9 bits	(S-1 quarterword)
	short	= 18 bits	(S-1 halfword)
	int	= 36 bits	(S-1 singleword)
	long	= 72 bits	(S-1 doubleword)

So far, so good.

Well, not quite.  There is a lot of confusion in UNIX source code
about the types of integers which are passed as arguments to
system calls or "stdio" routines.  Anyone who has tried to port
a program written for a VAX where long==int to a machine like the
PDP-11 is familiar with the problem.  Worse yet, the descriptions
of the system calls in chapter 2 of the UPM reflect this: in V7
"lseek" is defined as

	long lseek(fildes, offset, whence)
	long offset;

whereas in the 4.2BSD manual it is

	pos = lseek(d, offset, whence)
	int pos;
	int d, offset, whence;

I consider the 4.2BSD definition to be wrong.  My question is: should
I consider the V7 definition to be correct?

We can define our system calls to use "int" and "long" integers as
V7 does, but this means that we'll have to use 72-bit integers when
a 36-bit integer would nearly always suffice.  This seems ugly to me.
In addition, it imposes a size and speed penalty.

An alternate definition might be:

	daddr_t lseek(fildes, offset, whence)
	daddr_t offset;

where "daddr_t", defined in <sys/types.h>, is machine-dependent.
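
For concreteness, the machine-dependent part might look something like
this (the particular choices below are only a sketch of the idea, not
our actual header):

	/*
	 * Sketch of a <sys/types.h> fragment for the S-1: each port
	 * picks the smallest integer type that is big enough, instead
	 * of inheriting "long" from the PDP-11.  A VAX or PDP-11 port
	 * would say "typedef long" for these instead.
	 */
	typedef int	daddr_t;	/* 36 bits covers a disk address */
	typedef int	off_t;		/* ... and a file offset */
	typedef int	time_t;		/* ... and a time */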

Does System V define system calls using derived types?  Will the
C environment standard define "stdio" routines using derived types?
If so, I'd like to follow those standards.

One final recourse for us would be to admit defeat, change "long"
to 36-bits, and hack in a "long long" type for 72-bit integers.  I
don't want to do this, because it means that while the definition of
integer types is machine dependent, the machine that they depend upon
is the PDP-11 or the VAX.
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb

chris@umcp-cs.UUCP (Chris Torek) (01/11/85)

The 4.2BSD manual entry that claims that lseek returns an
integer and takes an integer argument for its offset is *wrong*!
The system call itself (in the kernel) takes the arguments

	int	fd;
	off_t	off;
	int	sbase;

(off_t is typedef'd to int in <sys/types.h>, but this is a system-
dependent file.)
-- 
(This line accidentally left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/11/85)

Definitely there are several incorrect system call and C library
interface descriptions in the 4.2BSD manual.  In those cases
where there is a corresponding UNIX System V, /usr/group, or
ANSI X3J11 specification you should use it instead.

The only "variable type" declarations used in the ANSI C library
specifications are a few instances apiece of:
	jmp_buf	va_list	size_t	onexit_ptr	time_t
On 4.2BSD time_t is necessarily a long (it must be an integral type!)
and size_t is an unsigned (the type of the "sizeof" operator).

For the case you give as an example, the answer is:

long lseek( int fildes, long offset, int whence );

Charles L. Athey III <athey@lll-crg.ARPA> (01/11/85)

I believe that the derived types are much cleaner and overall a much
better solution.  This is the solution that was taken with the NLTSS,
Network Livermore Time Sharing System, for the CRAYs [Please no flames
about time sharing a CRAY].

(p.s. Even though I am at LLNL, I am not connected with the S-1 project -
	there are over 8000 employees and who knows how many projects
	here.)

    Chuck Athey [Intelligent Terminal (Workstation) Support Group, LLNL]
    MILNET: athey@lll-crg.arpa		(415) 422-7211
    UUCP:   ucbvax!dual!lll-crg![athey | itsg!athey]

BostonU SysMgr <root%bostonu.csnet@csnet-relay.arpa> (01/12/85)

> From: John Bruner <mordor!jdb>
>
> Here at the S-1 Project at LLNL we are porting UNIX to our own
> machine, the S-1 Mark IIA.  The hardware is capable of operating
> upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
> defined the following types:
>
> char	= 9 bits	(S-1 quarterword)
> short	= 18 bits	(S-1 halfword)
> int	= 36 bits	(S-1 singleword)
> long	= 72 bits	(S-1 doubleword)

[questions about whether this is the best way to assign these
and portability problems]

There are really two issues here. The first is how to re-compile
old code reasonably; the second is how to write new code to make
the best use of the hardware.

It seems pretty clear that old code will be served well
if long means 36 bits, since it most likely assumes at most 32 bits.

The machine was built on the assumption that sometimes
people need more than 36b and are willing to pay whatever
storage/speed penalties that entails. I presume that means either
they are using integers outside a 2**36 range or bit-packing
data (which amounts to the same thing, and probably isn't the
problem in practice).

It would be a lie to even imply that such code is at all
portable. My suggestion, therefore, is that long==int==36b
for backward (and likely forward) portability in general,
and that a new type be created for the 72b case. (I realize there
would still be portability problems, but you do the best you can.)

I personally would rather see you provide it as built-in functions
in your compiler (a la FORTRAN). This would give at least
some chance of porting a program off the S-1
by building a simulator. For example, define routines:

	store72(result,value)
	add72(left,right)
	sub72(left,right)

and write expressions sort of like lispisms:

	store72(lv,add72(v1,v2)) ;

etc. Declaration would be by a special name; long72 comes to
mind. Your compiler could know these routines and produce
the correct instructions (or look at the sed scripts 4.2 uses).
Before re-compiling your code on my VAX, I would probably
typedef long72 to be an array of two or three longs, so that
these routines would be passed pointers on my machine. Note that
store72 could easily be a macro for 'lv = ....'. This all reminds
me of the old V6 compiler, which used things like fmod()
[floating-point modulo] and generated subroutine calls for 32-bit
longs on a PDP-11.
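
To make that concrete, here is roughly what the simulation end might
look like on my VAX (the three-longs-of-24-bits layout and the code
below are just one way to fake it; a real package would also have to
get signs and overflow right):

	typedef long	long72[3];	/* 3 x 24 bits = 72 bits */

	static long72	t72;	/* add72 hands back a pointer to this */

	long *
	add72(a, b)
	long *a, *b;
	{
		register long carry = 0;
		register int i;

		for (i = 2; i >= 0; i--) {	/* [2] is least significant */
			carry += a[i] + b[i];
			t72[i] = carry & 0xffffff;	/* keep the low 24 bits */
			carry >>= 24;
		}
		return (t72);
	}

	store72(lv, rv)		/* could just as well be a macro */
	long *lv, *rv;
	{
		lv[0] = rv[0]; lv[1] = rv[1]; lv[2] = rv[2];
	}

On the S-1 itself the compiler would recognize these names and emit
the doubleword instructions instead of calls.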

If you don't care about forward portability, then it doesn't really
matter, no?

	-Barry Shein, Boston University

donn@utah-cs.UUCP (Donn Seeley) (01/13/85)

[Gee, the last time this came up was only in September!  The netnews
recapitulation time continues to shrink...]

The lseek.2 manual entry in 4.2 BSD is wrong.  It has been changed
(here and at Berkeley) to describe the 'offset' argument and return
value as being type 'off_t', consistent with their use in the kernel
(as Chris points out).  The lint library entry has been changed
accordingly.

Why don't we re-post everything since September,

Donn Seeley    University of Utah CS Dept    donn@utah-cs.arpa
40 46' 6"N 111 50' 34"W    (801) 581-5668    decvax!utah-cs!donn

henry@utzoo.UUCP (Henry Spencer) (01/13/85)

> An alternate definition might be:
> 
> 	daddr_t lseek(fildes, offset, whence)
> 	daddr_t offset;
> 
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.

Actually, there already is a type specifically for offsets into files:
off_t.  Unfortunately, it's not nearly as widely used as it should be.
You have a choice of "doing it right" and having a fair bit of work to
do on old programs, or giving in to practicality and using "long".

Lamentably, the current draft of the ANSI C standard uses "long" for
fseek() and ftell().
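
That is, presumably something on the order of

	int	fseek(FILE *stream, long offset, int whence);
	long	ftell(FILE *stream);

(I don't have the draft in front of me, so take the exact form with a
grain of salt.)  Either way the offset is nailed to whatever "long"
happens to be, rather than to a derived type like off_t.
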
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

ken@turtlevax.UUCP (Ken Turkowski) (01/13/85)

In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>Here at the S-1 Project at LLNL we are porting UNIX to our own
>machine, the S-1 Mark IIA.  The hardware is capable of operating
>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>defined the following types:
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	int	= 36 bits	(S-1 singleword)
>	long	= 72 bits	(S-1 doubleword)
 ...
>We can define our system calls to use "int" and "long" integers as
>V7 does, but this means that we'll have to use 72-bit integers when
>a 36-bit integer would nearly always suffice.  This seems ugly to me.
 ...
>One final recourse for us would be to admit defeat, change "long"
>to 36-bits, and hack in a "long long" type for 72-bit integers.  I
>don't want to do this, because it means that while the definition of
>integer types is machine dependent, the machine that they depend upon
>is the PDP-11 or the VAX.

Chars have always been 8 bits, shorts always 16, and longs always 32.
I would suggest that you keep as close to this as possible.  Int has
varied between 16 and 32 bits; hell, why not make it 64? :-)
viz,

	char	= 9 bits	(S-1 quarterword)
	short	= 18 bits	(S-1 halfword)
	long	= 36 bits	(S-1 singleword)
	int	= 72 bits	(S-1 doubleword)

-- 
Ken Turkowski @ CADLINC, Menlo Park, CA
UUCP: {amd,decwrl,nsc,seismo,spar}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

jdb@mordor.UUCP (John Bruner) (01/14/85)

I'd like to thank everyone who has responded to my inquiry about
integer types, both by posting to these newsgroups and by private
mail.

As I reread my original posting, I realized that I had been a little
obscure about my real intent.  As a former V6 & V7 PDP-11 user/system
maintainer, I'm familiar with the history of the lseek() call and
the reasons why its second argument is not an "int".  (I could tell
you about some of the BSD programs I ported to V7 and the "int"=="long"
assumptions that I had to weed out, but that's another story.)

What I really was trying to ask about was the future direction of
system call (and library routine) interface specifications: were the
machine-specific types of "short", "int", and "long" being supplanted
in these cases by derived types such as "time_t" and "off_t"?  The
derived types would allow a system implementer to choose the most
efficient, sufficiently-large, integer size for each quantity rather
than being saddled forever with decisions made for a PDP-11.

[Several people commented that the correct type for a file offset
is "off_t", not "daddr_t".  Thanks -- it was a case of putting my
keyboard into gear without engaging my brain.]

There seem to be two major points of consensus:

    1)	The use of derived types is preferable, and their use in
	new UNIX versions is increasing.  (I didn't receive any
	replies that specifically mentioned the ANSI C environment
	standard effort, but I assume that the issue is under
	consideration there as well.)  However, there is too much
	code out there which explicitly uses "int" and "long" to
	change things quickly.

    2)	The qualifiers "short" and "long" are now pretty generally
	considered to refer to quantities that are roughly 16 and
	32 bits long, respectively.  They can be defined to be
	larger, but this usually just results in a space and
	execution-time penalty.

I'd like to solve our problem with the S-1 Mark IIA by defining the
appropriate derived types as "int"s and making all of our C programs
use the derived types.  Unfortunately, the required conversion effort
would be phenomenal, and we'd have to check all imported programs for
explicit "long" declarations.  This leaves us with two choices.
We will either use the old definitions and suffer the performance
penalty for 72-bit integers where 36-bit ones would suffice
(something I'm not very happy to accept), or we'll redefine "long"
to be the same as "int" and we'll introduce "long long" (something
our C compiler person is reluctant to do).

[BTW, "long long" would have been very useful on the VAX.  It's
a pain to write a cross-assembler for a 36-bit machine when the
largest available integer type is 32 bits wide.]

In any event, I appreciate the help that everyone has offered.
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb

jon@cit-vax (Jonathan P. Leech) (01/14/85)

    In article <631@turtlevax.UUCP>, Ken Turkowski <turtlevax!ken> writes:
> Chars have always been 8 bits, shorts always 16, and longs always 32.
> I would suggest that you keep as close to this as possible.  Int has
> varied between 16 and 32 bits; hell, why not make it 64? :-)
> viz,
>
>	  char	  = 9 bits	  (S-1 quarterword)
>	  short   = 18 bits	  (S-1 halfword)
>	  long	  = 36 bits	  (S-1 singleword)
>	  int	  = 72 bits	  (S-1 doubleword)
>
> --
> Ken Turkowski @ CADLINC, Menlo Park, CA

    Why not? Perhaps because of Appendix A, section 4 (p. 182) of K&R:
	"Up to three sizes of integer, declared short int, int, and
	long int, are available.  Longer integers provide no less
	storage than shorter ones."
    Also, if you apply the type conversion rules in section 6.6, an
	operation involving a (36 bit) long and a (72 bit) int will
	have a result type of long, losing precision (see the example
	below).
    Does anyone know what the ANSI standard says about this?
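
    To make the second point concrete (sizes as proposed above; the
	function itself is only an illustration):

	int
	f(x, len)
	int x;		/* 72 bits under the proposed assignment */
	long len;	/* 36 bits under the proposed assignment */
	{
		/*
		 * Under the 6.6 rules the long operand wins: x is
		 * converted to the (shorter) long before the add, the
		 * sum is computed in 36 bits, and only then widened
		 * back to 72 bits for the return.  The high-order
		 * bits of x are silently lost.
		 */
		return (x + len);
	}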

    Jon Leech
    jon@cit-vax.arpa

Ron Natalie <ron@BRL-TGR> (01/15/85)

Yes, we had the same problem with the Denelcor HEP.

It has an innate 64-bit word size which you'd like to be an int.  Long is
also equal to int, since the machine doesn't bother with anything greater
than 64 bits of integer (and since long can't be smaller than int).  We
picked short to be 16 bits for compatibility and defined (gack) a mystery
type called _int32 for people who really want the 32-bit things.

_int32 is frequently typedef'd to "medium."  I wanted to call them short longs.

With eight byte ints you can make the optimization that
	uid = username ^ 'root\0\0\0\0'

Eh?

-Ron

Speaking of ANSI standard sizes... their Fortran compiler had to be reworked
because, according to the spec, DOUBLE must occupy EXACTLY twice
as much space as REAL.  There is also a hard relationship between INTEGER
and REAL.  Yuch.

-Ron

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/15/85)

> _int32 is frequently typedef'd to "medium."  I wanted to call them short longs.

Not long shorts, or better yet, Bermuda shorts?  Clamdiggers?

ed@mtxinu.UUCP (Ed Gould) (01/18/85)

> Here at the S-1 Project at LLNL we are porting UNIX to our own
> machine, the S-1 Mark IIA.  The hardware is capable of operating
> upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
> defined the following types:
> 
> 	char	= 9 bits	(S-1 quarterword)
> 	short	= 18 bits	(S-1 halfword)
> 	int	= 36 bits	(S-1 singleword)
> 	long	= 72 bits	(S-1 doubleword)
> 
> So far, so good.
>
> ...
>
> One final recourse for us would be to admit defeat, change "long"
> to 36-bits, and hack in a "long long" type for 72-bit integers.  I
> don't want to do this, because it means that while the definition of
> integer types is machine dependent, the machine that they depend upon
> is the PDP-11 or the VAX.

Please DO!  The definition of C calls for *two* lengths of integers:
"short int" and "long int".  "int" alone may be defined as one or the
other.  Actually, the other approach you could take is "short short int"
for the 18-bit ones, using "short int" for 36 and "long int" for 72.
This is what the folks at Amdahl finally did with their UTS port.
I tried porting stuff from the VAX to their pre-release that had
16-bit shorts, 32-bit ints, and 64-bit longs and wound up
#defining long to int, but that didn't work either.


>                                                           in V7
> "lseek" is defined as
> 
> 	long lseek(fildes, offset, whence)
> 	long offset;
> 
> whereas in the 4.2BSD manual it is
> 
> 	pos = lseek(d, offset, whence)
> 	int pos;
> 	int d, offset, whence;
> 
> I consider the 4.2BSD definition to be wrong.  My question is: should
> I consider the V7 definition to be correct?
> 
> We can define our system calls to use "int" and "long" integers as
> V7 does, but this means that we'll have to use 72-bit integers when
> a 36-bit integer would nearly always suffice.  This seems ugly to me.
> In addition, it imposes a size and speed penalty.
> 
> An alternate definition might be:
> 
> 	daddr_t lseek(fildes, offset, whence)
> 	daddr_t offset;
> 
> where "daddr_t", defined in <sys/types.h>, is machine-dependent.

The issue of how to name system-dependent types (e.g., daddr_t) is
separate from the types defined by the compiler.  You're right that
4.2 defines them wrong - they should have used daddr_t and off_t.
It's usually clear what the right width is; it should be named in
the most portable way possible.  Usually this means using either
long or short, never int.  Int is supposed to be the "natural"
length for the machine, and presumably it's the fastest form.
It should be used when the width of the item isn't too important
and where, for portability, 16 bits is *always* enough.
(Actually, I think the definition should be

	off_t lseek(...)
	off_t offset;

using file offsets, not disk addresses.)

-- 
Ed Gould		    mt Xinu, 739 Allston Way, Berkeley, CA  94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146

guy@rlgvax.UUCP (Guy Harris) (01/21/85)

> > One final recourse for us would be to admit defeat, change "long"
> > to 36-bits, and hack in a "long long" type for 72-bit integers.
> 
> Please DO!  The definition of C calls for *two* lengths of integers:
> "short int" and "long int".  "int" alone may be defined as one or the
> other.

Actually, the C reference manual calls for up to *three* lengths of integers:
"short int", "int", and "long int"; any of them may be equivalent to
another, as long as sizeof(short int) <= sizeof(int) <= sizeof(long int).
In practice, it may cause problems if "int" isn't the same as any of the other
two, but that's because of historical practice, not because of the
"specification" of the language.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

seifert@mako.UUCP (Snoopy) (01/30/85)

In article <631@turtlevax.UUCP> ken@turtlevax.UUCP (Ken Turkowski) writes:
>In article <1997@mordor.UUCP> jdb@mordor.UUCP (John Bruner) writes:
>>Here at the S-1 Project at LLNL we are porting UNIX to our own
>>machine, the S-1 Mark IIA.  The hardware is capable of operating
>>upon 9-bit, 18-bit, 36-bit, and 72-bit quantities, so we have
>>defined the following types:
>>
>>	char	= 9 bits	(S-1 quarterword)
>>	short	= 18 bits	(S-1 halfword)
>>	int	= 36 bits	(S-1 singleword)
>>	long	= 72 bits	(S-1 doubleword)
> ...
>>We can define our system calls to use "int" and "long" integers as
>>V7 does, but this means that we'll have to use 72-bit integers when
>>a 36-bit integer would nearly always suffice.  This seems ugly to me.
> ...
>
>Chars have always been 8 bits, shorts always 16, and longs always 32.

I wouldn't bet my life on it.  Look in K&R. (page 182 in my copy)

>I would suggest that you keep as close to this as possible.  Int has
>varied between 16 and 32 bits; hell, why not make it 64? :-)
>viz,
>
>	char	= 9 bits	(S-1 quarterword)
>	short	= 18 bits	(S-1 halfword)
>	long	= 36 bits	(S-1 singleword)
>	int	= 72 bits	(S-1 doubleword)

"int" longer than "long"?  You are hereby sentenced to program
in FORTRASH for six months!

-----------  new (?) idea begins here  --------------------

OK, here's my suggestion, which may not help John (Hi John!)
port existing code, but might help in the future.

Why not figure out how many bits each variable *needs*, and then
declare them accordingly:

int8	foo;
int16	bar;
int12	baz;
int9	buff[BUFFSIZ];
int18	blat;

Then when the mess gets compiled, the various sized ints get mapped
to the smallest machine entity that will hold them, on the machine
it's getting compiled for.

So then when you're porting some spiffo program developed on,
say, a pdp11 to say, a pdp8, with 12 bit words, "baz" fits nicely.
You don't have to *assume* that it really might need 16 bits.

"blat", which is just a tad too big for a 16 bit int, would fit
in a "short" on the S-1 Mark IIA.  If it had been declared "long",
it would get 72 bits! (are we talking overkill, or what?)

(I *refuse* to use the CDC with its 6 bit characters and no lower case
as an example, so there!)

If speed is more important than storage, one could use a "fast"
suffix:

int14f	foobar;

This would clue the compiler/preprocessor/sed-script/whatever to
use the fastest size that will hold it, even if it would fit in
something smaller.
or, there's always:

register int14 foobar;

That's it.  Pretty simple.  (therefore it might work, but no one will
like it  :-)  )  Again, yes I realise this doesn't help with *existing*
code, but it would help to use it in new stuff, no?  And it doesn't
require an extension to the language!
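
For what it's worth, this could even be done today with nothing more
than a per-machine header full of typedefs.  Here is roughly what a
PDP-11 flavor might look like (the header name and these particular
mappings are made up for illustration):

	/*
	 * Hypothetical <intsizes.h> for a PDP-11: each width name maps
	 * to the smallest machine type that will hold it.  The S-1
	 * version would map int9 to char and int18 to short instead.
	 */
	typedef char	int8;
	typedef short	int9;	/* no 9-bit object here; next size up */
	typedef short	int12;
	typedef short	int16;
	typedef long	int18;	/* too big for a 16-bit short */
	typedef long	int32;

	typedef int	int14f;	/* "f" for fast: the natural word size */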

        _____
	|___|		the Bavarian Beagle
       _|___|_			Snoopy
       \_____/		tektronix!mako!seifert
        \___/

ndiamond@watdaisy.UUCP (Norman Diamond) (01/30/85)

> -----------  new (?) idea begins here  --------------------
> 
> OK, here's my suggestion, which may not help John (Hi John!)
> port existing code, but might help in the future.
> 
> Why not figure out how many bits each variable *needs*, and then
> declare them accordingly:
> int8	foo;
> int16	bar;
> int12	baz;
> int9	buff[BUFFSIZ];
> int18	blat;
>             the Bavarian Beagle    Snoopy    tektronix!mako!seifert

That's as new as PL/I is, anyway.  One of the things Pascal did right
was define subranges using lower and upper actual bounds, instead of
number of bits ... or in other words, the bounds didn't have to be
(some power of 2, minus 1).

-- Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."