peralta@pinocchio.Encore.COM (Rick Peralta) (02/07/90)
What are the feelings here regarding 64 bit longs?  There are applications
and devices that are breaking the Gigabyte limits of the 32 bit architecture,
and it seems that we will be stuck in a too-small address space again.

What platforms are using the 64 bit ALUs?  Is anyone using 64 bit pointers
(yet)?  Does anyone have arguments (applications) for using 64 bits?

A few that come to mind are:

 . just plain math resolution
 . very large virtual memory
 . larger disk storage
   (no joke single volumes will be breaking lseek() soon)

 - Rick
truesdel@sun217..nas.nasa.gov (David A. Truesdell) (02/07/90)
peralta@pinocchio.Encore.COM (Rick Peralta) writes:
>Does anyone have arguments (applications) for using 64 bits?
>
>A few that come to mind are:
>
> . larger disk storage
>   (no joke single volumes will be breaking lseek() soon)

lseek is already "broken" here.  I'm in the process of testing a striped
filesystem which currently weighs in at 20 GigaBytes, with a production size
expected to be 200+ GB.  Our fsck needs a special "long lseek" (64 bits) to
move around.  Also, I believe the Cray allows files of more than 2 GB; it
already has 64 bit longs.

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)
"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan
peralta@pinocchio.Encore.COM (Rick Peralta) (02/08/90)
In article <4812@amelia.nas.nasa.gov> (David A. Truesdell) writes:
>peralta@pinocchio.Encore.COM (Rick Peralta) writes:
>
>>Does anyone have arguments (applications) for using 64 bits?
>>
>>A few that come to mind are:
>>
>> . larger disk storage
>>   (no joke single volumes will be breaking lseek() soon)
>
>lseek is already "broken" here.  I'm in the process of testing a striped
>filesystem which currently weighs in at 20 GigaBytes, with a production size
>expected to be 200+ GB.

Have you standardized your new seek?  I was playing with the idea of
implementing bseek(fd, 64bits, whence).  It seemed kind of nasty to start
with, but got quite reasonable quickly.  Inside the kernel the address can
easily be broken up into block-size (1K-8K) chunks, the indexing into the
device done, and then the remainder of the address used to set up the
details of the position.

This seemed like a nice perk for backward compatibility too.  Just have a
union for each block size and juggle away.  The application could play the
same game and not be constrained by the actual block size.

Of course, things like ftell would break.  Maybe the address of the off_t
could be passed into bseek and the new offset returned in the variable
(gack! I think I've been looking at too much AT&T source).  The return value
would then be status information.

 - Rick   (Or maybe you are a floating point fan... 8^)
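A user-level sketch of the offset-splitting arithmetic a bseek() along the
lines described above might use internally.  The struct, the 4K block size,
and the function names here are all hypothetical, and a real implementation
would of course live in the kernel:

    /* A 64-bit byte offset carried as two 32-bit halves, for compilers
     * without a 64-bit integer type.  Both the struct and the bseek()
     * it illustrates are hypothetical. */
    struct off64 {
        unsigned long hi;       /* upper 32 bits of the byte offset */
        unsigned long lo;       /* lower 32 bits of the byte offset */
    };

    #define BLKSIZE 4096UL      /* assumed filesystem block size (1K-8K range) */

    /* Split a 64-bit byte offset into a block index plus a remainder,
     * the way a hypothetical bseek(fd, off, whence) might do internally.
     * Assumes offsets stay below 2^44 bytes so the block index fits in
     * an unsigned long; since 2^32 / 4096 = 2^20, the high word
     * contributes hi * 2^20 whole blocks. */
    void split_offset(struct off64 off, unsigned long *block, unsigned long *rem)
    {
        *block = (off.hi << 20) + off.lo / BLKSIZE;
        *rem   = off.lo % BLKSIZE;
    }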
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/08/90)
In article <11071@encore.Encore.COM> peralta@pinocchio.UUCP (Rick Peralta) writes:
>What are the feelings here regarding 64 bit longs?

This is really a C question, not a UNIX question.  64-bit long integers are
just fine.  In fact, in the kind of environment you describe I'd even say
they are preferred.  However, you may find that many applications "know"
that longs are 32 bits.  Such applications are already broken, but market
pressure may cause you to cater to them anyway.
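An illustrative (made-up) example of the kind of breakage Doug describes:
code that hard-wires the size of a long, next to code that lets the
implementation decide.  The file name is arbitrary:

    #include <stdio.h>

    int main(void)
    {
        long counter = 123456789L;
        FILE *fp = fopen("counter.dat", "w");   /* hypothetical output file */

        if (fp == NULL)
            return 1;

        /* Broken: hard-wires the idea that a long occupies 4 bytes; on a
         * machine with 64-bit longs this writes only part of the object
         * (which part depends on byte order). */
        fwrite(&counter, 4, 1, fp);

        /* Portable: writes however many bytes a long actually occupies
         * on this implementation. */
        fwrite(&counter, sizeof(long), 1, fp);

        fclose(fp);
        return 0;
    }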
scott@bbxsda.UUCP (Scott Amspoker) (02/09/90)
In article <4812@amelia.nas.nasa.gov> (David A. Truesdell) writes:
>lseek is already "broken" here.  I'm in the process of testing a striped
>filesystem which currently weighs in at 20 GigaBytes, with a production size
>expected to be 200+ GB.

Forgive my ignorance, but what is a "striped" filesystem?
--
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott
markh@attctc.Dallas.TX.US (Mark Harrison) (02/09/90)
writes:
>What are the feelings here regarding 64 bit longs?

As Unix tries to get a larger share of the commercial market, we will see
a need for storing numeric values with 18-digit precision, a la COBOL and
the IBM mainframe.  This can be accomplished in 64 bits, and is probably
the reason "they" chose 18 digits as their maximum precision.

btw, I have always heard 64 bit integers referred to as "xlongs" (extra
longs)... is this common, or just our own local jargon?

Mark Harrison
(markh @ attctc)
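To put numbers on the 18-digit claim: a signed 64-bit integer runs up to
2^63 - 1 = 9,223,372,036,854,775,807, a 19-digit value, so every 18-digit
decimal quantity fits with room to spare, while a signed 32-bit integer
tops out at 2^31 - 1 = 2,147,483,647, fewer than 10 full digits.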
truesdel@sun217..nas.nasa.gov (David A. Truesdell) (02/10/90)
peralta@pinocchio.Encore.COM (Rick Peralta) writes:
>Have you standardized your new seek?

It's not exactly a new seek call; it's actually implemented as an ioctl()
for the raw device.  Amdahl's UTS supports a 64-bit "long long", so games
don't have to be played with arrays of smaller integers, as is done with
select().

Now, if only people didn't write code which assumes a long is the same as
an int, we could change the compiler to think longs were always 64 bits,
and a lot of the current limits would simply vanish.

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)
"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan
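A rough sketch of how a 64-bit seek might be pushed through an ioctl() on a
compiler with "long long".  The request code and argument layout below are
invented for illustration, not UTS's actual interface; only ioctl() itself
is a real system call:

    #include <sys/ioctl.h>

    #define LLSEEK64  0x4c53       /* made-up ioctl request code */

    struct llseek_arg {
        long long offset;          /* 64-bit byte offset into the device */
        int       whence;          /* 0 = absolute, 1 = relative, as with lseek */
    };

    /* Seek the raw device to a 64-bit offset; the (hypothetical) driver
     * is assumed to return the resulting position in arg.offset. */
    long long long_seek(int fd, long long offset, int whence)
    {
        struct llseek_arg arg;

        arg.offset = offset;
        arg.whence = whence;
        if (ioctl(fd, LLSEEK64, &arg) < 0)
            return (long long) -1;
        return arg.offset;
    }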
truesdel@sun217..nas.nasa.gov (David A. Truesdell) (02/10/90)
scott@bbxsda.UUCP (Scott Amspoker) writes:
>In article <4812@amelia.nas.nasa.gov> (David A. Truesdell) writes:
>>lseek is already "broken" here.  I'm in the process of testing a striped
>>filesystem which currently weighs in at 20 GigaBytes, with a production size
>>expected to be 200+ GB.

>Forgive my ignorance, but what is a "striped" filesystem?

A striped (or striping) filesystem is one in which the filesystem is spread
out over a set of disks in order to increase capacity and/or performance
and/or reliability.  The filesystem I'm testing would be classed as "level 5
RAID".  (That's "Redundant Array of Inexpensive Disks"; too bad our disks
can't really be called "Inexpensive".)

You can check out the September '89 (v7i9) issue of UNIX Review, which has
an article ("Winged Memory") covering the ideas behind RAID and the
different classes of RAID filesystems.

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)
"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan
truesdel@sun217..nas.nasa.gov (David A. Truesdell) (02/10/90)
markh@attctc.Dallas.TX.US (Mark Harrison) writes:
>btw, I have always heard 64 bit integers referred to as "xlongs" (extra
>longs)... is this common, or just our own local jargon?

UTS calls a 64-bit integer a "long long".  On a Cray, it is simply a "long".
I doubt there is any truly "common" term for a type that's longer than a
long.

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)
"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan
jfh@rpp386.cactus.org (John F. Haugh II) (02/11/90)
In article <4849@amelia.nas.nasa.gov> truesdel@sun217..nas.nasa.gov (David A. Truesdell) writes:
>A striped (or striping) filesystem is one in which the filesystem is spread
>out over a set of disks in order to increase capacity and/or performance
>and/or reliability.  The filesystem I'm testing would be classed as "level 5
>RAID".  (That's "Redundant Array of Inexpensive Disks"; too bad our disks
>can't really be called "Inexpensive".)

I think you've described three different types of file system schemes.

Striping, from what I've seen, refers to laying consecutive cylinders out
on consecutive drives so that a seek on one drive can occur at the same
time as the transfer on the next drive; thus, seeks are free for sequential
reads.

Another strategy is mirroring, which puts redundant copies of the data on
one or more drives [ usually more than one ] to increase the reliability of
the data.  A drive system with two 50,000Hr MTBF drives mirroring each
other would have a MTBF of decades or centuries instead of years.  A failed
drive could be powered down and replaced without the need to re-boot the
entire system, provided the hardware permitted drive replacement with the
power on.

The simplest reason to use more than one drive is to create a filesystem
larger than any of the single drives involved.  I've seen this referred to
as "spanning".  The beginning of one drive is the logical end of the
previous drive.  Thus, two 250MB drives could be combined to make a single
500MB logical drive, and so on.

Device drivers for all of these schemes are fairly trivial once the
underlying physical device driver is written.
--
John F. Haugh II                        UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                 Domain: jfh@rpp386.cactus.org
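A toy sketch of the block-mapping arithmetic behind two of these schemes.
The drive count and per-drive capacity are arbitrary, and this shows
block-level striping where the posting above describes a cylinder-level
variant (the arithmetic is the same with cylinders in place of blocks):

    #define NDRIVES          4          /* arbitrary number of member drives */
    #define BLOCKS_PER_DRIVE 500000L    /* arbitrary per-drive capacity, in blocks */

    /* Striping: consecutive logical blocks rotate across the drives, so
     * sequential I/O keeps every spindle busy at once. */
    void stripe_map(long logical, int *drive, long *physical)
    {
        *drive    = (int)(logical % NDRIVES);
        *physical = logical / NDRIVES;
    }

    /* Spanning: the drives are simply concatenated; the start of one
     * drive is the logical end of the previous one. */
    void span_map(long logical, int *drive, long *physical)
    {
        *drive    = (int)(logical / BLOCKS_PER_DRIVE);
        *physical = logical % BLOCKS_PER_DRIVE;
    }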
ejp@bohra.cpg.oz (Esmond Pitt) (02/12/90)
In article <11372@attctc.Dallas.TX.US> markh@attctc.Dallas.TX.US (Mark Harrison) writes:
>
>As Unix tries to get a larger share of the commercial market, we will see
>a need for storing numeric values with 18-digit precision, a la COBOL and
>the IBM mainframe.  This can be accomplished in 64 bits, and is probably
>the reason "they" chose 18 digits as their maximum precision.

According to a fellow who had been on the original IBM project in the
fifties, the 18 digits came about because of using BCD (4-bit decimal)
representation, in two 36-bit words.
--
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz
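The arithmetic fits: two 36-bit words give 72 bits, and at 4 bits per BCD
digit that is 72 / 4 = 18 decimal digits exactly.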
rcd@ico.isc.com (Dick Dunn) (02/13/90)
peralta@pinocchio.Encore.COM (Rick Peralta) writes:
> What are the feelings here regarding 64 bit longs?
. . .
> . larger disk storage
>   (no joke single volumes will be breaking lseek() soon)

Files are already breaking a 32-bit lseek pointer.  But shouldn't that one
be tackled differently?  The second argument to lseek should be an off_t,
not a long (and certainly not an int, as some have tried to inflict on us).

Perhaps the appearance of real uses for 64-bit integers and/or pointers
should cause us to think a little harder about the problems we've created
in the past.  The 16->32 transition was painful enough.

(Note that the "64-bit" discussion is also going on in comp.arch.)
--
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...Mr. Natural says, "Use the right tool for the job."
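For comparison, a call written the way Dick suggests, with the offset
carried in an off_t so that its width is whatever the implementation says
it should be; the wrapper name is arbitrary:

    #include <sys/types.h>
    #include <unistd.h>

    /* Seek to an absolute position expressed as an off_t; if off_t ever
     * grows to 64 bits, this code needs no change. */
    off_t seek_to(int fd, off_t position)
    {
        return lseek(fd, position, SEEK_SET);
    }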
truesdel@sun217..nas.nasa.gov (David A. Truesdell) (02/13/90)
jfh@rpp386.cactus.org (John F. Haugh II) writes:
>In article <4849@amelia.nas.nasa.gov> truesdel@sun217..nas.nasa.gov (David A. Truesdell) writes:
>>A striped (or striping) filesystem is one in which the filesystem is spread
>>out over a set of disks in order to increase capacity and/or performance
>>and/or reliability.

>I think you've described three different types of file system schemes.

No, there are a lot of different filesystem schemes which can display these
same attributes (capacity, performance, reliability) to differing degrees.

>Striping, from what I've seen, refers to laying consecutive cylinders out
>on consecutive drives so that a seek on one drive can occur at the same
>time as the transfer on the next drive; thus, seeks are free for sequential
>reads.

Another variation can place consecutive blocks on drives with different data
paths, which can increase the I/O transfer rate above that of an individual
drive (or data path).  Seeks would be concurrent, too.

>Another strategy is mirroring, which puts redundant copies of the data
>on one or more drives [ usually more than one ] to increase the reliability
>of the data.  A drive system with two 50,000Hr MTBF drives mirroring each
>other would have a MTBF of decades or centuries instead of years.  A failed
>drive could be powered down and replaced without the need to re-boot the
>entire system, provided the hardware permitted drive replacement with the
>power on.

A "shadowed", or "mirrored", filesystem is very reliable; however, for a
large site this can become quite expensive.  Imagine having to buy twice (or
more) the amount of disk needed to hold all your data.  Other variations of
RAID filesystems (a mirror disk is classed as a "Level 1" RAID) can employ
error correction techniques to obtain more than adequate reliability without
wasting 50% of your disk capacity.  In addition, a mirrored filesystem won't
help your I/O throughput.

The equation below shows how to calculate the effective MTBF for a
multi-disk filesystem.  The variables are: the MTBF of a disk (MTBFdisk),
the mean time to repair for a disk (MTTRdisk), the number of data disks
(#data) and the number of disks with redundant data (#ecc).

                       ( MTBFdisk ) ^ 2
    MTBFfs = ----------------------------------
             #data * (#data + #ecc) * MTTRdisk

>The simplest reason to use more than one drive is to create a filesystem
>larger than any of the single drives involved.  I've seen this referred to
>as "spanning".  The beginning of one drive is the logical end of the
>previous drive.  Thus, two 250MB drives could be combined to make a single
>500MB logical drive, and so on.

However, this simple approach is not without its own risks.  If redundant
information is not kept, the equation above degenerates into:

              MTBFdisk
    MTBFfs = ----------
                #data

So if you use your 50,000 hour MTBF disks, your filesystem ends up with a
MTBF of 25,000 hours.  And the more disks you add, the worse it gets.

Try working out the numbers for yourself.  Consider a filesystem which you
want to span 11 disks.  A striped filesystem, with a single ecc disk, would
require a total of 12 drives.  Using 50,000 hours as the MTBF and 10 hours
for the time to repair, you get a mean time between failure for the
filesystem of 1,893,939 hours (or 216 years).  A mirrored filesystem
(spanning the disks) of the same capacity would require a total of 22
drives, and would have a MTBF of 1,033,057 hours (or 117 years).  For the
worst case, a simple "spanned" filesystem would require only 11 disks, but
would have a MTBF of 4,545 hours, or 189 DAYS.
T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)
"Testing can show the presence of bugs, but not their absence." -- Dijkstra
"Each new user of a new system uncovers a new class of bugs." -- Kernighan
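A quick cross-check of those figures; this small program just evaluates the
two formulas above with the numbers from the worked example (50,000-hour
drives, 10-hour repair time, 11 data disks):

    #include <stdio.h>

    /* Effective filesystem MTBF when redundant (ecc or mirror) disks are present. */
    double mtbf_redundant(double mtbf_disk, double mttr_disk, int ndata, int necc)
    {
        return (mtbf_disk * mtbf_disk) /
               ((double)ndata * (ndata + necc) * mttr_disk);
    }

    /* Effective filesystem MTBF with no redundancy (simple spanning). */
    double mtbf_spanned(double mtbf_disk, int ndata)
    {
        return mtbf_disk / ndata;
    }

    int main(void)
    {
        double mtbf = 50000.0;  /* per-disk MTBF, hours */
        double mttr = 10.0;     /* per-disk mean time to repair, hours */

        printf("striped, 11 data + 1 ecc:   %.0f hours\n",
               mtbf_redundant(mtbf, mttr, 11, 1));      /* about 1,893,939 */
        printf("mirrored, 11 data + 11:     %.0f hours\n",
               mtbf_redundant(mtbf, mttr, 11, 11));     /* about 1,033,058 */
        printf("spanned, 11 data:           %.0f hours\n",
               mtbf_spanned(mtbf, 11));                 /* about 4,545 */
        return 0;
    }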
kak@hico2.UUCP (Kris A. Kugel) (02/19/90)
In article <11372@attctc.Dallas.TX.US>, markh@attctc.Dallas.TX.US (Mark Harrison) writes:
> writes:
> >What are the feelings here regarding 64 bit longs?
>
> As Unix tries to get a larger share of the commercial market, we will see
> a need for storing numeric values with 18-digit precision, a la COBOL and
> the IBM mainframe.
>
> btw, I have always heard 64 bit integers referred to as "xlongs" (extra
> longs)... is this common, or just our own local jargon?
>
> Mark Harrison
> (markh @ attctc)

We are starting to have problems because of the wide variety of word sizes
on the machines UNIX runs on.  Does it make sense that a long is such a
different size on different machines?  What if you want a guaranteed
precision?  I'm beginning to think that some kind of declaration construct
like "int(need32) var;" is needed.

The layout of structures is another problem; my friends at NETWISE seem to
think that they have a solution, but it seems to me to make more sense to be
able to specify an exact layout, good over ALL machines, than to translate
every message sent over a heterogeneous network.  But this means language
support.  Isn't it about time we bit the bullet and decided that the C
language needs to support types, structures, and ints that look the same
from one machine to another?  We are only going to network more in the
future, not less.

                               Kris A. Kugel
                 {uunet,att,rutgers}!westmark!hico2!kak   <--daily
                             ssbn!hico2!kak               <--semi-daily
gwyn@smoke.BRL.MIL (Doug Gwyn) (02/21/90)
In article <194@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
>We are starting to have problems because of the wide variety of
>word sizes on the machines UNIX runs on.  Does it make sense that
>a long is such a different size on different machines?  What if
>you want a guaranteed precision?  I'm beginning to think that
>some kind of declaration construct like int(need32) var; is needed.

This is not a UNIX issue; it's a programming language issue.  For C, the
answer is: yes, it DOES make sense to allow the implementation to take into
account the characteristics of the system it runs on.  There are ways in C
to program portably; use them.

>We are only going to network more in the future, not less.

We're well aware of the issues you raised.  They are not properly solved by
tacking inadequate kludges onto programming languages.
johnl@gronk.UUCP (John Limpert) (02/21/90)
In article <194@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
>We are starting to have problems because of the wide variety of
>word sizes on the machines UNIX runs on.  Does it make sense that
>a long is such a different size on different machines?  What if
>you want a guaranteed precision?  I'm beginning to think that
>some kind of declaration construct like int(need32) var; is needed.

Sounds like you want declarations like those in PL/I or ADA.  I think it
would be a real bucket of worms for compiler developers.  C is primarily a
systems programming language; it makes no attempt to hide the hardware from
the programmer.  The virtual-machine philosophy used by some programming
languages just isn't appropriate for C.  Typedefs and defines can be used to
match native machine types to the needs of the program.

>The layout of structures is another problem; my friends at NETWISE
>seem to think that they have a solution, but it seems to me to
>make more sense to be able to specify exact layout, good over
>ALL machines, than to translate every message sent over a
>heterogeneous network.  But this means language support.  Isn't it
>about time we bit the bullet and decided that the C language needs
>to support types, structures, and ints that look the same from one
>machine to another?  We are only going to network more in the future,
>not less.

This would cause big problems for machines that differ significantly from
the architecture of the proposed 'C virtual machine'.  I expect C to give me
efficient use of the hardware.  If I wanted portability at any cost, then I
would use ADA.  Suggestions of this sort seem to come up with regularity.
Don't try to change C into some nice, safe, portable programming language
with all the sharp edges removed; pick another language.
--
John Limpert
johnl@gronk.UUCP	uunet!n3dmc!gronk!johnl
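A sketch of the typedef approach John mentions: a hypothetical per-machine
header (the file name, macro, and type names below are all invented for
illustration) maps generic width names onto whatever native types happen to
be wide enough:

    /* machtypes.h -- hypothetical per-machine width typedefs.  Each port
     * of the software supplies the mapping appropriate to its compiler. */

    #if defined(THIS_MACHINE_HAS_32BIT_INT)    /* made-up configuration macro */
    typedef int            int32;
    typedef unsigned int   uint32;
    #else                                      /* fall back to long, which C
                                                  guarantees is at least 32 bits */
    typedef long           int32;
    typedef unsigned long  uint32;
    #endif

    typedef short          int16;              /* shorts are at least 16 bits */

    /* Application code then asks for the width it needs: */
    typedef int32 record_count;                /* needs at least 32 bits of range */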
peralta@pinocchio.Encore.COM (Rick Peralta) (02/22/90)
In article <194@hico2.UUCP> kak@hico2.UUCP (Kris A. Kugel) writes:
>> >What are the feelings here regarding 64 bit longs?
>
>We are starting to have problems because of the wide variety of
>word sizes on the machines UNIX runs on.  Does it make sense that
>a long is such a different size on different machines?  What if
>you want a guaranteed precision?  I'm beginning to think that
>some kind of declaration construct like int(need32) var; is needed.
>The layout of structures is another problem;
>Isn't it about time we bit the bullet and decided that the C language
>needs to support types, structures, and ints that look the same from
>one machine to another?

Standardizing makes infinite sense, but is a logistical monster.  Maybe a
switch that can regress to the "old way" (whatever that is) and defaults to
a new standard can be managed.  As for required sizes, there is a mechanism:

	int x:32;

Byte ordering is a real issue.  Casting or declaring a type to have a
particular byte order seems wonderful, 'till you look at what it does to the
compiler folks.  They would have to convert every data item's byte order for
each operation.  (How about:  1234 int x;  (1234) x++;  (4321) x--; )

Since we're getting the compiler people excited, why not have some fun...
Why can't we have the compiler manage math sizes other than those supported
by the current hardware?  For example: a 16 bit machine with 32 or 64 bit
math.  If the hardware is inadequate, just call a library or inline the
code.  That way math code would no longer be functionally limited by the
hardware.

 - Rick "But it should be put on the standards list..."
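For reference, the "int x:32" syntax Rick mentions is C's bit-field
notation, which is only available for members of a struct or union; a small
example, assuming a machine whose int is at least 32 bits wide:

    /* Bit-fields request explicit member widths, but only inside a
     * struct or union, and the compiler still decides how the fields
     * are packed and ordered, so this does not pin down the layout
     * across machines. */
    struct packet_header {
        unsigned int version  : 4;    /* 4-bit protocol version */
        unsigned int type     : 4;    /* 4-bit message type     */
        unsigned int length   : 24;   /* 24-bit payload length  */
        unsigned int sequence : 32;   /* 32-bit sequence number */
    };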
guy@auspex.auspex.com (Guy Harris) (02/23/90)
>Isn't it about time we bit the bullet and decided that the C language needs
>to support types, structures, and ints that look the same from one
>machine to another?  We are only going to network more in the future,
>not less.

I hate to have to break the sad news to you, but size and structure layout
aren't the only issues here.  Byte order is another issue, and
floating-point format is still another.  No matter *how* much we network in
the future, it's not at all clear that the problem can be properly "fixed"
by changing the language so that you can tell the compiler to do things with
data structures and members thereof....
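The byte-order half of that problem is usually handled today without any
language change, by defining an on-the-wire order and marshalling
explicitly; floating-point format still needs its own agreed interchange
representation.  A minimal sketch:

    /* Encode a 32-bit value into a fixed "network" byte order
     * (big-endian), one byte at a time, so the result is identical no
     * matter what the host machine's native byte order is. */
    void put32(unsigned char *buf, unsigned long value)
    {
        buf[0] = (unsigned char)((value >> 24) & 0xff);
        buf[1] = (unsigned char)((value >> 16) & 0xff);
        buf[2] = (unsigned char)((value >> 8)  & 0xff);
        buf[3] = (unsigned char)(value & 0xff);
    }

    /* Decode it again on the receiving side. */
    unsigned long get32(const unsigned char *buf)
    {
        return ((unsigned long)buf[0] << 24) |
               ((unsigned long)buf[1] << 16) |
               ((unsigned long)buf[2] << 8)  |
                (unsigned long)buf[3];
    }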
eloranta@tukki.jyu.fi (Jussi Eloranta) (02/25/90)
I just compiled gnulib2 for gcc (the 64-bit library).  How do I access it?
(i.e., how do I specify whether I want 32 or 64 bit integers?)

I didn't find anything in the docs..

Thanks,
jussi
--
============================================================================
Jussi Eloranta               Internet(/Bitnet):
University of Jyvaskyla,     eloranta@tukki.jyu.fi
Finland                      [128.214.7.5]
meissner@osf.org (Michael Meissner) (02/27/90)
In article <3538@tukki.jyu.fi> eloranta@tukki.jyu.fi (Jussi Eloranta) writes:
| I just compiled gnulib2 for gcc (the 64-bit library).  How do I access it?
| (i.e., how do I specify whether I want 32 or 64 bit integers?)
|
| I didn't find anything in the docs..

Yeah, it isn't in the docs.  To use a 64-bit type, just use:

	long long

which seems to be a convention among several compilers.  Make sure gnulib
is on the link line (the simplest way is to use gcc to link the programs).
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so
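A minimal example of that usage; the file name is arbitrary, and since the
system printf generally does not understand the wider type yet, the value
is printed as two 32-bit halves:

    /* big.c -- use GNU C's "long long" extension.  Compile and link
     * with gcc itself so that gnulib (which supplies the 64-bit
     * arithmetic helpers on machines without 64-bit hardware) is
     * pulled in automatically:
     *
     *     gcc -o big big.c
     */
    #include <stdio.h>

    int main(void)
    {
        long long offset = 5;
        int i;

        /* Grow the value well past the 32-bit limit. */
        for (i = 0; i < 60; i++)
            offset *= 2;

        /* Print as two 32-bit halves rather than trusting printf with
         * a long long argument. */
        printf("offset = 0x%08lx%08lx\n",
               (unsigned long)(offset >> 32),
               (unsigned long)(offset & 0xffffffff));
        return 0;
    }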