[comp.sys.sgi] SGI 2GByte TCP limit

mike@BRL.MIL (Mike Muuss) (03/05/91)

Over the past few weeks, we have encountered difficulty using RDUMP
on an SGI 280 server.  We are attempting to dump a 4 Gbyte (4-way
stripe) filesystem across the network to a Gould running BSD UNIX.
(The motivation for this is, alas, that our SGI-provided tape
drives have been having a rough time of it lately.)

We have been most disturbed by the fact that the RDUMP aborts on the
15th reel (!), repeatably.  14 completed reels * 150 Mbytes/reel = 2.1 Gbytes,
and the abort comes partway through the 15th.  2 Gbytes = 2**31 bytes.
Suspicious!

Chuck Kennedy ran some tests for me using the BRL-original version of
TTCP, and encountered the same difficulty.  Running from SGI to SGI
(IRIX Release 3.3.1), the sys-write() call returns an error 27
after about 1 hour and 47 minutes of data transfer.  (Alas, we don't
know the exact byte count at this point).
	
	#define        EFBIG   27      /* File too large */

Amusingly, both the sender and the receiver got this error.

While this was the first time that I can recall having intentionally
transferred 2 Gbytes of data on a TCP connection, it seems like an
unfortunate limitation.

Just as a sanity check, Chuck also ran the same test between two
Gould PN 9080 systems (running UTX 2.0, a 4.3 BSD system).
The test was to send 3000 buffers of 1048576 bytes each, or about
3 Gbytes.  The Goulds successfully transmitted the entire sequence,
without error.
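
For anyone who wants to reproduce the test without TTCP in hand, the
following is a quick sketch of the same idea (not the BRL TTCP source;
point it at anything that will sink the data, such as the receiving
half of ttcp).  It pushes 3000 buffers of 1048576 bytes down a single
TCP connection and reports the byte count and errno if write() ever
fails.  On the IRIX systems above that should happen with errno 27
(EFBIG) near the 2 Gbyte mark; on the Goulds it should run to
completion.

        /* bigsend.c -- sketch of a TTCP-style bulk sender (not the BRL
         * TTCP source).  Usage: bigsend <host> <port>
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <errno.h>
        #include <unistd.h>
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <netdb.h>

        #define BUFLEN  1048576                 /* 1 Mbyte per buffer */
        #define NBUF    3000                    /* ~3 Gbytes total    */

        static char buf[BUFLEN];

        int main(int argc, char **argv)
        {
                struct sockaddr_in sin;
                struct hostent *hp;
                double total = 0;               /* double: won't wrap at 2**31 */
                int s, i, n, off;

                if (argc != 3) {
                        fprintf(stderr, "usage: %s host port\n", argv[0]);
                        exit(1);
                }
                if ((hp = gethostbyname(argv[1])) == NULL) {
                        fprintf(stderr, "%s: unknown host\n", argv[1]);
                        exit(1);
                }
                memset(&sin, 0, sizeof(sin));
                sin.sin_family = AF_INET;
                sin.sin_port = htons(atoi(argv[2]));
                memcpy(&sin.sin_addr, hp->h_addr_list[0], hp->h_length);

                if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
                    connect(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
                        perror("connect");
                        exit(1);
                }
                for (i = 0; i < NBUF; i++) {
                        for (off = 0; off < BUFLEN; off += n) {
                                n = write(s, buf + off, BUFLEN - off);
                                if (n <= 0) {
                                        fprintf(stderr,
                                            "write failed after %.0f bytes, errno %d (%s)\n",
                                            total, errno, strerror(errno));
                                        exit(1);
                                }
                                total += n;
                        }
                }
                printf("sent %.0f bytes without error\n", total);
                return 0;
        }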

So, my questions to SGI are:

*)  Is this limit intentional?
	*)  If yes, why?
	*)  If no, when will it be fixed?

*)  Is there an easy work-around, like a periodic lseek(fd,0L,0) ?

	Best,
	 -Mike

PS:  SGI folks, please don't get paranoid about my finding all these
little problems with SGI machines.  It's a function of my getting a lot
of work done on them.

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (03/06/91)

In article <9103042301.aa02195@WOLF.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
>                                      ....   Running from SGI to SGI
> (IRIX Release 3.3.1), the sys-write() call returns an error 27
> after about 1 hour and 47 minutes of data transfer.  (Alas, we don't
> know the exact byte count at this point).
> 	#define        EFBIG   27      /* File too large */
> 
>  ...
> So, my questions to SGI are:
> 
> *)  Is this limit intentional?
> 	*)  If yes, why?

Be nice.
The 2/4GB SystemV limit has been a matter of internal controversy for
years.  No one who has proposed a solution has argued for it with any
conviction.

> 	*)  If no, when will it be fixed?

RealSoonNow.  If we're lucky in figuring out a clean fix, and in navigating
the shoals of release cycles, in the NextMajorRelease.

> *)  Is there an easy work-around, like a periodic lseek(fd,0L,0) ?

The offending, largely standard SVR3 code in rdwr() is:
        if (fp->f_offset < 0) {
                u.u_error = EFBIG;
and later
        if ((type != IFCHR) || (cdevsw[major(ip->i_rdev)].d_str == NULL))
                fp->f_offset += uap->count - u.u_count;
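
In other words, f_offset is a signed 32-bit quantity: one byte past
2**31 - 1 flips the sign bit, and the next write down that descriptor
trips the EFBIG test above.  A throwaway user-level illustration of the
bit pattern (not kernel code):

        #include <stdio.h>

        int main(void)
        {
                unsigned long off = 2147483647UL;       /* 2**31 - 1         */

                off = (off + 1) & 0xffffffffUL;         /* next byte written */
                printf("offset bits: 0x%08lx\n", off);  /* 0x80000000        */
                printf("a signed 32-bit f_offset reads that as negative\n");
                return 0;
        }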

My obviously false recollection is that we faked sockets as IFCHR devices
under the FSS years ago when we stuck 4.3BSD TCP into 3.4.  Oh, well.

With `dbx -k /unix /dev/kmem` you could patch the first test in a
running system.  (Look near line 110 in file sys2.c.  Dbx knows line
numbers even if you do not have source.)  It's too bad we don't have an
equivalent of `adb -w...`.  If you can change RDUMP to use send() instead
of write(), it seems likely that the problem would go away there.  I doubt
an lseek() would help because of the following in seek():
	if ((ip->i_ftype == IFIFO) || (ip->i_ftype == IFSOCK)) {
		u.u_error = ESPIPE;
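
If you do take the send() route, the change in an rdump-style output
routine amounts to about one line.  A hypothetical sketch (not the real
RDUMP source; "putblock" and "tofd" are made-up names, with tofd assumed
to be the TCP socket to the remote tape server):

        #include <stdio.h>
        #include <string.h>
        #include <errno.h>
        #include <sys/types.h>
        #include <sys/socket.h>

        /* Hypothetical sketch of an rdump-style output routine; not the
         * real RDUMP source. */
        int
        putblock(int tofd, char *buf, int len)
        {
                int n, off;

                for (off = 0; off < len; off += n) {
                        /* was:  n = write(tofd, buf + off, len - off);
                         * send() enters the socket code directly, which per
                         * the suggestion above should sidestep the rdwr()
                         * offset bookkeeping that returns EFBIG. */
                        n = send(tofd, buf + off, len - off, 0);
                        if (n <= 0) {
                                fprintf(stderr, "putblock: errno %d (%s)\n",
                                    errno, strerror(errno));
                                return -1;
                        }
                }
                return len;
        }

For binaries you can't rebuild, the dbx patch to the running kernel is
the fallback.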


Vernon Schryver,   vjs@sgi.com