mike@BRL.MIL (Mike Muuss) (03/05/91)
Over the past few weeks, we have encountered difficulty using RDUMP on an SGI 280 server. We are attempting to dump a 4 Gbyte (4-way stripe) filesystem across the network to a Gould running BSD UNIX. (The motivation for this is, alas, that our SGI-provided tape drives have been having a rough time of it lately.)

We have been most disturbed by the fact that the RDUMP aborts on the 15th reel (!), repeatably. 150 Mbytes/reel * 14 full reels = 2.1 Gbytes, so the abort comes partway through the 15th reel, just past 2 Gbytes = 2**31 bytes. Suspicious!

Chuck Kennedy ran some tests for me using the BRL-original version of TTCP, and encountered the same difficulty. Running from SGI to SGI (IRIX Release 3.3.1), the sys-write() call returns an error 27 after about 1 hour and 47 minutes of data transfer. (Alas, we don't know the exact byte count at this point.)

    #define EFBIG 27 /* File too large */

Amusingly, both the sender and receiver got this error.

While this was the first time that I can recall having intentionally transferred 2 Gbytes of data on a TCP connection, it seems like an unfortunate limitation.

Just as a sanity check, Chuck also ran the same test between two Gould PN 9080 systems (running UTX 2.0, a 4.3 BSD system). The test was to send 3000 buffers of 1048576 bytes each, or about 3 Gbytes. The Goulds successfully transmitted the entire sequence, without error.

So, my questions to SGI are:

*) Is this limit intentional?
*) If yes, why?
*) If no, when will it be fixed?
*) Is there an easy work-around, like a periodic lseek(fd,0L,0) ?

Best,
-Mike

PS: SGI folks, please don't get paranoid about my finding all these little problems with SGI machines. It's a function of my getting a lot of work done on them.
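The reel arithmetic in the post above can be checked with a small sketch. It assumes that "Mbyte" means 10**6 bytes and, for the throughput estimate only, that the TTCP failure happened exactly at 2**31 bytes. The post says the exact byte count is unknown, so that figure is an estimate, not a measurement. The helper names are illustrative and do not come from RDUMP or TTCP.

```c
#include <assert.h>
#include <stdint.h>

/* One past the largest value a signed 32-bit byte offset can hold (2**31). */
#define OFFSET_LIMIT ((uint64_t)1 << 31)

/* Whole reels that fit before the running byte count reaches 2**31.
 * The reel after this many is the one on which the limit is crossed. */
static uint64_t full_reels_below_limit(uint64_t bytes_per_reel)
{
    assert(bytes_per_reel != 0);
    return OFFSET_LIMIT / bytes_per_reel;
}

/* Average transfer rate in bytes per second (integer division). */
static uint64_t avg_bytes_per_second(uint64_t total_bytes, uint64_t seconds)
{
    assert(seconds != 0);
    return total_bytes / seconds;
}
```

With 150,000,000-byte reels, `full_reels_below_limit` gives 14, so the limit is crossed during the 15th reel, matching the report. If the TTCP run did fail at 2**31 bytes after 1 hour 47 minutes (6420 seconds), `avg_bytes_per_second(OFFSET_LIMIT, 6420)` works out to roughly 334 Kbytes/sec.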
vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (03/06/91)
In article <9103042301.aa02195@WOLF.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
> .... Running from SGI to SGI
> (IRIX Release 3.3.1), the sys-write() call returns an error 27
> after about 1 hour and 47 minutes of data transfer. (Alas, we don't
> know the exact byte count at this point).
> #define EFBIG 27 /* File too large */
>
> ...
> So, my questions to SGI are:
>
> *) Is this limit intentional?
> *) If yes, why?

Be nice. The 2/4GB SystemV limit has been a matter of internal controversy for years. No one who has proposed a solution has argued with any conviction.

> *) If no, when will it be fixed?

RealSoonNow. If we're lucky in figuring out a clean fix, and in navigating the shoals of release cycles, in the NextMajorRelease.

> *) Is there an easy work-around, like a periodic lseek(fd,0L,0) ?

The offending, largely standard SVR3 code in rdwr() is:

    if (fp->f_offset < 0) {
        u.u_error = EFBIG;

and later

    if ((type != IFCHR) || (cdevsw[major(ip->i_rdev)].d_str == NULL))
        fp->f_offset += uap->count - u.u_count;

My obviously false recollection is that we faked sockets as IFCHR devices under the FSS years ago when we stuck 4.3BSD TCP into 3.4. Oh, well.

With `dbx -k /unix /dev/kmem` you could patch the first test in a running system. (Look near line 110 in file sys2.c. Dbx knows line numbers even if you do not have source.) It's too bad we don't have an equivalent of `adb -w...`.

If you can change RDUMP to use send() instead of write(), it seems likely that the problem would go away there.

I doubt an lseek() would help because of the following in seek():

    if ((ip->i_ftype == IFIFO) || (ip->i_ftype == IFSOCK)) {
        u.u_error = ESPIPE;

Vernon Schryver, vjs@sgi.com
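To illustrate how the two rdwr() fragments quoted above interact, here is a minimal user-space simulation. It is not the IRIX or SVR3 source. It assumes a signed 32-bit f_offset and that every write transfers its full count. The names `sim_file`, `sim_write`, `writes_until_efbig` and `SIM_EFBIG` are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_EFBIG 27 /* same value as EFBIG in the thread */

/* Hypothetical stand-in for the per-open-file state. */
struct sim_file {
    int32_t f_offset; /* assumed signed 32-bit, as on SVR3-era systems */
};

/* Mimics the quoted logic: fail if the offset is already negative,
 * otherwise advance it by the number of bytes transferred.
 * Returns 0 on success or SIM_EFBIG. */
static int sim_write(struct sim_file *fp, uint32_t count)
{
    if (fp->f_offset < 0)
        return SIM_EFBIG;
    /* Add in unsigned arithmetic so the wraparound is well defined;
     * converting back to int32_t is implementation-defined in C but
     * wraps to a negative value on two's-complement machines. */
    fp->f_offset = (int32_t)((uint32_t)fp->f_offset + count);
    return 0;
}

/* Counts the writes that succeed before the first SIM_EFBIG.
 * Returns -1 if all max_writes writes succeed. */
static long writes_until_efbig(uint32_t count, long max_writes)
{
    struct sim_file f = { 0 };
    long n;

    assert(count > 0); /* a zero-byte write never advances the offset */
    for (n = 0; n < max_writes; n++) {
        if (sim_write(&f, count) == SIM_EFBIG)
            return n;
    }
    return -1;
}
```

Under these assumptions, the test of 3000 buffers of 1048576 bytes would get through 2048 writes (exactly 2**31 bytes), leaving the offset negative, and the next write would fail with error 27. Because the check happens before the transfer, the failure shows up one call after the offset wraps.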