[comp.unix.wizards] rdump, Ethernet slowness

stevens@hsi.UUCP (Richard Stevens) (12/07/87)

When we added a second VAX we had planned to use the existing 6250-bpi
tape drive for dumping its filesystems, using rdump(8).  Using rdump
once made it clear that this wasn't a viable solution.  (We're running
vanilla 4.3 BSD on a 785 and an 8600.)  We didn't delve into the
problem much, but punted and bought a cheap Unibus controller for the
newer VAX; we now switch the existing 6250 drive between the two VAXes
to do the level 0 dumps.

The comments in the paper by Karels & McKusick at the 1986 Atlanta Usenix
("Network Performance and Management with 4.3 BSD and IP/TCP") led me to
expect better results with rdump.

I then decided to get a better handle on just how slow dump and rdump are,
and here are the results, done on a single-user 8600 with a Kennedy
9400 tape drive (45 ips at 6250) with an Emulex TC13 Unibus controller.
The network hardware is an Interlan NI1010A on both VAXes.

	552,000 bytes/sec - speed of a C program that writes 1000 32768-byte
			    buffers to another process on the same system.
			    The two processes are connected with a stream
			    socket created by rexec(3), with the SO_SNDBUF
			    socket option set to 32768.

	540,000 bytes/sec - raw disk speed for an RA81, using dd and bs=32768.
			    Driver's UDABURST is set to 4.  I'm not sure
			    what the UDA50's Unibus delay jumper is set to.

	266,000 bytes/sec - "theoretical" max speed of a 45 ips, 6250 tape
			    drive, writing 1000 32768-byte blocks, with
			    0.3-inch inter-record gaps.

	247,000 bytes/sec - actual speed of a simple C program that does
			    1000 write(2) calls of a 32768-byte block,
			    to the raw tape drive.  About 7% less than the
			    theoretical max above, which isn't bad.	

	172,000 bytes/sec - dump speed of the RA81 to /dev/null.

	166,000 bytes/sec - dump speed of the RA81 to the tape drive.
			    About 3% slower than to /dev/null, which isn't bad.
			    About 32% slower than the tape drive speed.

	 52,000 bytes/sec - speed of a C program writing 1000 32768-byte
			    buffers to another process on the other system,
			    across the Ethernet, using a stream socket, as
			    in the 552,000 bytes/sec example given above.

	 41,000 bytes/sec - rdump speed to /dev/null on the other system.
			    4 times slower than dump.  Not very good.

For the actual tests of dump and rdump, I timed the second tape that dump
wrote, to avoid the first 3 passes that dump makes before it starts
dumping the regular files.  Also, for the rdump test, the other system
(for the /dev/null output) was essentially idle, with the priority of
the /etc/rmt process set to -20.

Should the actual throughput of the Ethernet be as slow as we're seeing ??
Should I expect rdump to be so slow, given the Ethernet throughput ??
(Maybe rdump wouldn't be so bad if you were used to a TS11 :-) ).

	Richard Stevens
	Health Systems International, New Haven, CT
           { uunet | ihnp4 } ! hsi ! stevens

mdb@laidbak.UUCP (12/07/87)

I believe that the /etc/rmt protocol, used by rdump, is synchronous. Each
I/O operation is acknowledged (back through the Ethernet) before the next
one is begun. The network transfer rate is good, but the latency is poor.
Hence, rdump spends much of the time waiting for write acknowledgements.

					Mark Brukhartz
					Lachman Associates, Inc.
					..!{ihnp4, sun}!laidbak!mdb

matt@oddjob.UUCP (12/07/87)

I use rdump to back up a tapeless many-user sun-3/280 to an elxsi.
Even though Elxsi's rmt limits me to 10k tape blocks, the performance
is not too bad.  Restoring a partition with rrestore, however, is
excruciating!  It takes about 4 hours to restore a level 0 dump of a
175MB partition.

Has anyone ever profiled restore?
________________________________________________________
Matt	     University		matt@oddjob.uchicago.edu
Crawford     of Chicago     {astrovax,ihnp4}!oddjob!matt

jrl@anuck.UUCP (j.r.lupien) (12/09/87)

> The network transfer rate is good, but the latency is poor.

How good is this? I know that the network bandwidth is 10Mb,
but when I transfer a 1 MB file, instead of the 10 seconds that
the true bandwidth suggests I might get on an unloaded net, 
I find that it takes a minute or more. Is this software delay,
or is the ethernet controller chip just not turning around
fast enough? 

This is only idle curiosity, but if it comes out that the controller
is too slow, maybe someone will make one that can really feed the
net, which would change my mind about being somewhat put off of
ethernet because it fails to deliver on the single-user throughput.

John R. Lupien
ihnp4!mvuxa!anuxh!jrl

matt@oddjob.UChicago.EDU (Yes, *THAT* Matt Crawford) (12/10/87)

j.r.lupien (whom I can't reach by mail) writes:
) when I transfer a 1 MB file, ... I find that it takes a minute or
) more. ... if it comes out that the controller is too slow, maybe
) someone will make one that can really feed the net, which would
) change my mind about being somewhat put off of ethernet because it
) fails to deliver on the single-user throughput. 

Someone has.  I just transferred a 1MB (2^20 byte) file from a
sun-3/280 with an Intel 82586 ethernet chip to a sun-3/60 with an AMD
Am7990 and it took 7.11 seconds.  (This is with ftp and the standard-
issue TCP on each end.)  The network has many diskless clients on it.
If the bytes were not going to and from disk I'm sure the transfer
would be faster still.
					Matt Crawford

mangler@cit-vax.Caltech.Edu (Don Speck) (12/10/87)

In article <1268@laidbak.UUCP>, mdb@laidbak.UUCP (Mark Brukhartz) writes:
> I believe that the /etc/rmt protocol, used by rdump, is synchronous.	Each
> I/O operation is acknowledged (back through the Ethernet) before the next
> one is begun.

That wasn't the bottleneck.  The problem lies in this section:

In article <788@hsi.UUCP>, stevens@hsi.UUCP (Richard Stevens) writes:
>	 52,000 bytes/sec - speed of a C program writing 1000 32768-byte
>			    buffers to another process on the other system,
>			    across the Ethernet, using a stream socket, as
>			    in the 552,000 bytes/sec example given above.

52Kbytes/sec of tcp throughput is ATROCIOUS for a VAX/785 with a good
Ethernet board like an Interlan NI1010A.  My 750's, with the same kind
of Ethernet board, do 88 Kbytes/sec with reads of that size.  I cannot
account for why his throughput would be this poor.  Any ideas?

Don Speck   speck@vlsi.caltech.edu  {amdahl,scgvaxd}!cit-vax!speck

mangler@cit-vax.Caltech.Edu (Don Speck) (12/10/87)

In article <14115@oddjob.UChicago.EDU>, matt@oddjob.UChicago.EDU (Ke Kupua) writes:
> excruciating!  It takes about 4 hours to [r]restore a level 0 dump of a
> 175MB partition.

That's 12 Kbytes/sec, a magic number.  It's the rate you get if
tcp_recvspace is too much bigger than tcp_sendspace.

To save on acknowledgements, an ack is not sent until 35% of
tcp_recvspace bytes have been received.  The 4.3bsd /etc/rrestore
does an ioctl to raise the receive buffer for its socket (to the
same as the block size, 10K).  The sender's buffer is probably
smaller than 35% of 10K, and hence can't send that much data
without getting an acknowledgement; and things just sit there
until the receiver's TCP times out and decides to send an ack.

Try removing the SOL_SOCKET stuff from rrestore.  It was a nice
idea, but doesn't interoperate very well.

You have the same problem in rdumping from a 4.2bsd machine to
a 4.3bsd /etc/rmt.

Don Speck   speck@vlsi.caltech.edu  {amdahl,scgvaxd}!cit-vax!speck

ggs@ulysses.homer.nj.att.com (Griff Smith) (12/11/87)

In article <411@anuck.UUCP>, jrl@anuck.UUCP writes:
> > The network transfer rate is good, but the latency is poor.
> 
> How good is this? I know that the network bandwidth is 10Mb,
> but when I transfer a 1 MB file, instead of the 10 seconds that
> the true bandwidth suggests I might get on an unloaded net, 
> I find that it takes a minute or more.

10 Mbit/sec == 1.2 Mbyte/sec.  True, I never see speeds anywhere near
that.  I do see 110 Kbyte/sec when I use a CCI POWER 6/32 running
4.3BSD to dump to a DEC VAX 8650, also running 4.3BSD.  I think this
comes to 9 seconds/Mbyte.  I just tried a file copy between those two
processors and got 10 seconds/Mbyte, which is just what you wanted.

> Is this software delay,

Probably; protocol delay, and sometimes processor overload.  TCP/IP
can chew up a lot of a processor.  Our CCI machine has speeded up
by a factor of 6 since we first started using 4.3BSD on our VAXen.
The first factor of 2 was a windowing strategy mismatch between
4.3BSD and the version of 4.2BSD that was running on the CCI processor.
A further factor of 3 resulted when we upgraded to a 4.3BSD+ beta
distribution from Berkeley.

> maybe someone will make [a controller] that can really feed the
> net, which would change my mind about being somewhat put off of
> ethernet because it fails to deliver on the single-user throughput.
> 
> John R. Lupien
> ihnp4!mvuxa!anuxh!jrl

You don't say what flavor of UNIX you are using.  Protocol implementations
can have a lot of effect on the speed.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

aglew@ccvaxa.UUCP (12/11/87)

..> Speed of dump vs. rdump

You're all talking around the problem of synchronous I/O:
a network can have sufficient thruput yet too much
latency to be driven synchronously.

In 4.3 dump has been changed to use multiple processes
and achieve a facsimile of asynchronous I/O, but I doubt
that rdump has. What we really need are asynchronous
I/O facilities in UNIX, so that the person who wants to
do things at the maximum thruput rate can do so without
having to mess with multiple processes. HP's contention that
asynch I/O isn't needed because it can be achieved with
multiple processes and shared memory doesn't hold water
(especially if you're me, and do a lot of I/O intensive
work, and are always pressing against your process limits).

Asynchronous I/O primitives should basically look like this:

IOtransT aread(fd,buf,n)	- initiate a read
IOtransT awrite(fd,buf,n)	- initiate a write
iowait(n,iolist)		- wait for ios to complete

With operations to guarantee sequential ordering when you want it,
and to relax it when you don't.


Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

chris@mimsy.UUCP (Chris Torek) (12/15/87)

In article <57900004@ccvaxa> aglew@ccvaxa.UUCP writes:
>Asynchronous I/O primitives should basically look like this:
>
>IOtransT aread(fd,buf,n)	- initiate a read
>IOtransT awrite(fd,buf,n)	- initiate a write
>iowait(n,iolist)		- wait for ios to complete
>
>With operations to guarantee sequentiality, or sometimes not.

I have a pseudo-driver that provides asynchronous read and write
operations, but only to character special devices (and only some
of them).  It works about like this, except that aread(fd, buf, n)
is done with read(assocfd, buf, n) and awrite(fd, buf, n) is done
with write(assocfd, buf, n).  iowait() is implemented by select(assocfd),
and sequentiality of transfers can be set or unset by an ioctl on
assocfd.

It is unfortunate that this only works on some character special
devices, and that it requires somewhat major changes to 4.3BSD.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris