[net.unix-wizards] simultaneous file transfer on ethernet

acrotty@cvbnet.uucp (Art Crotty) (08/21/86)




I have a somewhat complex request for info involving file transfer
between hundreds of SUN3 and SUN2 workstations.  These workstations
are networked together using Ethernet (802.3).  Protocols: TCP/IP/UDP.

I would like to have the ability to transfer large application
programs to all nodes on the network simultaneously.

Why I think it may be possible to do this:


1.) The Ethernet packet can contain what is commonly called a
    multicast bit within the destination address.  Thus, I should be
    able to set this bit to broadcast or spray my large application
    program (i.e., 10 MB to 30 MB) to all nodes on the network.

    I also, using the multi-cast bit, should be able to set up a table
    of nodes that I wish to distribute the program to.  That is, if
    the bit is 0 the address is a specific node; if it is 1 the address
    is a group of nodes; and all 1's in the field indicate all nodes.

2.) Some user-level programs already do something similar to what
    I want.   For instance, "wall" will broadcast a message to nodes
    on your network.  The command "rcp" will copy one or a group of
    files to one particular destination at a time.  I want something
    like a combined "wall" (or "rwall") and "rcp" that can copy my file
    or files simultaneously to all nodes or subgroups of nodes on a network.
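For reference, in Ethernet/802.3 addressing the individual/group bit is the low-order bit of the first octet of the destination address, and the all-ones address is the broadcast address. A minimal Python sketch of that classification; the function name is illustrative and not part of any existing tool.

```python
def classify_destination(mac: str) -> str:
    """Classify an Ethernet/802.3 destination written as aa:bb:cc:dd:ee:ff.

    The individual/group (I/G) bit is the low-order bit of the first octet,
    which is also the first bit sent on the wire.
    """
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"   # all ones: every station accepts it
    if octets[0] & 0x01:
        return "multicast"   # I/G bit set: a group address
    return "unicast"         # I/G bit clear: one specific station
```

Setting the bit on transmit is the easy half: a station only receives frames for a particular group address if its interface driver has been told to accept that address.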

I know NFS allows mounting of an application to nodes and simultaneous
access of that application - but that is not what I want.  I want to
distribute to stand-alone machines as well as file servers new copies
of an application once a week and each rcp or "dread the thought"
cartridge taping can consume 1/2 hour per node.  Thus, 20 nodes take
20 x 1/2 hour (10 hours) by rcp from a master database, or somewhat less
if multiple tapes are made or if I rsh a tar from a server with a 1/2" tape drive.

I would like to be able to say something like:

distribute -g <tablefile> <application> 

where -g is the option for group and tablefile is the database that
contains a list of nodes with names or internet addresses

distribute -a <application>

where -a is for all nodes - no table
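The proposed command line can be pinned down with a small sketch. Everything here is hypothetical: no "distribute" command exists, and the table file format (one node name or internet address per line, first field only, '#' starts a comment) is an assumption.

```python
import argparse

def parse_args(argv):
    """Hypothetical 'distribute' command line: either -g TABLEFILE or -a."""
    p = argparse.ArgumentParser(prog="distribute")
    target = p.add_mutually_exclusive_group(required=True)
    target.add_argument("-g", metavar="TABLEFILE", dest="tablefile",
                        help="send to the nodes listed in TABLEFILE")
    target.add_argument("-a", action="store_true", dest="all_nodes",
                        help="send to all nodes on the network")
    p.add_argument("application", help="file to distribute")
    return p.parse_args(argv)

def read_table(lines):
    """Return node names/addresses, one per line; '#' starts a comment."""
    nodes = []
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if fields:
            nodes.append(fields[0])  # first field is the name or address
    return nodes
```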

I am not that familiar with the networking code on Suns and was
wondering the following:

Can it be done?

Is this beyond the ability of the Ethernet itself?

When "wall" does a broadcast - is it simultaneous to all nodes or
consecutive?

Will I need some sort of daemon process running all the time on
each node, waiting for a signal to allow broadcast file transmissions,
or can /etc/inetd already handle this type of request with little or no
code tweaking?

What kind of error checking do I have to do for testing that
the program was successfully transmitted without losing packets
or corrupting packets - at source or destination?

Has anyone created a program that can do this?

If not, can someone get me started as to the process or code that
I might need to access or create to accomplish what I want?

For instance:

you can alter "inetd" or you have to create a new daemon

you must access these libraries and change this/that

you must use these calls

etc., etc.



Thanks in advance for all advice!!!

+-------------------------------------------------------------------+
|                                                                   |
|        /\               Post: Art Crotty                          |
|       /  \                    Computervision Corp.                |
|      /_  _\                   14 Crosby Drive                     |
|     / o  o \                  Bldg. 5-1                           |
|  -mm--------mm-               Bedford, Mass. 01730                |
|                      Ma Bell: (617) 275-1800                      |
| The fool wanders,       UUCP: { decvax,raybed2 }!cvbnet!acrotty   |
| the wise man travels.                                             |
+-------------------------------------------------------------------+

hedrick@topaz.RUTGERS.EDU (Charles Hedrick) (08/25/86)

The original article asks whether it is feasible to update software on
multiple systems by using a broadcast protocol.  This would save you
from having to do separate copies to each.  Anything is possible to do
with enough design work, but let me mention two serious problems.

First, the Ethernet is not a reliable medium.  This means that any
individual packet may be dropped.  All protocols currently used to
send files include some sort of acknowledgement that the packet really
got there.  If an ack is not received, the sender resends the packet.
This is true of FTP, rcp, and NFS, though the actual details of the
protocols are different for NFS and the other two.  So a broadcast
distribution protocol would have to keep a list of the sites that are
expected to be receiving, and keep resending each packet until it has
gotten an ack from every receiver.  Since the acks would all be sent
at the same time, you would have guaranteed collisions on the
Ethernet.  Probably you would want some sort of randomized delay
before sending the ack.  This would be a nontrivial design problem,
and probably there would be other implications that I have not
noticed.  But an experienced protocol designer could probably solve
the problem.
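A toy simulation of the scheme described above: for each packet the sender keeps the set of expected receivers and rebroadcasts until every one has acked, and each receiver picks a random delay before acking. Independent loss per packet and per ack is an assumption; timing and collisions are not simulated, so the delays are only computed.

```python
import random

def ack_delays(receivers, window, rng):
    """Each receiver waits a random time in [0, window] before acking,
    so the acks are spread out instead of all colliding at once."""
    return {r: rng.random() * window for r in receivers}

def reliable_broadcast(n_packets, receivers, loss, rng, max_rounds=100):
    """Rebroadcast each packet until every expected receiver has acked it.

    Returns (sequence numbers each receiver got, total broadcasts sent)."""
    got = {r: set() for r in receivers}
    broadcasts = 0
    for seq in range(n_packets):
        pending = set(receivers)  # receivers that have not acked seq yet
        rounds = 0
        while pending:
            rounds += 1
            if rounds > max_rounds:
                raise RuntimeError(f"packet {seq}: no ack from {sorted(pending)}")
            broadcasts += 1  # one transmission reaches every receiver
            for r in sorted(pending):
                if rng.random() < loss:
                    continue  # data packet lost on the way to r
                got[r].add(seq)  # a resend may be a duplicate; the set absorbs it
                if rng.random() >= loss:
                    pending.discard(r)  # r's ack made it back
    return got, broadcasts
```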

You imply that you are going to be updating hundreds of Suns.  I would
be somewhat wary of the idea of hundreds of Suns on a single Ethernet.
When we asked Sun about this, they recommended no more than 50 
diskless Suns on a single Ethernet.  Our measurements suggest that
this number is about right.  Of course if the machines are not
diskless, more should be possible.  But there is a limit.  If you have
hundreds of machines, they are probably going to be on more than one
Ethernet, with gateways.  Broadcasts do not go through gateways,
unless special provisions are made.  This is a good thing.  It
protects networks from other networks where a machine has decided to
start spraying the network with high-speed broadcasts (a failure mode
that is not uncommon when you are playing with experimental network
software).  There are also problems in making sure that loops don't 
occur.  If a gateway forwards broadcasts from one interface to the
other, any very interesting topology will end up with broadcasts
looping around the network.  These problems can be solved, and indeed
there is an RFC describing multi-network broadcasts, but you should
realize that there are design issues involved with broadcast protocols
that involve more than one Ethernet.
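The looping problem can be seen in a toy flood over three gateways wired in a triangle. Two common guards are a hop limit and per-gateway duplicate suppression; this is a generic illustration, not the mechanism of the RFC mentioned above, and keying the memory on the gateway alone (rather than on something like source and packet id) is a simplification that works only because a single broadcast is simulated.

```python
from collections import deque

def flood(links, origin, ttl, suppress_duplicates=True):
    """Flood one broadcast through gateways connected as in `links`.

    Returns (copies received by each gateway, total number of forwards)."""
    received = {g: 0 for g in links}
    forwards = 0
    already_forwarded = set()  # gateways that have already passed this broadcast on
    queue = deque([(origin, None, ttl)])
    while queue:
        gw, came_from, hops_left = queue.popleft()
        received[gw] += 1
        if suppress_duplicates:
            if gw in already_forwarded:
                continue  # seen it before: drop the copy
            already_forwarded.add(gw)
        if hops_left == 0:
            continue  # hop limit reached
        forwards += 1
        for nxt in links[gw]:
            if nxt != came_from:  # never send it straight back where it came from
                queue.append((nxt, gw, hops_left - 1))
    return received, forwards
```

With suppression the triangle carries three forwards in total; without it, copies circle in both directions until the hop limit runs out, and only the hop limit stops them.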

My suspicion is that this is not worth doing.  I suggest instead using
a branching tree distribution.  I.e. your master sends to 10 machines
and each of them to 10 more, or something like that.  Note that the
Ethernet should be able to support a number of simultaneous transfers,
as long as they are not broadcasts.  The limit on network bandwidth
for most machines (including Suns) is the machine's own Ethernet
hardware and software.  The fastest real transfers I have seen are
1MBit/sec, and even that requires special care.  200Kbit/sec is more
normal.  Thus the Ethernet should be able to support a reasonable
number of simultaneous copies, as implied by the branching tree
model.  Collisions would not be the problem here that they would be
in the broadcast scenario, since the various copies would quickly
lose any synchronization that they might have.
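The branching tree can be sketched as an array-backed k-ary tree: host i copies to hosts k*i+1 through k*i+k. The finish-time estimate assumes each sender does its copies one after another, each copy takes the same time, and the Ethernet can carry that many simultaneous transfers; those are assumptions, not measurements.

```python
def fanout_tree(hosts, k):
    """Return {sender: [receivers]} for a k-ary tree rooted at hosts[0] (the master)."""
    tree = {}
    for i, h in enumerate(hosts):
        kids = hosts[k * i + 1 : k * i + k + 1]
        if kids:
            tree[h] = kids
    return tree

def finish_time(tree, root, per_copy):
    """Time until the last host has the file, if every sender copies to its
    children one at a time and each copy takes per_copy."""
    latest = 0.0
    stack = [(root, 0.0)]  # (host, time at which it has the file)
    while stack:
        host, t = stack.pop()
        latest = max(latest, t)
        for j, child in enumerate(tree.get(host, []), start=1):
            stack.append((child, t + j * per_copy))
    return latest
```

With the 1/2 hour per copy from the original posting, 100 workstations take 50 hours when the master copies to each in turn, but 9.5 hours through a 10-way tree under these assumptions.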

jqj@gvax.cs.cornell.edu (J Q Johnson) (08/25/86)

The author of the original article proposes use of broadcast/multicast (on
networks that support it) as a way of achieving multiple parallel file
transfers.  The problem with this scheme, obviously, is that there is no
simple way to achieve reliability (though see various papers on "reliable
broadcast" by Ozalp Babaoglu et al).  Most file transfer protocols use
end to end ack/nack to make sure the data got there, which assumes the
sender knows who is receiving the data.

A multicast-based ftp is not impossible, but it certainly doesn't match
the communications model of any of the popular existing protocol families
(tcp/ip, sna, decnet, osi, xns, etc.).  TCP/IP didn't even standardize
the value of the broadcast *address* until recently!

Note that most existing broadcast applications on Ethernets assume
unreliable broadcast, and are generally used for sending status information
(or requests for information).  In almost all cases, the amount of 
information to be transferred is limited to a single packet.

Conclusion:  it's a good topic for research, but don't expect anyone to
implement such a beast in the near future.  And don't ever expect to
see it layered on TCP/IP.
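The typical use described above, a single unreliable status packet, looks roughly like this on a modern system (in the spirit of 4.2BSD's rwhod, though its packet format differs). The message layout and port are hypothetical; the 1472-byte limit is the 1500-byte Ethernet payload minus minimal IPv4 (20) and UDP (8) headers.

```python
import socket
import struct

MAX_UDP_PAYLOAD = 1500 - 20 - 8  # Ethernet MTU minus minimal IPv4 and UDP headers

# Hypothetical layout: host name, load average x 100, uptime in seconds (24 bytes).
STATUS = struct.Struct("!16sII")

def encode_status(host: str, load_x100: int, uptime: int) -> bytes:
    return STATUS.pack(host.encode("ascii")[:16], load_x100, uptime)

def decode_status(data: bytes):
    name, load_x100, uptime = STATUS.unpack(data)
    return name.rstrip(b"\0").decode("ascii"), load_x100, uptime

def broadcast_status(msg: bytes, port: int, addr: str = "255.255.255.255"):
    """Fire and forget: no ack, no retransmission; a lost packet is simply lost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(msg, (addr, port))
```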

guy@sun.uucp (Guy Harris) (08/25/86)

>     I also, using the multi-cast bit, should be able to set up a table
>     of nodes that I wish to distribute the program to.

Except that the Sun driver for the "ie" interface doesn't understand
multicasts.  You'd have to change that driver, provide "ioctl" calls to set
the multicast address group, and provide a way, in whatever protocol you
used, to specify that a packet is to go to a multicast group.

> 2.) Some user-level programs already do something similar to what
>     I want.   For instance, "wall" will broadcast a message to nodes
>     on your network.

No, it won't.  The "rwall" command will send messages to other machines;
however, it does not "broadcast" them in the sense in which that word is
used when discussing networks, i.e. it does not use Ethernet broadcast
facilities.  If it is asked to send messages to a set of machines, it does
so by running through an enumeration of those machines and sending to them
one at a time.

> I know NFS allows mounting of an application to nodes and simultaneous
> access of that application - but that is not what I want.  I want to
> distribute to stand-alone machines as well as file servers new copies
> of an application once a week and each rcp or "dread the thought"
> cartridge taping can consume 1/2 hour per node.

You may, in this case, want to have the stand-alone machines get the
application via NFS.

> I would like to be able to say something like:
> 
> distribute -g <tablefile> <application> 
> 
> where -g is the option for group and tablefile is the database that
> contains a list of nodes with names or internet addresses

Even if IP supported multicast groups, this would not be straightforward.
You can't assign a host to a multicast group; that host has to add
*itself* to the multicast group.  As such, you'd have to start by telling
the hosts in that list to join a particular multicast group (you'd also have
to either 1) reserve a multicast group for this or 2) find some way of
finding an unused group and choosing it).
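IP multicast was standardized later (RFC 1112, 1989), and the sockets API gained an IP_ADD_MEMBERSHIP option; none of this existed on the 4.2BSD-era Suns discussed here. On a modern system the point that each host must add *itself* to a group looks roughly like the sketch below; the group address and port are placeholders.

```python
import ipaddress
import socket
import struct

def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build a struct ip_mreq: the group to join plus the local interface address."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not a class D (224.0.0.0/4) address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def join_group(port: int, group: str) -> socket.socket:
    """Receiver side: bind a UDP socket, then ask the kernel to join the group.
    The sender cannot do this on the receiver's behalf."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return s
```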

I think there may be some RFCs discussing the use of multicast addresses in
IP, but I doubt that there are any standard implementations of this for
UNIX.  At best, they're probably experimental.  At worst, they don't exist.
There are a lot of complicated issues involved in putting multicast support
into IP.

Of course, as stated before, you'd have to whack on the networking code
quite a bit to teach it about multicast addresses, anyway.

> Can it be done?

Maybe, if you're willing to learn a lot about IP, Ethernet, and the 4.2BSD
networking code, and make *lots* of changes to it.  I don't guarantee that
it'd be possible even then.

> When "wall" does a broadcast - is it simultaneous to all nodes or
> consecutive?

As I mentioned above, "wall" doesn't do broadcasts at all; since "rwall"
doesn't do them as Ethernet broadcasts, they are consecutive.  (No, "rwall"
doesn't fork off N processes, one per machine.)

> What kind of error checking do I have to do for testing that
> the program was successfully transmitted without losing packets
> or corrupting packets - at source or destination?

Lots.  TCP doesn't understand broadcasts, much less multicasts, and can't
really be made to.  As such, you'd have to provide your own flow control and
error recovery.
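One piece of that error recovery is detecting lost and corrupted data end to end. A minimal sketch, assuming a made-up chunk header of sequence number, CRC-32 of the payload, and payload length; any chunk that is truncated or fails the check is discarded and treated as lost.

```python
import struct
import zlib

HEADER = struct.Struct("!IIH")  # hypothetical: seq, crc32(payload), payload length

def make_chunk(seq: int, payload: bytes) -> bytes:
    return HEADER.pack(seq, zlib.crc32(payload), len(payload)) + payload

def parse_chunk(datagram: bytes):
    """Return (seq, payload), or None if the datagram is truncated or corrupted."""
    if len(datagram) < HEADER.size:
        return None
    seq, crc, length = HEADER.unpack_from(datagram)
    payload = datagram[HEADER.size:]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return seq, payload
```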

As Charles Hedrick pointed out, this won't work well (if at all) if you want
to update hosts that aren't on the same Ethernet, either.

I think the best advice is "try something else".
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

sjl@amdahl.UUCP (Steve Langdon) (08/30/86)

In article <6527@sun.uucp> guy@sun.uucp (Guy Harris) provides his normal
clear explanation of many of the issues involved in trying to use multicast
to distribute files.  His explanation focused on the issues involved when
you try to solve the problem above the Data Link Layer (or LLC in 802
terminology).  A general architectural approach to multicast would be nice,
but you might find something useful in the current IEEE 802.1 work on load
protocols.  It is limited to use on a single LAN and was (last time I checked)
planned to support multicast.  I cannot provide any further details on how
they have designed the protocol because it has never been high on my priority
list.  

If my suggestion leads you to a workable protocol, Guy is still right about
the type of skills you will need to use it.  Expect to learn more than you
want to know about the messy details of the system, hardware, and LAN.
-- 
Steve Langdon  ...!{decwrl,sun,hplabs,ihnp4,cbosgd}!amdahl!sjl  +1 408 746 6970

[I speak for myself not others.]

garyf@mc0.UUCP (gary friedman) (09/05/86)

In article <6@cvbnet.uucp> acrotty@cvbnet.uucp (Art Crotty) writes:
>
>I would like to have the ability to transfer large application
>programs to all nodes on the network simultaneously.

The short answer to your question is you *can* broadcast your updated
programs to your other nodes, but you shouldn't.  The reason for this
will take some explaining. 
	The protocols you said you have, TCP/IP/UDP, are actually two
separate protocols, both running on top of IP, that can co-exist
harmoniously: TCP, which will guarantee packet delivery to one node
only, and UDP, which guarantees nothing.  Because UDP transmits packets
without waiting for any kind of acknowledgement, it can send to a
special broadcast address and have 'billions and billions' of machines
(which are also set to receive on this same broadcast address) receive
them without the overwhelming overhead that would otherwise be required
in such a case.  Many erroneously equate "UDP" with "broadcast", when
in fact broadcast is merely a special case.
	As you can probably guess, if you choose to broadcast your
updates to all your Sun workstations, you run the risk of randomly
dropping packets or losing bits of information in other ways.  This
risk is even greater if the other Suns are transmitting information to
each other (using TCP/IP, no doubt) in the background at the same
time.  An example: in my studies of UDP reliability, it was common for
a Sun3 to send 100 UDP packets and have a Sun2 receive only 65 of
them.  (This result is amplified by the fact that the Sun3 sends them
faster than the Sun2 can physically receive them.  Sun2 to Sun2
generally yields better than 98% of the message when lots of other
Ethernet activity is taking place.)
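Since part of that loss comes from a fast sender overrunning a slow receiver, one mitigation (pacing, not something proposed in this thread) is to cap the sender's rate. A sketch; the rate is an assumption you would have to measure, and 200 Kbit/sec is merely the "normal" figure quoted earlier in the thread. Pacing reduces overrun losses but does not make UDP reliable.

```python
import time

UDP_IP_OVERHEAD = 8 + 20  # UDP header plus minimal IPv4 header, in bytes

def send_interval(payload_bytes: int, rate_bps: float) -> float:
    """Seconds between datagrams to stay at or below rate_bps (Ethernet framing ignored)."""
    return (payload_bytes + UDP_IP_OVERHEAD) * 8 / rate_bps

def send_paced(sock, addr, datagrams, rate_bps):
    """Send datagrams no faster than rate_bps so a slow receiver is not overrun."""
    next_time = time.monotonic()
    for d in datagrams:
        now = time.monotonic()
        if now < next_time:
            time.sleep(next_time - now)  # wait for our next slot
        sock.sendto(d, addr)
        next_time = max(now, next_time) + send_interval(len(d), rate_bps)
```

At 200 Kbit/sec a full 1472-byte payload gets one slot every 0.06 seconds.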
	My recommendation is to use NFS, as it was designed for
precisely your situation.  (The original posting didn't state why the
option was ruled out.)  If that option isn't acceptable, the next best
option is to write a shell script that sequentially rcp's the file to every
node individually.  (RCP uses the TCP/IP protocol; it's no dummy!)
	Sorry about that, and good luck.
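The sequential rcp approach, written in Python rather than shell to keep one language across these examples. It assumes rcp is on the PATH, that rsh-style trust to each host is already set up, and that rcp exits nonzero on failure; the file and directory names are placeholders.

```python
import subprocess

def rcp_commands(path, hosts, dest_dir):
    """One 'rcp path host:dest_dir' command per host, in order."""
    return [["rcp", path, f"{host}:{dest_dir}"] for host in hosts]

def copy_sequentially(path, hosts, dest_dir, run=subprocess.run):
    """Copy to each host in turn; return the hosts whose copy failed."""
    failed = []
    for cmd in rcp_commands(path, hosts, dest_dir):
        if run(cmd).returncode != 0:
            failed.append(cmd[2].split(":", 1)[0])  # keep going, report at the end
    return failed
```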


-- 

Gary Friedman
Jet Propulsion Laboratory
UUCP: {sdcrdcf,ihnp4,bellcore}!psivax!mc0!garyf
ARPA: ...mc0!garyf@cit-vax.ARPA

guy@sun.UUCP (09/08/86)

> (Explanation that TCP guarantees packet delivery, but does not support
> broadcast, while UDP supports broadcast but doesn't guarantee packet
> delivery.)

> 	As you can probably guess, if you choose to broadcast your
> updates to all your Sun workstations, you run the risk of randomly 
> dropping packets or losing bits of information in other ways.

Well, you *could* have the receiving hosts send back acknowledgments when
they received the broadcast packets.  This would be an excellent way to melt
down an Ethernet, though, given the number of hosts that would receive the
broadcast packet.  The sending host would also have to know *all* the hosts
the broadcast would go to, in order to know whether it got all the
acknowledgments it should.  It would also have to know what to do if it
didn't get acknowledgments from all the hosts: should it retransmit only to
the hosts that didn't get it (if 75% of them didn't get it, this could flood
the Ethernet) or to all the hosts?  In addition, the code on the receiving
end would have to be able to deal with packets received out of order, or
duplicate packets (especially if the response to negative or missing
acknowledgments is a broadcast retransmission).
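The receiving-end requirement above can be sketched as a small reassembler that accepts chunks in any order, counts and ignores duplicates, and reports which sequence numbers are still missing (the list a receiver might send back as a negative acknowledgment). The class and its interface are hypothetical.

```python
class Reassembler:
    """Collect numbered chunks of one file, tolerating reordering and duplicates."""

    def __init__(self, total_chunks: int):
        self.total = total_chunks
        self.chunks = {}      # seq -> payload
        self.duplicates = 0

    def accept(self, seq: int, payload: bytes) -> bool:
        if not 0 <= seq < self.total:
            return False      # not part of this transfer
        if seq in self.chunks:
            self.duplicates += 1
            return False      # e.g. a broadcast retransmission we already had
        self.chunks[seq] = payload
        return True

    def missing(self):
        return [s for s in range(self.total) if s not in self.chunks]

    def complete(self) -> bool:
        return len(self.chunks) == self.total

    def assemble(self) -> bytes:
        if not self.complete():
            raise ValueError(f"still missing chunks {self.missing()}")
        return b"".join(self.chunks[s] for s in range(self.total))
```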

> 	My recommendation is to use NFS, as it was designed for
> precisely your situation.  (The original posting didn't state why the
> option was ruled out.)  If that option isn't acceptable, the next best
> option is to write a shell script that sequentially rcp's the file to every
> node individually.  (RCP uses the TCP/IP protocol; it's no dummy!)

And NFS uses UDP/IP.  NFS operations return a success/failure indication
and, if a failure, an error code; this return message acts as the
acknowledgment.
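The reply-as-acknowledgment idea can be sketched as stop-and-wait with retransmission on timeout. This is a simplification: a real ONC RPC client (as used by NFS) also tags each request with a transaction id so that stale replies can be ignored, which is omitted here.

```python
import socket

def call_with_retry(send, recv, request, retries=5):
    """Send request and wait for the reply, which doubles as the ack.
    Resend on timeout. Returns (reply, number of attempts)."""
    for attempt in range(1, retries + 1):
        send(request)
        try:
            return recv(), attempt
        except socket.timeout:
            continue  # request or reply lost: try again
    raise TimeoutError(f"no reply after {retries} attempts")

def udp_transport(sock, server_addr, timeout):
    """Bind send/recv to a UDP socket; recv raises socket.timeout after `timeout` seconds."""
    sock.settimeout(timeout)
    def send(data):
        sock.sendto(data, server_addr)
    def recv():
        data, _ = sock.recvfrom(65535)
        return data
    return send, recv
```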
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)