[comp.sys.isis] Question on ISIS scalability

ken@gvax.cs.cornell.edu (Ken Birman) (09/13/90)

> From mail to me:
>  We are planning on implementing a large-scale distributed environment,
>using the Linda semantics (our own run-time implementation).  I would like
>to implement this on top of isis since many of the features of isis would
>be very very useful.  What I need to know is how well isis can scale.  We
>would like to have 100+ machines working together.  In terms of an isis
>implementation, this would entail 5-20 machines running isis with each
>machine having 5-20 machines talking to it via isis_remote.
>  In terms of communications, there will be a large number of "small" <1k
>broadcasts to all machines, and a large number of transfers of data of
>unknown size directly between two machines.  What I need to know is how well isis
>can handle this number of machines in this type of configuration.

>                                        /Bob...
>{...}!rutgers!mende         mende@cs.rutgers.edu          mende@zodiac.bitnet

The system should be able to handle this now, and planned enhancements
will make this a very reasonable configuration in the V3.0 release.

What you have in mind would work well under ISIS V3.0 using "pg_client"
in bypass mode (which will be the default in V3.0).  You would want
to work towards an architecture with a small number of active
"servers" in each process group but as many "clients" as you like.
You would presumably use a structure in which some small number of
processes implement each Linda tuple-space (2-3, say) and clients
communicate with these.  I can think of a number of engineering choices
balancing degree of fault-tolerance against cost.  
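As a rough illustration of the "2-3 servers per tuple-space" structure, here is a minimal Python sketch (not ISIS code) that deterministically assigns each tuple-space to a small set of server processes, so that any client can work out whom to talk to without asking anyone. The helper names, the server names and the replication factor `k` are all hypothetical; in a real ISIS program each tuple-space would presumably be its own process group, with membership supplied by ISIS rather than by a static list. The `tradeoff` helper shows the fault-tolerance versus cost balance in its crudest form, assuming a space survives while one replica is up and every update must reach all replicas.

```python
import hashlib


def assign_servers(space_name, server_pool, k=3):
    """Pick k distinct servers to hold one tuple-space (hypothetical helper).

    A stable hash of the space name chooses a starting slot, so every
    client computes the same server list without any coordination.
    """
    pool = sorted(set(server_pool))
    if not 1 <= k <= len(pool):
        raise ValueError("k must be between 1 and the number of servers")
    digest = hashlib.sha256(space_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(pool)
    # Take k consecutive servers, wrapping around the end of the pool.
    return [pool[(start + i) % len(pool)] for i in range(k)]


def tradeoff(k):
    """Crude cost/benefit of k replicas for one tuple-space.

    Assumes the space survives while at least one replica is up and
    that every update must be delivered to all k replicas.
    """
    return {"crashes_survived": k - 1, "copies_per_update": k}


if __name__ == "__main__":
    servers = ["s%02d" % i for i in range(12)]
    print("jobs ->", assign_servers("jobs", servers, k=3))
    for k in (1, 2, 3):
        print(k, "replica(s):", tradeoff(k))
```

Hashing keeps the assignment stable as long as the server pool is fixed; a pool that changes often would call for something like consistent hashing instead.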

V3.0 will have several modes in which such a group can run: diffusion
(all clients get the multicast) or client-server (the multicast goes to
the servers plus one client).  Plus, of course, the current options:
multicast just among the servers, RPC to a favorite server, etc.
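To make the difference between these modes concrete, the sketch below computes who receives a single message under each of them. It is plain Python, not the ISIS API: the `Mode` names, the `recipients` function, the assumption that the "one client" in client-server mode is the sender, and the default choice of favorite server (the first in the list) are all illustrative assumptions.

```python
from enum import Enum


class Mode(Enum):
    DIFFUSION = "diffusion"          # servers and every client get it
    CLIENT_SERVER = "client-server"  # servers plus one client
    SERVERS_ONLY = "servers-only"    # multicast just among the servers
    RPC_FAVORITE = "rpc-favorite"    # a single, preferred server


def recipients(mode, servers, clients, sender, favorite=None):
    """Return the set of processes that receive one message (illustrative).

    For CLIENT_SERVER the extra client is assumed to be the sender;
    for RPC_FAVORITE the first server is used unless one is given.
    """
    if mode is Mode.DIFFUSION:
        return set(servers) | set(clients)
    if mode is Mode.CLIENT_SERVER:
        return set(servers) | {sender}
    if mode is Mode.SERVERS_ONLY:
        return set(servers)
    if mode is Mode.RPC_FAVORITE:
        return {favorite if favorite is not None else servers[0]}
    raise ValueError("unknown mode: %r" % (mode,))


if __name__ == "__main__":
    servers = ["srv1", "srv2", "srv3"]
    clients = ["c%03d" % i for i in range(100)]
    for mode in Mode:
        n = len(recipients(mode, servers, clients, sender="c007"))
        print(mode.value, n)
```

With 3 servers and 100 clients this prints 103, 4, 3 and 1 recipients respectively, which is why diffusion should be reserved for messages every client really needs.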

Most of these modes can be simulated in V2.1 with multiple overlapping
groups, but it isn't necessarily easy.  Let me know if you need help.

The other issue here concerns the cost of isis_remote.  In V3.0
we will probably support a version of isis_remote that runs at roughly
the same speed as isis_init.  Right now, there is a definite but
small performance penalty, which is noticeable in non-bypass
applications but not often seen if bypass is enabled.  It's hard to say
whether this will pose a problem under V2.1, because you haven't
indicated just what your performance requirements are.

As for message sizes, in bypass mode ISIS does pretty well for all
sizes and usually outperforms TCP.  However, you will need to be
sparing in your use of multicast.  Specifically, even though ISIS
is good at multicast, try to avoid sending to large numbers of 
processes unless you gain something that justifies the cost.
If some process gets a copy of a message, you will want to be sure
that it actually needed the message.
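One way to follow this advice in a Linda setting is to track which processes actually care about a given tuple-space and address messages only to them, rather than broadcasting to all 100+ machines. The sketch below is a hypothetical interest table in plain Python; it is not an ISIS facility, and the process and tuple-space names are made up.

```python
from collections import defaultdict


class InterestTable:
    """Tracks which processes need messages for each tuple-space."""

    def __init__(self):
        self._interested = defaultdict(set)

    def subscribe(self, space, process):
        self._interested[space].add(process)

    def unsubscribe(self, space, process):
        self._interested[space].discard(process)

    def targets(self, space):
        # Return a copy so callers cannot mutate the table by accident.
        return set(self._interested.get(space, ()))


if __name__ == "__main__":
    all_machines = {"m%03d" % i for i in range(100)}
    table = InterestTable()
    for m in ("m001", "m002", "m003", "m004", "m005"):
        table.subscribe("results", m)
    print("broadcast copies:", len(all_machines))
    print("targeted copies:", len(table.targets("results")))
```

Here a message for the "results" space costs 5 deliveries instead of 100; the price is keeping the table itself up to date as processes come and go.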

A final comment relates to V2.1.  As noted in a previous posting,
V2.1 is configured with a small limit on the size of process groups.
You may need to increase this if you want to get started under V2.1,
but be aware that the bypass protocol is missing an optimization
(to be added in V3.0) and because of this, bypass group membership
changes get kind of sluggish with groups containing, say, 30 members or
more.  So, under V2.1, you can recompile with a larger limit, but you
should not combine that with bypass communication.  In short, you could
get this configuration working under V2.1, but you would see poorer
performance.

... Ken