ken@gvax.cs.cornell.edu (Ken Birman) (09/13/90)
From mail to me:

> We are planning on implementing a large scale distributed environment,
> using the linda semantics (our own run-time implementation). I would like
> to implement this on top of isis since many of the features of isis would
> be very very useful. What I need to know is how well isis can scale. We
> would like to have 100+ machines working together. In terms of an isis
> implementation, this would entail 5-20 machines running isis with each
> machine having 5-20 machines talking to it via isis_remote.
> In terms of communications, there will be a large number of "small" <1k
> broadcasts to all machines, and a large number of transfers of unknown sized
> data directly between two machines. What I need to know is how well isis
> can handle this number of machines in this type of configuration.
> /Bob...
> {...}!rutgers!mende mende@cs.rutgers.edu mende@zodiac.bitnet

The system should be able to handle this now, and planned enhancements will make this a very reasonable configuration in the V3.0 release.

What you have in mind would work well under ISIS V3.0 using "pg_client" in bypass mode (which will be the default in V3.0). You would want to work towards an architecture with a small number of active "servers" in each process group but as many "clients" as you like. You would presumably use a structure in which some small number of processes (2-3, say) implement each Linda tuple-space, and clients communicate with these. I can think of a number of engineering choices balancing degree of fault-tolerance against cost.

V3.0 will have several modes in which such a group can run: diffusion (all clients get the multicast) or client-server (multicast goes to the servers plus one client). Plus, of course, the current options: multicast just within the servers, RPC to a favorite server, etc. Most of these modes can be simulated in V2.1 with multiple overlapping groups, but it isn't necessarily easy. Let me know if you need help.

The other issue here concerns the cost of isis_remote.
In V3.0 we will probably support a version of isis_remote that runs at roughly the same speed as isis_init. Right now, there is a definite but small performance penalty, which is noticeable in non-bypass applications but not often seen if bypass is enabled. It is hard to say whether this will pose a problem under V2.1, because you haven't indicated just what performance requirements you have.

As for message sizes, in bypass mode ISIS does pretty well for all sizes and usually outperforms TCP. However, you will need to be sparing in your use of multicast. Specifically, even though ISIS is good at multicast, try to avoid sending to large numbers of processes unless you gain something that justifies the cost. If some process gets a copy of a message, you will want to be sure that it actually needed the message.

A final comment relates to V2.1. As noted in a previous posting, V2.1 is configured with a small limit on the size of process groups. You may need to increase this if you want to get started under V2.1, but be aware that the bypass protocol is missing an optimization (to be added in V3.0), and because of this, bypass group membership changes get kind of sluggish with groups containing, say, 30 members or more. So, under V2.1, you can recompile with a larger limit, but not if you also plan to use bypass communication. In V2.1 you could get this to work, but would see poorer performance ...

Ken
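
To put rough numbers on the configuration and the multicast modes discussed in this reply, here is a small back-of-envelope sketch. It is not ISIS code and calls no ISIS routines; the function names and mode labels are hypothetical, made up for illustration. It assumes that a diffusion multicast reaches every server and every client, that a client-server multicast reaches the servers plus the one client involved, and it counts destinations only, ignoring acknowledgements and protocol overhead.

```python
# Back-of-envelope fan-out estimate; NOT ISIS code.
# Every name below is hypothetical and exists only for this sketch.


def total_machines(isis_hosts, remotes_per_host):
    """Machines in a layout where each host running ISIS also serves
    some number of machines attached via isis_remote."""
    return isis_hosts + isis_hosts * remotes_per_host


def destinations(mode, servers, clients):
    """Number of processes that receive one multicast sent to a group
    with the given numbers of servers and clients (assumed semantics)."""
    if mode == "diffusion":
        # Assumption: every server and every client gets a copy.
        return servers + clients
    if mode == "client-server":
        # Assumption: the servers plus the one client involved.
        return servers + 1
    if mode == "server-only":
        # Multicast stays within the server set.
        return servers
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # The two ends of the "5-20 hosts, 5-20 remotes each" range.
    for n in (5, 20):
        print(f"{n} hosts x {n} remotes each: {total_machines(n, n)} machines")

    # One tuple-space group with 3 servers and 100 clients.
    for mode in ("diffusion", "client-server", "server-only"):
        print(f"{mode}: {destinations(mode, servers=3, clients=100)} destinations")
```

Under these assumptions the gap between diffusion and the other two modes grows linearly with the number of clients, which is the reason for being sparing with multicasts that reach every process.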