[comp.sys.isis] How to find rank of server w.r.t. bcast messages

rfinch@caldwr.water.ca.gov (Ralph Finch) (08/04/90)

We wish to more or less use the parallel computing concept as outlined
in the Isis manual (the fft example), but instead of rank in the group
we wish to use rank according to the order the servers receive the
bcast message.  Is this possible in an easy fashion (Isis call)?

ken@gvax.cs.cornell.edu (Ken Birman) (08/04/90)

In article <219@newhope.water.ca.gov> rfinch@caldwr.water.ca.gov
	(Ralph Finch) writes:
>
>We wish to more or less use the parallel computing concept as outlined
>in the Isis manual (the fft example), but instead of rank in the group
>we wish to use rank according to the order the servers receive the
>bcast message.  Is this possible in an easy fashion (Isis call)?


If I understand the question, yes.  You will need to use abcast to send
to the servers, or use cbcast and have some "primary" server that assigns
rankings and periodically tells the others with a cbcast so that they know
what it picked.  As a rule of thumb, abcast is about half as fast as 
cbcast, so this second scheme is only worthwhile if you get such a high
volume of requests that a little delay before sending the ranking won't
impact performance.

Assuming that you just use abcast, the servers will see the requests in
the same order and will get them in the same "state" (i.e. group view, etc)
They can count requests and use this number as a ranking on the requests.
Everything else would be as in the example from the manual (well, maybe
it would be a good idea to count modulo the number of servers, if you
want the "ranking" to be in the range 0..nservers-1!)

If you use the cbcast scheme to reduce costs, the problem is that the
recipients won't see requests in the same order and hence can't use any
local rule to rank them.  This is why you would need a primary "ranker"
who would have to tell the others what ranking it picked for each request.

By the way, this is precisely how abcast works.  The abcast "ranking"
is just the order in which the messages should be delivered.

Life gets more complex if you want to do a uniform load-sharing scheme.
I assume this isn't quite what you have in mind, so I'll leave this as 
a puzzle for readers... (there seems to be a research topic here: although
there are many ways to solve this problem, no single solution jumps out as
obviously the one to use).

Ken

ken@gvax.cs.cornell.edu (Ken Birman) (08/06/90)

In article <219@newhope.water.ca.gov> rfinch@caldwr.water.ca.gov (Ralph Finch) writes:
>
>We wish to more or less use the parallel computing concept as outlined
>in the Isis manual (the fft example), but instead of rank in the group
>we wish to use rank according to the order the servers receive the
>bcast message.  Is this possible in an easy fashion (Isis call)?

Robert Cooper points out to me that Ralph probably had something else
in mind... the second interpretation is that he wants to program a
group of servers pretty much in the style that the Linda system supports:
bag of tasks, you add a task by multicasting your request, the first free
server removes the task and executes it.

This is fairly easy to support.  You do the following:  
	1) to add a task to the "bag" a client does an abcast.  All	
	   servers receive the task request in the same order and all
	   save it into the bag.
	2) when idle, a server abcasts a message "I'm idle" and awaits 1 reply
	3) on receipt of an idle message, all server handle it this way:
	   3.a) if there is a task in the bag, they remove it.  In the
                server that sent the "I'm idle" message a reply() is done,
                causing the computation to begin.  (Other servers don't
		need to send replies, obviously).
	   3.b) if the bag is empty, the message is added to a queue
	4) When a request is added the bag (1) and the idle-server queue is
	   non-empty (3.b), then algorithm (3.a) is executed for the first
	   "I'm idle" message on the queue.

The code is something like this:

snd_req:	abcast(group, WORK_REQ, ...., 1, "answ", ....)
get_req(mp)	qu_append(bag, mp)
		if(qu_head(idle_servers))
		{
		    idlemsg = de_queue(idle_servers);
		    server = msg)getsender(idlemsg);
		    if(addr_ismine(server)) reply(idlemsg, "%d", msg_getid(mp))
		}
snd_idle:	abcast(group, IM_IDLE, ...., 1, "%d", &msg_id_to_process);
get_idle(mp)	if(qu_head(bag))
		{
		    reqmp = de_queue(bag);
		    server = msg_getsender(mp);
		    if(addr_ismine(server)) reply(mp, "%d", msg_getid(reqmp))
		}
		else
		    qu_append(idle_servers, mp);

This assumes that idle_servers and bag are queues of messages (don't
forget to call msg_increfcount when queueing one and msg_delete when
done with it!) and that the operations to add a message are qu_append
and to dequeue (but not delete) one is de_queue.

Notes:
	1) This will be a little slower than a Linda implementation
	   because it uses multicasts, but unlike the Linda version the
	   data structures are replicated and hence the scheme is
	   much more fault-tolerant.  Need to think about whether you need
	   this.
	2) Could substitute cbcast for abcast with a token-based order
	   publication scheme.  Save this until you have reason to think
	   your bottleneck is the use of abcast instead of cbcast.
	3) Could use a coord-cohort computation to actually process messages.
	   This is slightly tricky to do right.

	   You would need to make a private copy of the routine "coordinator"
	   (see clib/cl_coord.c) and modify it to let you pick the first coord.
	   This will be the server picked in step (3.a)

	   To use your routine, on receipt of a request the servers would
		a) add it to the bag, as above
		b) call coord_cohort_l(...., your-coordinator-choser);
	   The choser would block until the "first coordinator" is known to
	   it, using a condition variable; when the coordinator-choice arrives
	   use t_sig to pass the news over to it.

	   Then it would do what coordinator() now does, but starting from
	   this idle server as the "prefered coordinator"

If I were just tossing something together I would go with abcast's but
seriously consider the coordinator choice issue.  The code to do this would
be pretty simple and the benefit is that you could use a coordinator-cohort
scheme biased to start with the server who is idle first for each request.

I can post a sample of code for this if people find it too confusing...

Ken