[comp.sys.isis] timeservice example in Chapter 1

adc@cpsvax.cps.msu.edu (Alan Cabrera) (06/10/89)

I typed in and compiled the timeservice example from Chapter 1 of the
ISIS System Manual.  I started up 11 of the service tasks.  The problem
is that my querying program loops forever because isis_nreplies = 5
while isis_nsent = 11.  See page 11 for relevant code.  Am I doing
something wrong?


Alan Cabrera
adc@cpsvax.cps.msu.edu

ken@gvax.cs.cornell.edu (Ken Birman) (06/10/89)

In article 21 of comp.sys.isis, adc@cpsvax.cps.msu.edu (Alan Cabrera) writes...

> I typed in and compiled the timeservice example from Chapter 1 of the
> ISIS System Manual.  I started up 11 of the service tasks.  The problem
> is that my querying program loops forever because isis_nreplies = 5
> while isis_nsent = 11.  See page 11 for relevant code.  Am I doing
> something wrong?

The problem is in the example, which has a bug when the number of
servers exceeds the number of departments (NDEPTS, 5 in this case).

The way that the example is set up, if there are more than 5 servers
the "extras" do nullreply() calls.  This is because the extra servers
have no real data to contribute, but the sender wanted ALL replies
and would otherwise be blocked waiting for a reply regardless.

Unfortunately, for "fault-tolerance" the sender loops if it gets fewer
replies than expected, which it expresses as a test:
	while(isis_nsent > isis_nreplies) ...
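The bug is that this test doesn't catch the case where some replies
are certain to be null (null replies don't count in isis_nreplies).
With 11 servers and NDEPTS = 5, isis_nsent stays at 11 and
isis_nreplies at 5, so the loop never exits.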
So, the test really should have been
	while(min(isis_nsent, NDEPTS) > isis_nreplies) ...
I'll fix the documentation -- thanks for pointing this out.
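
To make the difference between the two tests concrete, here is a minimal
sketch in plain C.  It does not call ISIS at all: count_real_replies() is a
hypothetical stand-in for one broadcast round, written under the assumption
stated above that only the first NDEPTS servers hold data and that null
replies are not counted.  The helper names are invented for illustration.

```c
#include <assert.h>

#define NDEPTS 5   /* number of departments, as in the thread */

/* Hypothetical model of one broadcast round: of nsent servers, at most
 * NDEPTS send real replies; the extras send null replies, which are
 * assumed not to be counted. */
int count_real_replies(int nsent)
{
    assert(nsent >= 0);
    return nsent < NDEPTS ? nsent : NDEPTS;
}

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* The test from the manual: nonzero means "loop and ask again". */
int original_test_retries(int nsent, int nreplies)
{
    return nsent > nreplies;
}

/* The corrected test: never expect more than NDEPTS real replies. */
int corrected_test_retries(int nsent, int nreplies)
{
    return min_int(nsent, NDEPTS) > nreplies;
}
```

With 11 servers, original_test_retries(11, count_real_replies(11)) is
nonzero on every round, while corrected_test_retries() returns 0.  The
corrected test still retries when a real reply is missing, for example
with 3 servers and only 2 replies.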

By the way, I find it a bit inelegant to code the loop this
way, because I think of the above "algorithm" as being
something the sender should not need to know about -- the
business about getting NDEPTS answers but having more than
NDEPTS servers seems to be internal to the server to me, and
this loop makes it explicit.

For this reason, I would recommend that people instead send back
the value of NDEPTS or whatever as part of the reply -- basically,
send reply "1 of 3, 2 of 3, ..." and check to see that you got
all 3 of 3 parts.  The advantage is that the interface then
fully documents the behavior of the server.  The danger in the
above code is that even if some header claims that NDEPTS is 5,
the currently active copy of the server could have been recompiled
with NDEPTS 4, and my "fix" would then loop just as in your experience.
The "1 of 4" approach would be correct, on the other hand.
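
The "k of n" idea can be sketched as follows.  This is only an
illustration in plain C, not ISIS code: the struct part_reply layout, the
MAX_PARTS bound, and have_all_parts() are all hypothetical.  The point is
that the caller learns the part count from the replies themselves instead
of from a compiled-in NDEPTS.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARTS 32   /* arbitrary bound for this sketch */

/* Hypothetical reply layout: every non-null reply says "part of total". */
struct part_reply {
    int  part;    /* 1-based index of this part */
    int  total;   /* number of parts, as the running server sees it */
    long value;   /* payload, e.g. one department's data */
};

/* Returns true when replies[0..n) contain every part 1..total, where
 * total is whatever the servers themselves reported.  Returns false if
 * nothing arrived, a part is missing, or servers disagree on total. */
bool have_all_parts(const struct part_reply *replies, size_t n)
{
    bool seen[MAX_PARTS + 1] = { false };
    int total;
    int found = 0;

    if (n == 0)
        return false;               /* no real replies yet: ask again */

    total = replies[0].total;
    if (total < 1 || total > MAX_PARTS)
        return false;

    for (size_t i = 0; i < n; i++) {
        const struct part_reply *r = &replies[i];

        if (r->total != total)
            return false;           /* servers disagree on the count */
        if (r->part < 1 || r->part > total)
            return false;           /* malformed part number */
        if (!seen[r->part]) {       /* ignore duplicate parts */
            seen[r->part] = true;
            found++;
        }
    }
    return found == total;
}
```

The sender's loop then becomes "repeat the query until have_all_parts()
is true", and it keeps working if the server is rebuilt with a different
number of departments.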

This raises the whole question of how to move ISIS into a more
object-oriented environment, where interfaces might contain
more details about the correct way to access an object and perhaps
even the right algorithm for invoking it.  I would be very
interested in ideas on this -- we get asked about it a lot,
and it certainly seems like an interesting issue.  One approach
that I sort of like is to do what Shapiro's SOR system does --
it actually exports code fragments which are used to access the
object (in this case, the loop for calling the service would be code
that belongs to the server object -- not the caller).  But this
is hard to do in heterogeneous settings, and requires a form
of dynamic binding -- plus it raises many protection issues.

Does anyone see a clever way to address this?

Ken