adc@cpsvax.cps.msu.edu (Alan Cabrera) (06/10/89)
I typed in and compiled the timeservice example from Chapter 1 of the ISIS System Manual. I started up 11 of the service tasks. The problem is that my querying program loops forever because isis_nreplies = 5 while isis_nsent = 11. See page 11 for relevant code. Am I doing something wrong?

Alan Cabrera
adc@cpsvax.cps.msu.edu
ken@gvax.cs.cornell.edu (Ken Birman) (06/10/89)
In article 21 of comp.sys.isis, adc@cpsvax.cps.msu.edu (Alan Cabrera) writes...
> I typed in and compiled the timeservice example from Chapter 1 of the
> ISIS System Manual. I started up 11 of the service tasks. The problem
> is that my querying program loops forever because isis_nreplies = 5
> while isis_nsent = 11. See page 11 for relevant code. Am I doing
> something wrong?

The problem is in the example, which has a bug when the number of servers exceeds the number of departments (NDEPTS, 5 in this case).

The way that the example is set up, if there are more than 5 servers the "extras" do nullreply() calls. This is because the extra servers have no real data to contribute, but the sender asked for ALL replies and would otherwise block waiting for replies that never arrive.

Unfortunately, for "fault-tolerance" the sender loops if it gets fewer replies than expected, which it expresses as a test:

    while(isis_nsent > isis_nreplies) ...

The bug is that this test doesn't catch the case where some replies are certain to be null (null replies don't count in isis_nreplies). So, the test really should have been

    while(min(isis_nsent, NDEPTS) > isis_nreplies) ...

I'll fix the documentation -- thanks for pointing this out.

By the way, I find it a bit inelegant to code the loop this way, because I think of the above "algorithm" as being something the sender should not need to know about -- the business about getting NDEPTS answers but having more than NDEPTS servers seems to me to be internal to the server, and this loop makes it explicit.

For this reason, I would recommend that people instead send back the value of NDEPTS or whatever as part of the reply -- basically, send reply "1 of 3, 2 of 3, ..." and check to see that you got all 3 of 3 parts. The advantage is that the interface then fully documents the behavior of the server.
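The two loop tests and the "k of n" check described above can be sketched in plain C, with no ISIS calls involved. This is only an illustration under stated assumptions: the function names, the part_reply struct and MAX_PARTS are hypothetical and do not come from the manual, and nsent and nreplies simply stand in for the values ISIS reports in isis_nsent and isis_nreplies.

```c
#include <assert.h>
#include <stddef.h>

#define NDEPTS    5   /* caller's compiled-in department count, as in the post */
#define MAX_PARTS 64  /* hypothetical upper bound on "n" in "k of n" */

/* Hypothetical reply layout: each non-null reply says "part k of n". */
struct part_reply {
    int part;   /* 1-based index k of this part */
    int total;  /* n, as seen by the server that replied */
};

/* Test from the manual example: retry while fewer non-null replies
 * arrived than messages were sent. Returns 1 if the caller would loop. */
int retry_original(int nsent, int nreplies)
{
    return nsent > nreplies;
}

/* Corrected test from the post: at most NDEPTS replies can be non-null,
 * so expect min(nsent, NDEPTS) of them. */
int retry_fixed(int nsent, int nreplies)
{
    int expected = nsent < NDEPTS ? nsent : NDEPTS;
    return expected > nreplies;
}

/* "k of n" check: returns 1 only if every part 1..n is present and all
 * replies agree on n; returns 0 otherwise (caller should retry). */
int all_parts_received(const struct part_reply *replies, size_t count)
{
    unsigned char seen[MAX_PARTS] = {0};
    int total, k;
    size_t i;

    assert(replies != NULL || count == 0);
    if (count == 0)
        return 0;

    total = replies[0].total;
    if (total < 1 || total > MAX_PARTS)
        return 0;

    for (i = 0; i < count; i++) {
        if (replies[i].total != total)
            return 0;                      /* servers disagree on n */
        if (replies[i].part < 1 || replies[i].part > total)
            return 0;                      /* malformed part index */
        seen[replies[i].part - 1] = 1;
    }
    for (k = 0; k < total; k++)
        if (!seen[k])
            return 0;                      /* part k+1 is missing */
    return 1;
}
```

With 11 servers and 5 non-null replies, retry_original(11, 5) is true (the reported hang) while retry_fixed(11, 5) is false. The all_parts_received check never consults NDEPTS; it relies only on what the replies themselves claim.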
The danger in the min(isis_nsent, NDEPTS) test is that even if some header claims that NDEPTS is 5, the currently active copy of the server could have been recompiled with NDEPTS 4, and my "fix" would then loop just as you observed. The "1 of 4" approach would be correct, on the other hand.

This raises the whole question of how to move ISIS into a more object-oriented environment, where interfaces might contain more details about the correct way to access an object and perhaps even the right algorithm for invoking it. I would be very interested in ideas on this -- we get asked about it a lot, and it certainly seems like an interesting issue.

One approach that I sort of like is to do what Shapiro's SOR system does -- it actually exports code fragments which are used to access the object (in this case, the loop for calling the service would be code that belongs to the server object, not the caller). But this is hard to do in heterogeneous settings, and it requires a form of dynamic binding -- plus it raises many protection issues.

Does anyone see a clever way to address this?

Ken