[comp.os.research] Maintaining monotonicity in a distributed system

martin@felix.uucp (Martin McKendry) (04/19/88)
If I had an acceptable approach to the following problem, I could
build a lot simpler algorithms at the next layer up.  Maybe someone
has thought about this before.

I have a distributed system, typically 20-40 machines, but it can 
be as high as 200 and as low as 1 (or zero).  Some of these machines
have disk, but most do not.  I need to keep a single integer
for the entire system that increases monotonically.  Call
this an 'epic counter'.  The purpose of the epic counter is to 
control various kinds of versions -- crash counts & versions
of a recovery/watchdog mechanism, in particular. 

The catch is that I don't want to use any permanent storage to do
this.  And I don't want to depend on any particular machine, or
be 'n-way' fault tolerant, for any value of 'n'.   It must always
be possible to get a new value of the epic counter issued (e.g.,
to 'timestamp' this re-boot of a workstation).  Even if there
is only one station up, this must be possible.  I'm open to the
semantics in the case of partitions.  If we ever get all stations
down, it is OK for the first one up to restart the counter (but not
if he is merely partitioned temporarily).  

I'm looking at various forms of election, etc., but I have the
added constraint that there cannot be any rigid notion of a quorum --
we have to work out 'who is up' by timeouts.  We do know the maximal
set (at least right now), but I'd prefer not to use that.

I think it unlikely that we are the first to encounter this problem.

Any suggestions or pointers?  Maybe Knuth Book 15, Chapter 9, Page 13598
has a solution?

--
Martin S. McKendry;    FileNet Corp;	{hplabs,trwrb}!felix!martin
Strictly my opinion; all of it