[comp.sys.isis] More musings about the YP service

ken@gvax.cs.cornell.edu (Ken Birman) (12/28/89)
The holiday seemed like a good time to think about how I would
actually go about implementing a hierarchical YP server with imports
and exports, but when I started to do this, I was struck by a parallel
that I want to outline for you.  Hopefully, the 4 or 5 people who have
been emailing privately will be drawn to post some public comments --
I am a bit nervous about posting private email unless people ask me to.

Anyhow, this is the line of reasoning I was pursuing.  We all know how
easy it is to do replicated updates using ISIS; in the V2.0 release
the performance is close to the hardware limits.  So, the real
issue seems to be what the data structure used by YP ought to look like.

Recall that YP basically provides queries on files (/etc/hosts, /etc/services,
etc.).  These will get big, so they are probably going to be stored in
real physical files.  Since simplicity is a big win, let's assume that such
files are carved into "segments" of length, say, 1k or 8k.  This led me to
think of the YP program as a file system cache smart enough to run
queries directly rather than ship you the data just so you can run some
simple select operation on your machine.  
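
To make the segment idea concrete, here is a minimal sketch in Python.
It assumes segments hold whole lines (so no record straddles a segment
boundary) and that a query is just a predicate applied to each line.
The names carve_segments and select are made up for illustration; they
are not part of YP or ISIS.

```python
SEGMENT_SIZE = 1024  # bytes; the text suggests 1k or 8k


def carve_segments(text, segment_size=SEGMENT_SIZE):
    """Split file text into segments of at most segment_size bytes.

    Assumption: lines are never split across segments. A single line
    longer than segment_size ends up alone in an oversized segment.
    """
    segments, current, used = [], [], 0
    for line in text.splitlines():
        size = len(line.encode()) + 1  # +1 for the newline
        if current and used + size > segment_size:
            segments.append(current)
            current, used = [], 0
        current.append(line)
        used += size
    if current:
        segments.append(current)
    return segments


def select(segments, predicate):
    """Run the query where the segments live; return only matching lines."""
    return [line for seg in segments for line in seg if predicate(line)]
```

The point of select is that only the matching lines cross the wire,
rather than every segment of the file.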

But, if YP is just a smart file system cache, maybe the real problem
is to build a replicated file system cache facility?  Not only that, but
the general ability to run queries within a file system segment server/cacher
could have a big payoff: for one thing, a database could use this to search
index structures...

Now, here's the twist: Keith Marzullo and I have a student, Alex Siegel,
who has been working on exactly this problem!  I hadn't seen the connection
to YP at first, but now it seems to me that Alex's file system (Deceit)
basically has the architecture of the YP facility we are after!  And,
NFS "mounts" (automatic, in the new versions of NFS) seem like a very
simple way to model the hierarchical aspect of the YP problem:
	/cornell/etc/hosts	/gnu/etc/hosts	 ....
The idea is that /cornell is an "automount" point; if you are a GNU
site and you reference /cornell/etc/hosts, the system automatically mounts
/cornell, and you end up with the copy of /etc/hosts that is "local"
when viewed from the Cornell perspective.  YP would now be a package
of file search software that accesses these files...
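
A rough sketch of that name resolution, in Python.  The "mount" here is
simulated with an in-memory table, and AutomountTable, exports and
resolve are hypothetical names used only for illustration; a real
automounter would issue an NFS mount instead.

```python
class AutomountTable:
    """Resolve /<domain>/<local path> by mounting <domain> on first use."""

    def __init__(self, exports):
        # Assumption: exports maps a domain name to the files that
        # domain exports, e.g. {"cornell": {"/etc/hosts": "..."}}.
        self.exports = exports
        self.mounted = {}

    def resolve(self, path):
        parts = path.strip("/").split("/", 1)
        if len(parts) != 2:
            raise ValueError("expected /<domain>/<path>, got %r" % path)
        domain, local_path = parts[0], "/" + parts[1]
        if domain not in self.mounted:
            if domain not in self.exports:
                raise KeyError("no such domain: %s" % domain)
            # Simulated automount: the first reference attaches the domain.
            self.mounted[domain] = self.exports[domain]
        # The file returned is the one that is "local" to that domain.
        return self.mounted[domain][local_path]
```

With this, a GNU site calling resolve("/cornell/etc/hosts") gets
Cornell's copy of /etc/hosts, and /cornell stays mounted for later
lookups.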

The rub is that /etc/hosts might be very big, and we want to avoid
an excessive communication overhead.  So, we really need a way to send
the "search pattern" to the YP service and just get back the part of the
file that matches.  This way the work of doing the search would be
pushed into the server, which could run it on a machine that actually
has the physical data local to it.
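
As a sketch of the difference, assume the "search pattern" is a regular
expression and the server holds each file as a list of lines.  YPServer,
fetch and query are hypothetical names, not an existing YP interface.

```python
import re


class YPServer:
    """Toy server that holds the files and can run a search locally."""

    def __init__(self, files):
        self.files = files  # path -> list of lines

    def fetch(self, path):
        # Naive approach: ship the whole file and let the client search it.
        return list(self.files[path])

    def query(self, path, pattern):
        # Pushed-down approach: search next to the data and return
        # only the lines that match the pattern.
        rx = re.compile(pattern)
        return [line for line in self.files[path] if rx.search(line)]
```

For a hosts file with a thousand entries, fetch returns a thousand
lines while query for a single host name returns one, which is the
communication saving we are after.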

Alex's architecture is remarkably close to what we seem to be after.
He uses a segment server mechanism to provide access to NFS systems
on which the data resides, and basically acts like an NFS server (like a UNIX
or POSIX file system, if you prefer to think of it that way) for most
clients.  However, he doesn't currently have a way to ask the server to
do any sort of fancy data selection.  So, the questions left are:

1) For our "homework" problem, what is the best way to implement a read-mostly
   segment caching scheme using ISIS?

   We'll get Alex to tell us what he is doing later...  and what bottlenecks
   arise from a performance perspective.

2) More generally, what is the best way to augment the segment server to
   do simple queries locally so that it can support YP, but also be a
   useful accelerator for other types of database access -- after all, YP
   is basically a distributed database system.
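
To fix ideas for question 1, here is a minimal in-process sketch of a
read-mostly replicated segment cache, in Python.  It is not written
against the ISIS toolkit: ReplicaGroup.update is a stand-in for an
ordered broadcast to the group, and all names are assumptions made for
illustration.

```python
class Replica:
    """One cache site: reads are served from the local copy."""

    def __init__(self):
        self.segments = {}  # segment id -> (version, data)

    def read(self, seg_id):
        return self.segments[seg_id][1]

    def apply(self, seg_id, version, data):
        self.segments[seg_id] = (version, data)


class ReplicaGroup:
    """Stand-in for a process group with ordered update delivery."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.version = 0

    def update(self, seg_id, data):
        # Assumption: every replica sees updates in the same order,
        # which is what an ordered broadcast would provide.
        self.version += 1
        for replica in self.replicas:
            replica.apply(seg_id, self.version, data)
        return self.version
```

Reads touch a single replica and cost no communication; only the rare
update has to reach every member, which is why the read-mostly case
looks attractive.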

Deceit actually exists, so the upshot of all this may be that by
next year we will actually have the YP system running this way...

Ken

PS: When Alex returns from vacation, I'll ask him to post something on
Deceit.  A TR is available and you should get a copy soon if you are on
our TR mailing list.  If not, request it from croft@cs.cornell.edu
("Architecture of the Deceit File System," by Siegel, Birman and Marzullo).