[comp.protocols.tcp-ip] in re:quipu as X.500 server

cpj@ENG.SUN.COM (Chuck Jerian) (07/05/89)

I took a large list of names, and set up the QUIPU server discussed
in the earlier item.  This name server used over 1K of memory per
name in the list, so that to store the large list of people and machines
that I made of everyone at Sun, it used over 16M of virtual memory.
It seemed to reference almost 32M making this list, but freed 
about half of the total.  The data is organized as a giant linked list
within a level of the directory.  A search of the directory using the
ISO search mechanism which allows for 'x*y*z' causes the server to
page-thrash violently, as it references every page of this giant list.
An answer is sometimes forthcoming in a minute.  The QUIPU server
is better behaved for small sets of names.  
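The behaviour described above can be illustrated with a minimal sketch
(hypothetical, not QUIPU code): an 'x*y*z' substring filter is equivalent
to an anchored regular expression, and matching it against a flat list
forces a scan of every entry, however few actually match.

```python
import re

def substring_filter_to_regex(pattern):
    # Translate an X.500-style substring filter such as 'x*y*z'
    # into an anchored regular expression.
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")

def search(entries, pattern):
    # Linear scan: every entry is examined (and hence paged in),
    # regardless of how few entries match.
    rx = substring_filter_to_regex(pattern)
    return [e for e in entries if rx.match(e)]

print(search(["alice", "xaybz", "bob"], "x*y*z"))  # ['xaybz']
```

When the list of entries is larger than real memory, that full scan is
exactly what turns into the paging behaviour reported above.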

On the other hand, gnu-grep can always scan this same list of data,
represented as a text file, in under 0.3 seconds on a Sun4/260, using
arbitrary regular expressions that are more powerful than those in X.500.
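For comparison, the text-file approach amounts to a single sequential pass
over the file, touching each page once.  A rough sketch of that idea (not
gnu-grep itself, and the sample data is invented):

```python
import re
import tempfile

def scan_file(path, regex):
    # One sequential pass over a flat text file: good locality,
    # no pointer-chasing, each page referenced once.
    rx = re.compile(regex)
    with open(path) as f:
        return [line.rstrip("\n") for line in f if rx.search(line)]

# Hypothetical sample data for demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Chuck Jerian cpj@eng.sun.com\n")
    f.write("Steve Kille steve@cs.ucl.ac.uk\n")
    name = f.name

print(scan_file(name, r"cs\.ucl"))  # ['Steve Kille steve@cs.ucl.ac.uk']
```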

This suggests to me that the most important issue in searching name
servers is the organization of data and the choice of algorithms;
those in QUIPU are terrible, much worse than text files and gnu-grep.
					--cpj

steve@CS.UCL.AC.UK (Steve Kille) (07/07/89)

I suggest that we move this discussion to the <quipu@cs.ucl.ac.uk> list,
which is focussed on the issue.  This is an open list; send to
<quipu-request@cs.ucl.ac.uk> if you want to join.  If you have problems with
installing or using quipu, please send reports to
<quipu-support@cs.ucl.ac.uk>.  



 >From:  Chuck Jerian <cpj@eng.sun.com>
 >To:    tcp-ip@sri-nic.arpa
 >Subject: in re:quipu as X.500 server
 >Date:  Tue, 4 Jul 89 12:22:54 PDT

 >I took a large list of names, and set up the QUIPU server discussed
 >in the earlier item.  This name server used over 1K of memory per
 >name in the list, so that to store the large list of people and machines

Right.  There is a lot of structuring info needed.  I guess that your
entries don't have much data, as we estimate an average of 2k per entry.
There is quite a bit of optimisation possible without too much effort, which
we expect to do for QUIPU 6.0.

 >that I made of everyone at Sun, it used over 16M of virtual memory.
 >It seemed to reference almost 32M making this list, but freed 
 >about half of the total.  

Are you sure?  This surprises me.

 >The data is organized as a giant linked list
 >within a level of the directory.  A search of the directory using the
 >ISO search mechanism which allows for 'x*y*z' causes the server to
 >violently page-thrash, as it references every page of this giant list.
 >An answer is sometimes forthcoming in a minute.  The QUIPU server
 >is better behaved for small sets of names.  

This is what I would expect.  If you search the entire tree, you are going
to touch bits all over the virtual memory used.  If this can all fit into
real memory, performance is reasonable (about 1000 entries per second on a
Vaxstation II - quite a bit slower than the machine you quote).  If you step
off real memory, it thrashes (as you note).  In many cases, the X.500
Directory Information Tree hierarchy will be used to control scope of search.
We also plan to make some changes so that common searches touch less memory
(e.g., by grouping all the phone data onto adjacent memory).
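Scoping by the Directory Information Tree can be sketched as follows (an
illustrative tree, not QUIPU's actual structures): a subtree search only
visits entries under the chosen base, so a well-chosen base bounds the
memory the search references.  The entry names below are invented.

```python
class Entry:
    # A minimal directory entry: a relative distinguished name,
    # some attributes, and child entries.
    def __init__(self, rdn, **attrs):
        self.rdn = rdn
        self.attrs = attrs
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def search_subtree(base, predicate):
    # Only entries under `base` are visited; the rest of the tree
    # (and the pages holding it) is never referenced.
    hits = [base] if predicate(base) else []
    for child in base.children:
        hits.extend(search_subtree(child, predicate))
    return hits

root = Entry("c=GB")
ucl = root.add(Entry("o=UCL"))
ucl.add(Entry("cn=Steve Kille", phone="x1234"))
root.add(Entry("o=Elsewhere")).add(Entry("cn=Someone Else"))

# Searching under o=UCL never touches the o=Elsewhere subtree.
print([e.rdn for e in search_subtree(ucl, lambda e: "phone" in e.attrs)])
```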

 >On the other hand gnu-grep can always scan this same list of data
 >using arbitrary regular expressions which are more powerful than 
 >those in X.500 in less than .3 seconds on a Sun4/260 with the
 >data represented as a text file.

grep is a good tool, and has its applications.  However, supporting a wide
area directory is not one of them.

 >This suggests to me that the most important issue in searching name
 >servers is the organization of data and the choice of algorithms,

Absolutely.   Seems like motherhood to me!

 >those in QUIPU are terrible, much worse than text files and gnu-grep.
 >					--cpj

This does not follow.   QUIPU can do a lot of things which gnu-grep can't!
Let me explain some of the philosophy behind why QUIPU was done the way it
was.  The OSI Directory (X.500) has a very rich (too rich?) framework.   
One design approach is to choose your database, and then provide X.500
access to it.  For example, I could choose gnu-grep + single file as my
database, and then give X.500 access.  This would give stunning performance
for the things my database was good at, but would fall to pieces for things
keyed the wrong way, or questions which could not be formulated.

With QUIPU, the internal (memory) structures are aligned very much to that
of the OSI directory.  This means that a query will cost roughly in
proportion to the complexity of the query in X.500 terms, so it will not
be stunningly good for any sort of query.  However (and more important in an
experimental implementation) it will not be stunningly bad for any sort of
query.  One of the things I'd hope to learn from the QUIPU experiment is how
one might key a database, and which are the things that need to be optimised
for under "real" usage.   




Steve