[comp.protocols.tcp-ip] quipu as X.500 server

mrose@cheetah.nyser.net (Marshall Rose) (07/10/89)

You raise some good issues about representations and algorithms.
Whilst I have never been a particularly big fan of the in-memory
approach used by QUIPU, let me explain why I think your comparison is
somewhat off the mark.

If the problem were matching regular expressions against textual
information kept in a single file on a local disk, then I would simply
use grep.  Your figures on grep's performance demonstrate why that is
the correct approach for that problem.

Unfortunately, that's not the problem given to the people who wrote the
Directory spec.  There were a few more constraints that had to be accommodated:

	- information must be distributed on different systems,
	  potentially geographically distant

	- information must be hierarchical in nature, primarily to
	  support autonomous control over parts of the information (of course
	  there is no explicit mapping between the hierarchy and the
	  location of the information).

	- arbitrary binary information, in addition to textual information,
	  must be accommodated; each type of data may have its own searching
	  and comparison characteristics

	- information must be updatable (add, modify, delete)

	- access to information must also support a remote comparison
	  paradigm, when the owner of the information doesn't want it to
	  leave their system (e.g., passwords)

There are probably some others, but these are the ones which jump out at
me.  Whilst grep is very good in the test case you devised, I don't
think it would work if it had to deal with these five other handicaps.
Just in passing, I'll note that five years ago, a similar argument might
have been made about the DNS--why not use grep over /etc/hosts instead
of sending queries to who-knows-where?  (Of course, I'm not comparing
the DNS and the Directory here, but it does make a rather interesting
example!)
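
To make the remote comparison constraint from the list above a little
more concrete, here is a minimal Python sketch.  It is not how QUIPU or
any real DSA is implemented; the class ToyDirectory, its methods, and the
entry name are all hypothetical, and the attribute name userPassword is
used purely for illustration.  The point is only that the holder of the
data answers "does this value match?" without ever returning the value.

```python
import hmac


class ToyDirectory:
    """Hypothetical stand-in for the data held by one directory server."""

    def __init__(self, protected=("userPassword",)):
        self._entries = {}                 # entry name -> {attribute: bytes}
        self._protected = set(protected)   # attributes that must not be read

    def add(self, name, **attrs):
        self._entries[name] = dict(attrs)

    def read(self, name, attr):
        # Protected attributes never leave the system that owns them.
        if attr in self._protected:
            raise PermissionError(f"{attr} may not leave this system")
        return self._entries[name][attr]

    def compare(self, name, attr, asserted):
        # The caller supplies a value; only a yes/no answer goes back.
        stored = self._entries[name][attr]
        return hmac.compare_digest(stored, asserted)
```

A client that needs to check a password calls compare() with the value
it was given and gets back True or False; calling read() on the same
attribute fails.  A grep over a local file has no equivalent, since the
file has to be readable before it can be scanned.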

Nonetheless, your criticism of QUIPU's in-core strategy still stands.
One of the things I am interested in studying in the pilot project is
how much of a problem this really becomes in practice.  The theory is
that a decent whitepages user interface exploits the hierarchy of the
naming architecture to perform more intelligent searches, thus reducing
exposure to the in-core limitation.  I myself wouldn't structure an
organization such that it had 16K users at a single level--either in a
database or in real life--so I'm interested in finding out what the
limits of this theory are.
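
The idea that the hierarchy narrows a search can be sketched roughly as
follows.  This is a toy in-memory tree in Python, not QUIPU's data
structure; Node, add_entry, find, search, build_demo, and the names used
in the demo tree are all hypothetical.  The search counts how many
entries it visits, so the effect of choosing a lower base entry is
visible.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    rdn: str
    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


def add_entry(root, path, attrs):
    """Create (or update) the entry at `path`, a sequence of name parts."""
    node = root
    for rdn in path:
        node = node.children.setdefault(rdn, Node(rdn))
    node.attrs.update(attrs)
    return node


def find(root, path):
    """Walk down from the root to the entry named by `path`."""
    node = root
    for rdn in path:
        node = node.children[rdn]
    return node


def search(base, predicate):
    """Whole-subtree search under `base`; returns (matches, entries visited)."""
    matches, visited = [], 0
    stack = [base]
    while stack:
        node = stack.pop()
        visited += 1
        if predicate(node.attrs):
            matches.append(node)
        stack.extend(node.children.values())
    return matches, visited


def build_demo():
    """Two organizations, two units each, three people per unit."""
    root = Node("root")
    for org in ("o=A", "o=B"):
        for unit in ("ou=Eng", "ou=Sales"):
            for i in range(3):
                add_entry(root, (org, unit, f"cn=user{i}"),
                          {"surname": f"name{i}"})
    return root
```

In the demo tree, searching from the root visits all 19 entries, while
searching from find(root, ("o=A", "ou=Eng")) visits only 4.  The saving
disappears if everyone sits at a single level, which is exactly the case
the theory has trouble with.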

By the way, you should note that the statement

	gnu-grep can always scan ... using arbitrary regular expressions
	which are more powerful than those in X.500

is not strictly true.  While regexps are more powerful than the modest
wildcarding facilities in the Directory, the Directory has the concept
of approximate matching, which grep does not.  In QUIPU, for example, an
approximate match of a surname data type involves applying a soundex
algorithm.  (I don't think the GNU folks have added that option to
gnu-grep ... yet!)
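
For readers who haven't met soundex, here is a minimal Python sketch of
the classic American Soundex coding.  It is only an illustration of the
general technique; QUIPU's actual routine may differ in its details, and
the function names soundex and approx_match are made up for this
example.

```python
# Letter groups for codes 1..6; vowels, y, h, and w carry no code.
_CODES = {}
for _digit, _letters in enumerate(
        ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name):
    """Return the 4-character Soundex code of `name` ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue          # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code           # a vowel resets prev, so a repeat is coded again
    return result.ljust(4, "0")


def approx_match(a, b):
    """Treat two surnames as approximately equal if their codes agree."""
    return soundex(a) != "" and soundex(a) == soundex(b)
```

With this coding, "Smith" and "Smyth" both map to S530 and so match
approximately.  No regular expression expresses "sounds like" in
general, which is why the two facilities aren't strictly comparable.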

/mtr