mrose@cheetah.nyser.net (Marshall Rose) (07/10/89)
You raise some good issues about representations and algorithms. Whilst I have never been a particularly big fan of the in-memory approach used by QUIPU, let me explain why I think your comparison is somewhat off the mark.

If the problem were scanning textual information kept in a single file on a local disk for regular expressions, then I would simply use grep. Your figures on grep's performance demonstrate why that is the correct approach for that problem.

Unfortunately, that's not the problem given to the people who wrote the Directory spec. There were a few more constraints that had to be accommodated:

- information must be distributed on different systems, potentially geographically distant

- information must be hierarchical in nature, primarily to support autonomous control over parts of the information (of course there is no explicit mapping between the hierarchy and the location of the information)

- arbitrary binary information, in addition to textual information, must be accommodated; each type of data may have its own searching and comparison characteristics

- information must be updatable (add, modify, delete)

- access to information must also support a remote comparison paradigm, when the owner of the information doesn't want it to leave their system (e.g., passwords)

There are probably some others, but these are the ones which jump out at me. Whilst grep is very good in the test case you devised, I don't think it would work if it had to deal with these five other handicaps.

Just in passing, I'll note that five years ago, a similar argument might have been made about the DNS--why not use grep over /etc/hosts instead of sending queries to who-knows-where. (Of course, I'm not comparing the DNS and the Directory here, but it does make a rather interesting example!)

Nonetheless, your criticism of QUIPU's in-core strategy remains. One of the things I am interested in studying in the pilot project is how much of a problem this really becomes in practice.
There is this theory that a decent whitepages user interface exploits the hierarchy of the naming architecture to perform more intelligent searches, thus reducing the exposure of this in-core business. I myself wouldn't structure an organization such that it had 16K users at a single level--either in a database or in real life, so I'm interested in finding out what the limits of this theory are.

By the way, you should note that the statement "gnu-grep can always scan ... using arbitrary regular expressions which are more powerful than those in X.500" is not strictly true. While regexps are more powerful than the modest wildcarding facilities in the Directory, the Directory has the concept of approximate matching, which grep does not. In QUIPU, for example, an approximate match of a surname data type involves applying a soundex algorithm. (I don't think the GNU folks have added that option to gnu-grep ... yet!)

/mtr
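
For readers who haven't met soundex, here is a rough sketch of the kind of approximate surname match mentioned above. It is the classic American Soundex coding written in Python, not QUIPU's actual code, which isn't shown in this thread. The names soundex and approx_match are made up for illustration.

```python
# Classic American Soundex; an illustrative sketch, NOT QUIPU's implementation.
# Letters that sound alike share a digit; vowels, H, W and Y carry no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")  # digit of the previous coded letter
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit:
            if digit != prev:         # adjacent letters with the same digit collapse
                code += digit
            prev = digit
        elif ch not in "HW":
            prev = ""                 # vowels and Y separate repeats; H and W do not
        if len(code) == 4:
            break
    return code.ljust(4, "0")         # pad short codes with zeros


def approx_match(a, b):
    """Hypothetical helper: surnames match approximately if their codes agree."""
    return soundex(a) != "" and soundex(a) == soundex(b)
```

With this sketch, approx_match("Rose", "Rowse") is true even though no simple wildcard pattern relates the two spellings, which is the sort of query a regular expression does not express.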