macrakis@uunet.uu.net (Stavros Macrakis) (12/20/88)
To make extension easy, it would be useful to have a built-in universal hash function for Emacs-Lisp objects. I note that there already exists a string-hash (not Lisp callable). Below, a sketch. -s Some other comments: for extension, it would be useful to have the following built-in Lisp functions: maknum, munkam (dangerous but sometimes handy) until (like while but test at the end) Requirement A hash function maps from some class of objects to some range of the integers. Hash functions are useful for: 1. quick tests for equality; 2. indexes (hash tables). So that hash values can be transmitted between program invocations, the hash value of an object should not depend on its storage characteristics. It should depend on all the user-visible properties of the object and nothing else. It is not clear how to hash things like buffers, whose value is usually of less interest than their identity. Perhaps this is an appropriate place to use maknum. Certain applications might find it useful that the hash value be machine-independent, but if this is difficult, it can be ignored. Specification The hash value is defined recursively. The existing string hash looks reasonable. Symbol hash can be string hash + a constant. Sequences' hashes should not only be order-dependent, but should also have algebraic properties that make it unlikely that different structures consisting of the same atoms hash to the same number (e.g. ((a a) a a) vs. (a ((a)) a a). For instance, if we hash [a1,a2,a3,...] internally as H = ((h(a1)+k0)*k+h(a2))*k+h(a3)..., then I suggest that hash return H^2. Experience and a certain number of tests show that this works well. Design In order to avoid stack overflow, the hash function should probably not recurse down the cdr of lists--it should iterate. Code Something like: (defun hash (x) (cond ((null x) knull) ((integerp x) (+ x kinteger)) ((symbolp x) (+ 345 (hash (symbol-name x)))) ((consp x) (let ((res k0)) (until (not (atom x)) (setq res (+ (* k res) (hash (car x))) x (cdr x))) (+ (* res res) kcons))) ((stringp x) ... ... No two constants should be equal, none should be zero. They should be odd, have at least three bits = 1, and should probably be prime.