[gnu.emacs.bug] Extension facilities

macrakis@uunet.uu.net (Stavros Macrakis) (12/20/88)

To make extension easy, it would be useful to have a built-in
universal hash function for Emacs-Lisp objects.  I note that there
already exists a string-hash (not Lisp callable).  Below, a sketch.
	-s

Some other comments: for extension, it would be useful to have the
following built-in Lisp functions:
	maknum, munkam (dangerous but sometimes handy)
	until (like while but test at the end)



Requirement

A hash function maps from some class of objects to some range of the
integers.

Hash functions are useful for:
  1. quick tests for equality;
  2. indexes (hash tables).

So that hash values can be transmitted between program invocations,
the hash value of an object should not depend on its storage
characteristics.  It should depend on all the user-visible properties
of the object and nothing else.

It is not clear how to hash things like buffers, whose value is
usually of less interest than their identity.  Perhaps this is an
appropriate place to use maknum.

Certain applications might find it useful that the hash value be
machine-independent, but if this is difficult, it can be ignored.

Specification

The hash value is defined recursively.
The existing string hash looks reasonable.
Symbol hash can be string hash + a constant.
Sequences' hashes should not only be order-dependent, but should also
have algebraic properties that make it unlikely that different
structures consisting of the same atoms hash to the same number (e.g.
((a a) a a) vs. (a ((a)) a a). For instance, if we hash [a1,a2,a3,...]
internally as 
	H = ((h(a1)+k0)*k+h(a2))*k+h(a3)..., 
then I suggest that hash return H^2.  Experience and a certain number
of tests show that this works well.

Design

In order to avoid stack overflow, the hash function should probably
not recurse down the cdr of lists--it should iterate.

Code

Something like:

(defun hash (x)
  (cond ((null x) knull)
	((integerp x) (+ x kinteger))
	((symbolp x) (+ 345 (hash (symbol-name x))))
	((consp x)
	 (let ((res k0))
	   (until (not (atom x))
	     (setq res (+ (* k res) (hash (car x)))
		   x (cdr x)))
	   (+ (* res res) kcons)))
	((stringp x) ...
	...

No two constants should be equal, none should be zero.  They should be
odd, have at least three bits = 1, and should probably be prime.