[comp.theory] Hashing help needed

terryn@uop.EDU (Terry Ng) (07/16/90)

    I have been trying to figure out a hashing algorithm to combine the
 measurements of a piece of wood.  I have the thickness, width, and length.
 This is all measured in inches and fractions thereof and converted to its
 decimal equivalent.  I need an algorithm to uniquely combines these almost
 infinite measurements into a 4 digit code.  I say almost infinite because
 all measurements will not excede 999.99 (separately).

    I would appreciate any help.  I would also welcome suggestions of any
 kind.  I have been playing around with different schemes of my own but have
 not got anywhere.  If it is not possible to obtain a 4-digit code maybe
 a 5 or more digit code may be used.  I don't want to get to the point of
 just concatinating all of the digits together.  That would defeat the
 purpose of a hash.

 *******************************************************************************
 *                                                                             *
 * Email: terryn@uop.edu                                                       *
 *                                                                             *
 *******************************************************************************

edp@jareth.enet.dec.com (Eric Postpischil (Always mount a scratch monkey.)) (07/17/90)

> I need an algorithm to uniquely combines these almost
>  infinite measurements into a 4 digit code.  I say almost infinite because
>  all measurements will not excede 999.99 (separately).

No matter what hash scheme you use, there are only 10,000 different
4-digit codes, so 4-digit codes can only encode 10,000 different sets of
measurements.

If each of your three measurements can have a hundred values, that's a
million different
combinations, and you cannot fit them into 4 digits.  Nor into 5.


				-- edp

phmb@otter.hpl.hp.com (Peter Brooks) (07/17/90)

You have been advised of the impossibility of your request - quarts do
not fit into pint pots, however it is probable that you knew this to
begin with and wish to get the most even spread, or to state the same
condition you wish to have the mimimum number of synonyms. The best way
I know is to use a modified form of Godel numbering ie, produce the
Godel number for your bit of wood and take its modulus within the
bound that you wish to constrain yourself to. How you then resolve
synonym clashes is your affair. In detail:

For your bound (4 digits -> 9999) take the largest prime number, p1, in
less than this. For each dimension, in your case you have three length
width and height, take a prime say p1,p2,p3 in your case. To find your
Godel number G = p1^l * p2 ^ w * p3 ^ h. For integral values of l,w and
h this number is unique. Now to compress this into your range you simply
get your number N = G mod p1. Now for practical measures, you may need
to use the final few digits of your length as otherwise the numbers
become a little large.

Eg. for the range 100, p1 = 97 your bit of wood measures l = 1828.8 mm
by w = 101.6 mm by h = 50.8 mm. for p1 = 2, p2 = 3, p3 = 5 we have
G = 2^288 * 3 ^ 1016 * 5 ^ 508 = 
337750304396705098571122337365999043230482501155685524045168492026139\
7668939363882432430791175603518883885207439163202131321120617573667213\
2410980123910106034401038053639481089220571003629510159559729945385280\
1973536128209216268413114332710640657298341500372170668411203125406938\
6573824240441039039226095004052247924130770443230816718591439248450820\
6784631181023079540800465370207294607439644943551360475114866435096689\
0238356410112741963844888055894318871075292079500904825197270438909441\
2143891224620408901314362522786931325528467005500144923988970790346871\
5326004198319743050965807687775094866975550411680728757346514612436294\
5556640625000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000000000000000000000000\
000000000000000000
G mod 97 = 64
The spread effect is easily seen if we change one measure by 0.1 mm
if w = 101.7 mm the answer is 95 and if h is 50.7 it is 71.

Peter Brooks

moss@cs.umass.edu (Eliot Moss) (07/17/90)

The Godel numbering scheme is fine, but the message says to use p1 for
computing the modulus, and p1, p2, p3 for the Godel number. It should be clear
that the two p1s are different. That is, use some function such as p1^L * p2^W
* p3^H to get a unique single value determined from L, W, and H, and then hash
by taking the modulus using a largest prime less than or equal to your maximum
value. If you prefer not to compute with really large numbers, I note that you
can compute (x * y) mod p via ((x mod p) * (y mod p)) mod p. Note also that
the raising to a power (p^Q) can be done in time proportional to log Q by
computing p^1, p^2, p^4, etc., and accumulating a result that uses the
appropriate subset of the powers calculated. There are many variations on
this.

Finally, rather than a Godel numbering you could use the three dimensional
variation of a two dimension numbering that I illustrate diagrammatically:
     Length

Width	0: 0   1   3   6  10 ...
	1: 2   4   7  11 ...
	2: 5   8  12 ...
	3: 9  13  ...
      ...  ...

Figuring out the formula for this, and then for three dimensions, is a bit of
a pain, but the calculation will be simpler and faster than the Godel number
scheme, I believe. Enjoy!					Eliot Moss
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206; Moss@cs.umass.edu

phmb@otter.hpl.hp.com (Peter Brooks) (07/18/90)

>The Godel numbering scheme is fine, but the message says to use p1 for
>computing the modulus, and p1, p2, p3 for the Godel number. It should be clear
>that the two p1s are different. That is, use some function such as p1^L * p2^W

Yes, my mistake, it should have been p1,p2,p3,p4. Sorry for any confusion -
I know which p I meant :-). The method you propose is indeed easier for
the computer - the reason I particularly like the Godel method is the knowledge
that before you mod anything you do have a unique number so you can be
sure of where your synonyms are coming from. This is, of course, a quite
spurious argument from the point of view of elegance (I do agree that the
actual arithmetic is not that elegant, but it never is). I do sometimes feel
that aesthetics are too often left out in the rush for simplicity. I also
have belief that taking a general approach in the initial stages of solving
a problem leaves you with a vastly more flexible result.

A small justification for the above belief is the frequency with which
large software projects with a cast of thousands produce monsters which
(fortunately) are usually still born or die young. Whilst pieces of hack
code cobbled together to solve a very specific 'one off' problem are
modified and embellished over the years and evolve until their origins are
well obscured - vide UNIX (tm).

Peter Brooks