terryn@uop.EDU (Terry Ng) (07/16/90)
I have been trying to figure out a hashing algorithm to combine the
measurements of a piece of wood. I have the thickness, width, and length.
This is all measured in inches and fractions thereof and converted to its
decimal equivalent. I need an algorithm to uniquely combines these almost
infinite measurements into a 4 digit code. I say almost infinite because
all measurements will not excede 999.99 (separately).
I would appreciate any help. I would also welcome suggestions of any
kind. I have been playing around with different schemes of my own but have
not got anywhere. If it is not possible to obtain a 4-digit code maybe
a 5 or more digit code may be used. I don't want to get to the point of
just concatinating all of the digits together. That would defeat the
purpose of a hash.
*******************************************************************************
* *
* Email: terryn@uop.edu *
* *
*******************************************************************************edp@jareth.enet.dec.com (Eric Postpischil (Always mount a scratch monkey.)) (07/17/90)
> I need an algorithm to uniquely combines these almost > infinite measurements into a 4 digit code. I say almost infinite because > all measurements will not excede 999.99 (separately). No matter what hash scheme you use, there are only 10,000 different 4-digit codes, so 4-digit codes can only encode 10,000 different sets of measurements. If each of your three measurements can have a hundred values, that's a million different combinations, and you cannot fit them into 4 digits. Nor into 5. -- edp
phmb@otter.hpl.hp.com (Peter Brooks) (07/17/90)
You have been advised of the impossibility of your request - quarts do not fit into pint pots, however it is probable that you knew this to begin with and wish to get the most even spread, or to state the same condition you wish to have the mimimum number of synonyms. The best way I know is to use a modified form of Godel numbering ie, produce the Godel number for your bit of wood and take its modulus within the bound that you wish to constrain yourself to. How you then resolve synonym clashes is your affair. In detail: For your bound (4 digits -> 9999) take the largest prime number, p1, in less than this. For each dimension, in your case you have three length width and height, take a prime say p1,p2,p3 in your case. To find your Godel number G = p1^l * p2 ^ w * p3 ^ h. For integral values of l,w and h this number is unique. Now to compress this into your range you simply get your number N = G mod p1. Now for practical measures, you may need to use the final few digits of your length as otherwise the numbers become a little large. Eg. for the range 100, p1 = 97 your bit of wood measures l = 1828.8 mm by w = 101.6 mm by h = 50.8 mm. for p1 = 2, p2 = 3, p3 = 5 we have G = 2^288 * 3 ^ 1016 * 5 ^ 508 = 337750304396705098571122337365999043230482501155685524045168492026139\ 7668939363882432430791175603518883885207439163202131321120617573667213\ 2410980123910106034401038053639481089220571003629510159559729945385280\ 1973536128209216268413114332710640657298341500372170668411203125406938\ 6573824240441039039226095004052247924130770443230816718591439248450820\ 6784631181023079540800465370207294607439644943551360475114866435096689\ 0238356410112741963844888055894318871075292079500904825197270438909441\ 2143891224620408901314362522786931325528467005500144923988970790346871\ 5326004198319743050965807687775094866975550411680728757346514612436294\ 5556640625000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 000000000000000000 G mod 97 = 64 The spread effect is easily seen if we change one measure by 0.1 mm if w = 101.7 mm the answer is 95 and if h is 50.7 it is 71. Peter Brooks
moss@cs.umass.edu (Eliot Moss) (07/17/90)
The Godel numbering scheme is fine, but the message says to use p1 for
computing the modulus, and p1, p2, p3 for the Godel number. It should be clear
that the two p1s are different. That is, use some function such as p1^L * p2^W
* p3^H to get a unique single value determined from L, W, and H, and then hash
by taking the modulus using a largest prime less than or equal to your maximum
value. If you prefer not to compute with really large numbers, I note that you
can compute (x * y) mod p via ((x mod p) * (y mod p)) mod p. Note also that
the raising to a power (p^Q) can be done in time proportional to log Q by
computing p^1, p^2, p^4, etc., and accumulating a result that uses the
appropriate subset of the powers calculated. There are many variations on
this.
Finally, rather than a Godel numbering you could use the three dimensional
variation of a two dimension numbering that I illustrate diagrammatically:
Length
Width 0: 0 1 3 6 10 ...
1: 2 4 7 11 ...
2: 5 8 12 ...
3: 9 13 ...
... ...
Figuring out the formula for this, and then for three dimensions, is a bit of
a pain, but the calculation will be simpler and faster than the Godel number
scheme, I believe. Enjoy! Eliot Moss
--
J. Eliot B. Moss, Assistant Professor
Department of Computer and Information Science
Lederle Graduate Research Center
University of Massachusetts
Amherst, MA 01003
(413) 545-4206; Moss@cs.umass.eduphmb@otter.hpl.hp.com (Peter Brooks) (07/18/90)
>The Godel numbering scheme is fine, but the message says to use p1 for >computing the modulus, and p1, p2, p3 for the Godel number. It should be clear >that the two p1s are different. That is, use some function such as p1^L * p2^W Yes, my mistake, it should have been p1,p2,p3,p4. Sorry for any confusion - I know which p I meant :-). The method you propose is indeed easier for the computer - the reason I particularly like the Godel method is the knowledge that before you mod anything you do have a unique number so you can be sure of where your synonyms are coming from. This is, of course, a quite spurious argument from the point of view of elegance (I do agree that the actual arithmetic is not that elegant, but it never is). I do sometimes feel that aesthetics are too often left out in the rush for simplicity. I also have belief that taking a general approach in the initial stages of solving a problem leaves you with a vastly more flexible result. A small justification for the above belief is the frequency with which large software projects with a cast of thousands produce monsters which (fortunately) are usually still born or die young. Whilst pieces of hack code cobbled together to solve a very specific 'one off' problem are modified and embellished over the years and evolve until their origins are well obscured - vide UNIX (tm). Peter Brooks