terryn@uop.EDU (Terry Ng) (07/16/90)
I have been trying to figure out a hashing algorithm to combine the measurements of a piece of wood. I have the thickness, width, and length. This is all measured in inches and fractions thereof and converted to its decimal equivalent. I need an algorithm to uniquely combines these almost infinite measurements into a 4 digit code. I say almost infinite because all measurements will not excede 999.99 (separately). I would appreciate any help. I would also welcome suggestions of any kind. I have been playing around with different schemes of my own but have not got anywhere. If it is not possible to obtain a 4-digit code maybe a 5 or more digit code may be used. I don't want to get to the point of just concatinating all of the digits together. That would defeat the purpose of a hash. ******************************************************************************* * * * Email: terryn@uop.edu * * * *******************************************************************************
edp@jareth.enet.dec.com (Eric Postpischil (Always mount a scratch monkey.)) (07/17/90)
> I need an algorithm to uniquely combines these almost > infinite measurements into a 4 digit code. I say almost infinite because > all measurements will not excede 999.99 (separately). No matter what hash scheme you use, there are only 10,000 different 4-digit codes, so 4-digit codes can only encode 10,000 different sets of measurements. If each of your three measurements can have a hundred values, that's a million different combinations, and you cannot fit them into 4 digits. Nor into 5. -- edp
phmb@otter.hpl.hp.com (Peter Brooks) (07/17/90)
You have been advised of the impossibility of your request - quarts do not fit into pint pots, however it is probable that you knew this to begin with and wish to get the most even spread, or to state the same condition you wish to have the mimimum number of synonyms. The best way I know is to use a modified form of Godel numbering ie, produce the Godel number for your bit of wood and take its modulus within the bound that you wish to constrain yourself to. How you then resolve synonym clashes is your affair. In detail: For your bound (4 digits -> 9999) take the largest prime number, p1, in less than this. For each dimension, in your case you have three length width and height, take a prime say p1,p2,p3 in your case. To find your Godel number G = p1^l * p2 ^ w * p3 ^ h. For integral values of l,w and h this number is unique. Now to compress this into your range you simply get your number N = G mod p1. Now for practical measures, you may need to use the final few digits of your length as otherwise the numbers become a little large. Eg. for the range 100, p1 = 97 your bit of wood measures l = 1828.8 mm by w = 101.6 mm by h = 50.8 mm. for p1 = 2, p2 = 3, p3 = 5 we have G = 2^288 * 3 ^ 1016 * 5 ^ 508 = 337750304396705098571122337365999043230482501155685524045168492026139\ 7668939363882432430791175603518883885207439163202131321120617573667213\ 2410980123910106034401038053639481089220571003629510159559729945385280\ 1973536128209216268413114332710640657298341500372170668411203125406938\ 6573824240441039039226095004052247924130770443230816718591439248450820\ 6784631181023079540800465370207294607439644943551360475114866435096689\ 0238356410112741963844888055894318871075292079500904825197270438909441\ 2143891224620408901314362522786931325528467005500144923988970790346871\ 5326004198319743050965807687775094866975550411680728757346514612436294\ 5556640625000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 0000000000000000000000000000000000000000000000000000000000000000000000\ 000000000000000000 G mod 97 = 64 The spread effect is easily seen if we change one measure by 0.1 mm if w = 101.7 mm the answer is 95 and if h is 50.7 it is 71. Peter Brooks
moss@cs.umass.edu (Eliot Moss) (07/17/90)
The Godel numbering scheme is fine, but the message says to use p1 for computing the modulus, and p1, p2, p3 for the Godel number. It should be clear that the two p1s are different. That is, use some function such as p1^L * p2^W * p3^H to get a unique single value determined from L, W, and H, and then hash by taking the modulus using a largest prime less than or equal to your maximum value. If you prefer not to compute with really large numbers, I note that you can compute (x * y) mod p via ((x mod p) * (y mod p)) mod p. Note also that the raising to a power (p^Q) can be done in time proportional to log Q by computing p^1, p^2, p^4, etc., and accumulating a result that uses the appropriate subset of the powers calculated. There are many variations on this. Finally, rather than a Godel numbering you could use the three dimensional variation of a two dimension numbering that I illustrate diagrammatically: Length Width 0: 0 1 3 6 10 ... 1: 2 4 7 11 ... 2: 5 8 12 ... 3: 9 13 ... ... ... Figuring out the formula for this, and then for three dimensions, is a bit of a pain, but the calculation will be simpler and faster than the Godel number scheme, I believe. Enjoy! Eliot Moss -- J. Eliot B. Moss, Assistant Professor Department of Computer and Information Science Lederle Graduate Research Center University of Massachusetts Amherst, MA 01003 (413) 545-4206; Moss@cs.umass.edu
phmb@otter.hpl.hp.com (Peter Brooks) (07/18/90)
>The Godel numbering scheme is fine, but the message says to use p1 for >computing the modulus, and p1, p2, p3 for the Godel number. It should be clear >that the two p1s are different. That is, use some function such as p1^L * p2^W Yes, my mistake, it should have been p1,p2,p3,p4. Sorry for any confusion - I know which p I meant :-). The method you propose is indeed easier for the computer - the reason I particularly like the Godel method is the knowledge that before you mod anything you do have a unique number so you can be sure of where your synonyms are coming from. This is, of course, a quite spurious argument from the point of view of elegance (I do agree that the actual arithmetic is not that elegant, but it never is). I do sometimes feel that aesthetics are too often left out in the rush for simplicity. I also have belief that taking a general approach in the initial stages of solving a problem leaves you with a vastly more flexible result. A small justification for the above belief is the frequency with which large software projects with a cast of thousands produce monsters which (fortunately) are usually still born or die young. Whilst pieces of hack code cobbled together to solve a very specific 'one off' problem are modified and embellished over the years and evolve until their origins are well obscured - vide UNIX (tm). Peter Brooks