[comp.lang.smalltalk] measuring dictionary hash performance

john@utafll.uta.edu (John Baima) (06/26/91)

I am unable to post news, so a friend is posting this for me.

bruce@ling.uta.edu
Bruce Samuelson

There was some discussion a while back of how well dictionaries hash
when their keys are symbols.  Here is an instance method for Dictionary
for ST80 4.0 that measures how well the dictionary is hashed.  Its keys
don't have to be symbols.  Warning: if the dictionary is badly hashed,
this method will be slow.


hashStatistics
	"This method tests how well the receiver is hashed.
	It is adapted from <Dictionary findKeyOrNil:>."


	"Smalltalk hashStatistics"

	"Return an array:

	at: 1	basicSize of dictionary
	at: 2	size of dictionary, i.e., number of elements (associations)
	at: 3	average miss of hash function
		0 means hash is ideal
		N means avg element is placed N steps beyond its hash value
		large number means hash is bad
	at: 4	histogram (using a sorted collection) of misses"

	| basicSize size total histogram |
	basicSize := self basicSize.
	size := self size.
	total := 0.
	histogram := Bag new.
	self
		keysDo:
			[:key |
			| miss location probe |
			miss := 0.
			location := key hash \\ basicSize + 1.
			[(probe := self basicAt: location) isNil or: [probe key ~= key]]
				whileTrue:
					[miss := miss + 1.
					(location := location + 1) > basicSize ifTrue: [location := 1]].
			histogram add: miss.
			total := total + miss].
	^Array
		with: basicSize
		with: size
		with: (total / (size max: 1)) asFloat
		with: histogram sortedElements
-- 
John Baima
john@utafll.uta.edu