[comp.protocols.tcp-ip] routing hash table sizes on 4.3bsd

smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (01/06/89)

I started wondering about the efficiency of the routing table hash
algorithm used in 4.3bsd.  So I wandered over to my friendly neighborhood
gateway, built a modified netstat -r to dump chain lengths, and went at
it.  The good news is that the hash was quite even; each of the buckets
had about the same number of entries.  The bad news is that since the
machine was not configured with 'options GATEWAY', there were only 8
such buckets; this meant that the 672 routing entries (looking at the
network table only) were divided up just 8 ways...  So I ran the regular
netstat -r to get a snapshot of the table, built a small awk simulator
of the hash, and experimented with different numbers of buckets.  The
chart below shows, for each number of buckets, how many hash chains
of each length resulted (length 1 at the left).  8 and 64 are the
sizes used by 4.3bsd (8 normally, 64 with 'options GATEWAY'), including
the latest copy I have of the network updates; clearly, neither is optimal.

	8:	 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1
	64:	 0 0 0 1 2 2 7 5 10 8 8 6 2 2 6 3 1 1
	512:	 195 126 49 17 2
	1024:	 390 114 18
	2048:	 552 57 2
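
For anyone who wants to repeat the experiment, here's a rough C
equivalent of the awk simulator (not the original script; the
mask-style bucket selection and the one-hash-value-per-line input
format are assumptions):

/*
 * Reads one hash value per line (in hex) from stdin, distributes them
 * into a power-of-two number of buckets with a mask, and prints how
 * many chains of each length result -- the same format as the chart
 * above (length 1 at the left).
 */
#include <stdio.h>
#include <stdlib.h>

#define MAXBUCKETS	2048
#define MAXENTRIES	4096

int
main(int argc, char **argv)
{
	static int chainlen[MAXBUCKETS];	/* entries in each bucket */
	static int perlen[MAXENTRIES + 1];	/* chains of each length */
	int nbuckets = (argc > 1) ? atoi(argv[1]) : 8;
	unsigned long hash;
	int i, nentries = 0, maxlen = 0;

	if (nbuckets <= 0 || nbuckets > MAXBUCKETS ||
	    (nbuckets & (nbuckets - 1)) != 0) {
		fprintf(stderr, "bucket count must be a power of two <= %d\n",
		    MAXBUCKETS);
		return 1;
	}

	while (nentries < MAXENTRIES && scanf("%lx", &hash) == 1) {
		chainlen[hash & (nbuckets - 1)]++;
		nentries++;
	}

	for (i = 0; i < nbuckets; i++) {
		perlen[chainlen[i]]++;
		if (chainlen[i] > maxlen)
			maxlen = chainlen[i];
	}

	for (i = 1; i <= maxlen; i++)
		printf("%d ", perlen[i]);
	printf("\n");
	return 0;
}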

Actually, 8 buckets isn't just suboptimal; it's absurd.  Here's a different
way to look at 8 buckets; this time, the counts are the number of entries
hashed to each bucket:

	78   84   92   76   99   76   80   87

For 64, the numbers look like this:

	    8   11   15   12   12   11   10   16
	   16   12   15    7   14    6    8   10
	    8   11   18    9   12    7    9   10
	   16   13    9    9   13   12    9   12
	    6    7    7   10   11   10    7    9
	    4    7   10   14   17   15   15   15
	    9   15    9    8   11    5   11    5
	   11    8    9    7    9   10   11   10

Better, but still a fair number of chains of length 10 or more.

512 looks decent, but I'd recommend going to 1024 or 2048 buckets.
They're relatively cheap; each one costs only 8 bytes.  I also tried
a few prime bucket counts; some of them helped a bit, but the cost of
a division, as opposed to a mask operation, probably makes them not
worthwhile.
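
To make the mask-versus-division point concrete, here's the comparison
in miniature (illustrative numbers only; 1021 is just a convenient
prime near 1024):

/*
 * Bucket selection with a power-of-two table versus a prime-sized
 * table.  The power of two needs only an AND; the prime needs a
 * divide, which is considerably slower on most machines.
 */
#define NBUCKETS	1024			/* power of two */
#define NBUCKETS_PRIME	1021			/* nearby prime */

unsigned int
bucket_mask(unsigned long hash)
{
	return hash & (NBUCKETS - 1);		/* a single AND */
}

unsigned int
bucket_mod(unsigned long hash)
{
	return hash % NBUCKETS_PRIME;		/* needs a divide */
}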

For some oddball cases, it might pay to try to smarten up the hash a
bit; nothing much is going to help a system with an order of magnitude
fewer buckets than entries, though.


		--Steve Bellovin
		smb@ulysses.att.com
		att!ulysses!smb

karn@ka9q.bellcore.com (Phil Karn) (01/07/89)

Another thing I'd suggest to anyone thinking of redoing the routing lookup
routines is something I've put in my own code that seems to work pretty well.

Each time you find the entry you're looking for in a hash chain, move that
entry to the top of the list.  When a router sees a given address it is very
likely to see that same address again fairly soon, so this constitutes a
cheap form of caching that should reduce the average search length.
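
Here's a minimal sketch of the move-to-front idea; the structure and
field names are made up for illustration and aren't the actual 4.3bsd
routing entries:

/*
 * Move-to-front on a hash chain: on a hit, unlink the matching entry
 * and reinsert it at the head of its chain, so recently used
 * destinations are found first on later lookups.
 */
#include <stddef.h>

struct rentry {
	struct rentry	*r_next;	/* next entry on this chain */
	unsigned long	 r_dst;		/* destination address */
	/* ... the rest of the routing entry ... */
};

struct rentry *
chain_lookup(struct rentry **head, unsigned long dst)
{
	struct rentry *rp, *prev = NULL;

	for (rp = *head; rp != NULL; prev = rp, rp = rp->r_next) {
		if (rp->r_dst == dst) {
			if (prev != NULL) {	/* not already at the head */
				prev->r_next = rp->r_next;
				rp->r_next = *head;
				*head = rp;
			}
			return rp;		/* hit; now first on its chain */
		}
	}
	return NULL;				/* not found */
}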

Depending on how expensive the hash function is, it might also be worthwhile
to put a very small direct-mapped cache on top of the whole lookup function.
This would help greatly when handling rapid bursts of data between the same
two points.
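
Something along these lines, again with made-up names; full_lookup()
stands in for whatever the real routing-table search is:

/*
 * A small direct-mapped cache in front of the full lookup.  A burst
 * of packets to the same destination hits the cache and skips the
 * hash-and-chain-walk entirely.
 */
#include <stddef.h>

#define RTCACHE_SIZE	16		/* small power of two */

struct rentry {				/* same made-up entry as above */
	struct rentry	*r_next;
	unsigned long	 r_dst;
};

static struct rentry *rtcache[RTCACHE_SIZE];

struct rentry *full_lookup(unsigned long dst);	/* the real search */

struct rentry *
cached_lookup(unsigned long dst)
{
	unsigned int slot = dst & (RTCACHE_SIZE - 1);	/* cheap index */
	struct rentry *rp = rtcache[slot];

	if (rp != NULL && rp->r_dst == dst)
		return rp;			/* cache hit, no chain walk */

	rp = full_lookup(dst);			/* miss: do the real work */
	if (rp != NULL)
		rtcache[slot] = rp;		/* remember for next time */
	return rp;
}

A real version would also have to clear or revalidate a cache slot
whenever its route is deleted or changed, so stale pointers don't
linger in the cache.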

Phil