smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (01/06/89)
I started wondering about the efficiency of the routing-table hash algorithm used in 4.3BSD, so I wandered over to my friendly neighborhood gateway, built a modified netstat -r to dump chain lengths, and went at it. The good news is that the hash was quite even; each bucket had about the same number of entries. The bad news is that since the machine was not configured with 'options GATEWAY', there were only 8 such buckets; this meant that the 672 routing entries (looking at the network table only) were divided up just 8 ways...

So I ran the regular netstat -r to get a snapshot of the table, built a small awk simulator of the hash, and experimented with different numbers of buckets. The chart below shows, for each table size, the number of hash chains of each length (length 1, 2, 3, ...). 8 and 64 are the choices in 4.3BSD, including the latest copy I have of the network updates; clearly, neither is optimal.

    8:	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
	2 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1
   64:	0 0 0 1 2 2 7 5 10 8 8 6 2 2 6 3 1 1
  512:	195 126 49 17 2
 1024:	390 114 18
 2048:	552 57 2

Actually, 8 buckets isn't just suboptimal; it's absurd. Here's a different way to look at 8 buckets; this time, the counts are the number of entries hashed to each bucket:

	78 84 92 76 99 76 80 87

For 64, the numbers look like this:

	8 11 15 12 12 11 10 16 16 12 15 7 14 6 8 10
	8 11 18 9 12 7 9 10 16 13 9 9 13 12 9 12
	6 7 7 10 11 10 7 9 4 7 10 14 17 15 15 15
	9 15 9 8 11 5 11 5 11 8 9 7 9 10 11 10

Better, but still a fair number of searches of length 10 or more. 512 looks decent, but I'd recommend going to 1024 or 2048 buckets; they're relatively cheap, since each one costs only 8 bytes. I also tried a few prime numbers of buckets; some of them helped a bit, but the cost of a division, as opposed to a mask operation, probably makes them not worthwhile.
For some oddball cases, it might pay to try to smarten up the hash a bit; nothing much is going to help a system with an order of magnitude fewer buckets than entries, though.

		--Steve Bellovin
		smb@ulysses.att.com
		att!ulysses!smb
karn@ka9q.bellcore.com (Phil Karn) (01/07/89)
Another thing I'd suggest to anyone thinking of redoing the routing lookup routines is something I've put in my own code that seems to work pretty well: each time you find the entry you're looking for in a hash chain, move that entry to the top of the list. When a router sees a given address it is very likely to see that same address again fairly soon, so this constitutes a cheap form of caching that should reduce the average search length.

Depending on how expensive the hash function is, it might also be worthwhile to put a very small direct-mapped cache on top of the whole lookup function. This would help greatly when handling rapid bursts of data between the same two points.

	Phil
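[The move-to-front trick fits in a few lines of C. This is a sketch for illustration, with invented structure and function names, not Karn's actual KA9Q code.]

```c
#include <stddef.h>

/* Minimal singly linked hash chain entry; 'dest' is the search key. */
struct rtent {
    unsigned long dest;
    struct rtent *next;
};

/* Search one chain; on a hit, unlink the entry and relink it at the
 * head, so an immediately following lookup for the same destination
 * succeeds after a single comparison. */
struct rtent *rt_lookup(struct rtent **head, unsigned long dest)
{
    struct rtent *p, *prev = NULL;

    for (p = *head; p != NULL; prev = p, p = p->next) {
        if (p->dest == dest) {
            if (prev != NULL) {          /* not already at the head */
                prev->next = p->next;    /* unlink from mid-chain */
                p->next = *head;         /* relink at the front */
                *head = p;
            }
            return p;
        }
    }
    return NULL;                         /* not in this chain */
}
```

A burst of packets to one destination then pays the full chain walk only once; every later lookup in the burst terminates at depth 1. A small direct-mapped cache, as suggested above, would sit in front of this and skip even the hash computation on repeats.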