[comp.protocols.tcp-ip] Ignorant question - multiple connec

zweig@p.cs.uiuc.edu (07/23/89)

   On the subject of <Addr1, Port1, Addr2, Port2> 4-tuples being the
way of specifying a connection, does anyone know of a good (in the sense
of actually saving time) way of hashing these 12 bytes to 4?
   I was thinking of describing each connection in the TCP I'm writing by
a single (possibly non-unique) 32-bit word that would speed things up
when running down the list of active connections when trying to decide
what to do with an incoming packet. Simply XOR-ing the source-IP Address,
both Ports (concatenated nicely in the datagram already) and ignoring the
destination IP-address (since it has to be unique for the TCP module doing
the check -- that's guaranteed by my lower-layer) seems like the most
straightforward thing to do to minimize comparisons. Plus I was thinking
of throwing in a lookaside buffer of the 1 or 2 last touched connection-
descriptors....
   My question is: do people have experience with this sort of thing, and
can I actually get a performance increase on list-checking this way? I
am tempted to say "there's typically only a few connections open at any
time" and "the lookaside will probably have a very high hit rate since
traffic tends to be bursty" but maybe that means that no tricks are
required. Certainly just checking 4 or 5 connection-descriptors the
"long way" wouldn't be much work. Now, if some idiot were to open 400 or
500 connections, maybe some hasing and caching would pay off...
   Another alternative would be to store a linked-list through the
connection-descriptors that would be reordered with the connection just
touched moving to the front, so that the most active connections would
be checked first. Costs an extra pointer per descriptor, but seems pretty
cheap....


-Johnny Zweig
 University of Illinois at Urbana-Champaign
 Department of Computer Science
--------------------------------Disclaimer:------------------------------------
   Rule 1: Don't believe everything you read.
   Rule 2: Don't believe anything you read.
   Rule 3: There is no Rule 3.
-------------------------------------------------------------------------------

clynn@BBN.COM (Charles Lynn) (07/24/89)

One trick is to use the <n> high-order bits of the received acknowledgement
field as the hashing function, where <n> is selected based on the number
of simultaneous connections the implementation is intended to support.
Recall that the TCP gets to pick an initial sequence number "based on a
clock"; one "clock" might be an <n> bit counter that gets incremented
each time a new connection is created (skipping over values which are
still "in use").  One has to be a little careful, however, since all
received connection attempts will fall into bucket "0" (some might call
that a "feature"!) and would have to be moved to the appropriate bucket
when the SYN/ACK is sent, and if a connection sends enough data the
ACK will move to the "next bucket".  E.g., for <n> equal to 8 (a load
byte instruction), there would be 256 buckets, and a connection would
have to be moved to the next bucket (actually, it is in "two buckets"
for a little while) after 16 MBytes of data had been sent.