zweig@p.cs.uiuc.edu (07/23/89)
On the subject of <Addr1, Port1, Addr2, Port2> 4-tuples being the way of specifying a connection, does anyone know of a good (in the sense of actually saving time) way of hashing these 12 bytes to 4? I was thinking of describing each connection in the TCP I'm writing by a single (possibly non-unique) 32-bit word that would speed things up when running down the list of active connections when trying to decide what to do with an incoming packet. Simply XOR-ing the source-IP Address, both Ports (concatenated nicely in the datagram already) and ignoring the destination IP-address (since it has to be unique for the TCP module doing the check -- that's guaranteed by my lower-layer) seems like the most straightforward thing to do to minimize comparisons. Plus I was thinking of throwing in a lookaside buffer of the 1 or 2 last touched connection- descriptors.... My question is: do people have experience with this sort of thing, and can I actually get a performance increase on list-checking this way? I am tempted to say "there's typically only a few connections open at any time" and "the lookaside will probably have a very high hit rate since traffic tends to be bursty" but maybe that means that no tricks are required. Certainly just checking 4 or 5 connection-descriptors the "long way" wouldn't be much work. Now, if some idiot were to open 400 or 500 connections, maybe some hasing and caching would pay off... Another alternative would be to store a linked-list through the connection-descriptors that would be reordered with the connection just touched moving to the front, so that the most active connections would be checked first. Costs an extra pointer per descriptor, but seems pretty cheap.... -Johnny Zweig University of Illinois at Urbana-Champaign Department of Computer Science --------------------------------Disclaimer:------------------------------------ Rule 1: Don't believe everything you read. Rule 2: Don't believe anything you read. Rule 3: There is no Rule 3. -------------------------------------------------------------------------------
clynn@BBN.COM (Charles Lynn) (07/24/89)
One trick is to use the <n> high-order bits of the received acknowledgement field as the hashing function, where <n> is selected based on the number of simultaneous connections the implementation is intended to support. Recall that the TCP gets to pick an initial sequence number "based on a clock"; one "clock" might be an <n> bit counter that gets incremented each time a new connection is created (skipping over values which are still "in use"). One has to be a little careful, however, since all received connection attempts will fall into bucket "0" (some might call that a "feature"!) and would have to be moved to the appropriate bucket when the SYN/ACK is sent, and if a connection sends enough data the ACK will move to the "next bucket". E.g., for <n> equal to 8 (a load byte instruction), there would be 256 buckets, and a connection would have to be moved to the next bucket (actually, it is in "two buckets" for a little while) after 16 MBytes of data had been sent.