tim@uxa.cso.uiuc.edu (/// Bereket Wilbury )) (10/31/90)
I'm a little in the dark as to why such a huge hash file needs to be created for aliases, whereas ''mail'', the primitive force it is, only needs one short ascii file. Could someone enlighten me? The users on our system (uxa.cso.uiuc.edu) get 150K disk space allocations, and a good 10% of that gets used to run elm, most of it being the hash file. +---------------------------------------------------- /~~~~| /~~~~~\ |~~~~~~~/ | Tim Elliott Genghis@uiuc.edu ~| || /~\ |+---+ / | University of Illinois School of Architecture WPGU | || | | | / / | Name of The Week: Bereket Urbana | || \_/ Rocks!/ +------------------------------------------- Champaign |___| \_____/ /__/ -- +---------------------------------------------------- /~~~~| /~~~~~\ |~~~~~~~/ | Tim Elliott Genghis@uiuc.edu ~| || /~\ |+---+ / | University of Illinois School of Architecture WPGU | || | | | / / | Name of The Week: Bereket Urbana | || \_/ Rocks!/ +------------------------------------------- Champaign |___| \_____/ /__/
syd@DSI.COM (Syd Weinstein) (10/31/90)
tim@uxa.cso.uiuc.edu (/// Bereket Wilbury )) writes: >I'm a little in the dark as to why such a huge hash file needs to be created >for aliases, whereas ''mail'', the primitive force it is, only needs one >short ascii file. Because the original author of Elm desired a fast loading and accessing hashed file. Mail just does sequential access. But then the alaiases in mail are not group aliases, nor are they recursive. Does it need to be hashed? Unsure, thats a design issue I never worried about. Could it be read from the text each time and the has built at startup? Yes, and I am unsure how much overhead that would be in terms of startup time. -- ===================================================================== Sydney S. Weinstein, CDP, CCP Elm Coordinator Datacomp Systems, Inc. Voice: (215) 947-9900 syd@DSI.COM or dsinc!syd FAX: (215) 938-0235
chip@chinacat.Unicom.COM (Chip Rosenthal) (10/31/90)
In article <1990Oct30.190327.26105@ux1.cso.uiuc.edu> tim@uxa.cso.uiuc.edu) (/// Bereket Wilbury ) writes: >I'm a little in the dark as to why such a huge hash file needs to be created >for aliases, whereas ''mail'', the primitive force it is, only needs one >short ascii file. I think you are asking two questions here. First, why does Elm hash its aliases? Second, why is this so big? First - speed speed speed. In fact, many MTA's keep their aliases hashed. sendmail does. smail3.1 offers the option to do so. Is the speed worth it? I doubt it is for an MUA (e.g. Elm). Your mailer (the MTA) is getting beaten upon all the time to deliver messages, so hashing aliases is probably a win there. But Elm alias expansion happens infrequently, relatively speaking. Maybe folks who have humongous lists of aliases get a lot of benefit from the hashing. Given the tradeoff of simplicity versus saving 150msec when mailing a message, I'd take simplicity. Second - Elm uses a closed hashing scheme, and the alias file is a fixed size preallocated to hold the maximum number of aliases. Each alias slot requires NLEN+sizeof(long) bytes. On my system that works out to 48+4=52. For 251 user aliases, that means you need 251*52=13052 bytes, even if you have just a single alias. % ls -l .elm/aliases.hash -rw-r--r-- 1 chip unicom 13052 Aug 14 14:12 .elm/aliases.hash That's the price you have to pay given the Elm alias storage scheme. Is it the best way? Possibly not. On the other hand, in the overall cosmic scheme of things, once you load SysV and X11 and Motif and USENET and all the other crap, 12K files get lost in the noise. -- Chip Rosenthal <chip@chinacat.Unicom.COM> Unicom Systems Development, 512-482-8260 Our motto is: We never say, "But it works with DOS."
sandrock@aries.scs.uiuc.edu (Mark T. Sandrock) (11/01/90)
chip@chinacat.Unicom.COM (Chip Rosenthal) writes: >For 251 user aliases, that means you need 251*52=13052 bytes, even if you >have just a single alias. I just installed Elm 2.3 PL8 here under Mips RISC/os, and chose the value of 251 for the max number of aliases without realizing the consequences, i.e., the large hash file. Not that it's a major deal, but it might be of some help to have an idea why 251 was made the default. I use only a few aliases myself, and I wonder how this value of 251 was derived? Mark Sandrock -- BITNET: sandrock@uiucscs Univ. of Illinois at Urbana-Champaign Internet: sandrock@aries.scs.uiuc.edu Chemical Sciences Computing Services Voice: 217-244-0561 505 S. Mathews Ave., Urbana, IL 61801
taylor@limbo.Intuitive.Com (Dave Taylor) (11/02/90)
Mark Sandrock asks: > Not that it's a major deal, but it might be of some help to have an idea > why 251 was made the default. I use only a few aliases myself, and I > wonder how this value of 251 was derived? Most hashing algorithms have better performance (e.g. less collisions) when the hash table size is a prime number. After having decided on a particular hashing scheme for Elm I then ran a number of experiments with different sizes to ascertain whether it held true for the Elm alias scheme too. It did. Weird, eh? :-) Note that the hashing algorithm itself is pretty basic too; the hash key is the sum of the ASCII values of the letters in the key: int hash_it(string, table_size) char *string; int table_size; { /** compute the hash function of the string, returning it (mod table_size) **/ register int i, sum = 0; for (i=0; string[i] != '\0'; i++) sum += (int) string[i]; return(sum % table_size); } (from Elm/src/aliaslib.c) I would encourage people to retrofit a non-hashing algorithm if the disk/memory space is a problem; it should be quite straightforward to do so, especially if the current version still has the external nameserver hooks in it... -- Dave Taylor Intuitive Systems Mountain View, California taylor@limbo.intuitive.com or {uunet!}{decwrl,apple}!limbo!taylor
chip@chinacat.Unicom.COM (Chip Rosenthal) (11/02/90)
In article <sandrock.657397776@aries> sandrock@aries.scs.uiuc.edu (Mark T. Sandrock) writes: >I use only a few aliases myself, and I wonder how this value of 251 >was derived? It's a common rule of thumb to make the hash table size a prime number. The hash function in Elm uses the table size as a divisor, and thus a prime number generally will give a better distribution. Some hashing schemes (not Elm's though) use a rehash function which won't work unless the table size is prime. Another rule of thumb with closed hashing is that you'd like to use no more than 90%-95% of the available slots. Hashing runs pretty linearly until the table starts getting full, at which point it drops very rapidly. So, the number of 251 will give you space for something on the order of 225 aliases before performance starts degrading. If you don't expect to use more than 100 aliases, then by all means feel free to select the 127 value recommended by Configure for small systems. However, keep in mind that this is a system-wide limit, so if one person is going to have hundreds of aliases, everybody is stuck with alias hash tables to accomodate that size. Also bear in mind that lists can really eat up alias slots, since every (non-local) member of the list must be an alias. I don't use Elm aliasing that much, but still I've got 99 entries in my alias table because of a couple of private lists I keep. A somewhat related digression: The advice Configure gives is exactly backwards. It suggests 251 aliases for a default, 127 for a slow machine and 503 for fast machines. Since a bigger alias table is going to work faster, you'd want a big table for slow machines! A big machine is going to have the MIPS to overcome the hash collisions. So, if you have a clunky machine make really big hash tables to speed things up. :-) I think the unstated implication (is that a tautology?) is that a slow machine is going to have less memory resources to throw around, so a small hash table is recommended. Maybe Configure should say small/big instead of slow/fast. -- Chip Rosenthal <chip@chinacat.Unicom.COM> Unicom Systems Development, 512-482-8260 Our motto is: We never say, "But it works with DOS."