[comp.mail.elm] Aliases File

tim@uxa.cso.uiuc.edu (/// Bereket Wilbury )) (10/31/90)

I'm a little in the dark as to why such a huge hash file needs to be created
for aliases, whereas ''mail'', the primitive force it is, only needs one
short ascii file.

Could someone enlighten me?  The users on our system (uxa.cso.uiuc.edu) get
150K disk space allocations, and a good 10% of that gets used to run elm, most
of it being the hash file.


+----------------------------------------------------  /~~~~| /~~~~~\ |~~~~~~~/
|  Tim Elliott     Genghis@uiuc.edu                    ~|   ||  /~\  |+---+  /
|  University of Illinois School of Architecture   WPGU |   ||  | |  |   /  /
|  Name of The Week:   Bereket                   Urbana |   ||  \_/  Rocks!/
+-------------------------------------------  Champaign |___| \_____/  /__/
--
+----------------------------------------------------  /~~~~| /~~~~~\ |~~~~~~~/
|  Tim Elliott     Genghis@uiuc.edu                    ~|   ||  /~\  |+---+  /
|  University of Illinois School of Architecture   WPGU |   ||  | |  |   /  /
|  Name of The Week:   Bereket                   Urbana |   ||  \_/  Rocks!/
+-------------------------------------------  Champaign |___| \_____/  /__/

syd@DSI.COM (Syd Weinstein) (10/31/90)

tim@uxa.cso.uiuc.edu (/// Bereket Wilbury )) writes:

>I'm a little in the dark as to why such a huge hash file needs to be created
>for aliases, whereas ''mail'', the primitive force it is, only needs one
>short ascii file.
Because the original author of Elm desired a fast loading and accessing
hashed file.  Mail just does sequential access.  But then the alaiases
in mail are not group aliases, nor are they recursive.

Does it need to be hashed?  Unsure, thats a design issue I never
worried about.

Could it be read from the text each time and the has built at startup?
Yes, and I am unsure how much overhead that would be in terms of
startup time.
-- 
=====================================================================
Sydney S. Weinstein, CDP, CCP                   Elm Coordinator
Datacomp Systems, Inc.                          Voice: (215) 947-9900
syd@DSI.COM or dsinc!syd                        FAX:   (215) 938-0235

chip@chinacat.Unicom.COM (Chip Rosenthal) (10/31/90)

In article <1990Oct30.190327.26105@ux1.cso.uiuc.edu>
	tim@uxa.cso.uiuc.edu) (/// Bereket Wilbury ) writes:
>I'm a little in the dark as to why such a huge hash file needs to be created
>for aliases, whereas ''mail'', the primitive force it is, only needs one
>short ascii file.

I think you are asking two questions here.  First, why does Elm hash its
aliases?  Second, why is this so big?

First - speed speed speed.  In fact, many MTA's keep their aliases hashed.
sendmail does.  smail3.1 offers the option to do so.  Is the speed worth
it?  I doubt it is for an MUA (e.g. Elm).  Your mailer (the MTA) is getting
beaten upon all the time to deliver messages, so hashing aliases is
probably a win there.  But Elm alias expansion happens infrequently,
relatively speaking.  Maybe folks who have humongous lists of aliases get
a lot of benefit from the hashing.  Given the tradeoff of simplicity
versus saving 150msec when mailing a message, I'd take simplicity.

Second - Elm uses a closed hashing scheme, and the alias file is a fixed
size preallocated to hold the maximum number of aliases.  Each alias slot
requires NLEN+sizeof(long) bytes.  On my system that works out to 48+4=52.
For 251 user aliases, that means you need 251*52=13052 bytes, even if you
have just a single alias.

    % ls -l .elm/aliases.hash
    -rw-r--r--   1 chip     unicom     13052 Aug 14 14:12 .elm/aliases.hash

That's the price you have to pay given the Elm alias storage scheme.  Is
it the best way?  Possibly not.  On the other hand, in the overall cosmic
scheme of things, once you load SysV and X11 and Motif and USENET and all
the other crap, 12K files get lost in the noise.

-- 
Chip Rosenthal  <chip@chinacat.Unicom.COM>
Unicom Systems Development, 512-482-8260 
Our motto is:  We never say, "But it works with DOS."

sandrock@aries.scs.uiuc.edu (Mark T. Sandrock) (11/01/90)

chip@chinacat.Unicom.COM (Chip Rosenthal) writes:

>For 251 user aliases, that means you need 251*52=13052 bytes, even if you
>have just a single alias.

I just installed Elm 2.3 PL8 here under Mips RISC/os, and chose the value
of 251 for the max number of aliases without realizing the consequences,
i.e., the large hash file. Not that it's a major deal, but it might be of
some help to have an idea why 251 was made the default. I use only a few
aliases myself, and I wonder how this value of 251 was derived?

Mark Sandrock

--

BITNET:   sandrock@uiucscs	        Univ. of Illinois at Urbana-Champaign
Internet: sandrock@aries.scs.uiuc.edu   Chemical Sciences Computing Services
Voice:    217-244-0561		        505 S. Mathews Ave., Urbana, IL  61801

taylor@limbo.Intuitive.Com (Dave Taylor) (11/02/90)

Mark Sandrock asks:

> Not that it's a major deal, but it might be of some help to have an idea 
> why 251 was made the default. I use only a few aliases myself, and I 
> wonder how this value of 251 was derived?

Most hashing algorithms have better performance (e.g. less collisions)
when the hash table size is a prime number.  After having decided on a
particular hashing scheme for Elm I then ran a number of experiments 
with different sizes to ascertain whether it held true for the Elm 
alias scheme too.  It did.

Weird, eh?  :-)

Note that the hashing algorithm itself is pretty basic too; the hash
key is the sum of the ASCII values of the letters in the key:

	int
	hash_it(string, table_size)
	char *string;
	int   table_size;
	{
		/** compute the hash function of the string, returning
		    it (mod table_size) **/

		register int i, sum = 0;
		
		for (i=0; string[i] != '\0'; i++)
		  sum += (int) string[i];

		return(sum % table_size);
	}

(from Elm/src/aliaslib.c)

I would encourage people to retrofit a non-hashing algorithm if the
disk/memory space is a problem; it should be quite straightforward
to do so, especially if the current version still has the external
nameserver hooks in it...

						-- Dave Taylor
Intuitive Systems
Mountain View, California

taylor@limbo.intuitive.com    or   {uunet!}{decwrl,apple}!limbo!taylor

chip@chinacat.Unicom.COM (Chip Rosenthal) (11/02/90)

In article <sandrock.657397776@aries>
	sandrock@aries.scs.uiuc.edu (Mark T. Sandrock) writes:
>I use only a few aliases myself, and I wonder how this value of 251
>was derived?

It's a common rule of thumb to make the hash table size a prime number.
The hash function in Elm uses the table size as a divisor, and thus a
prime number generally will give a better distribution.  Some hashing
schemes (not Elm's though) use a rehash function which won't work unless
the table size is prime.

Another rule of thumb with closed hashing is that you'd like to use no
more than 90%-95% of the available slots.  Hashing runs pretty linearly
until the table starts getting full, at which point it drops very rapidly.

So, the number of 251 will give you space for something on the order of
225 aliases before performance starts degrading.  If you don't expect to
use more than 100 aliases, then by all means feel free to select the 127
value recommended by Configure for small systems.

However, keep in mind that this is a system-wide limit, so if one person
is going to have hundreds of aliases, everybody is stuck with alias hash
tables to accomodate that size.  Also bear in mind that lists can really
eat up alias slots, since every (non-local) member of the list must be
an alias.  I don't use Elm aliasing that much, but still I've got 99
entries in my alias table because of a couple of private lists I keep.

A somewhat related digression:  The advice Configure gives is exactly
backwards.  It suggests 251 aliases for a default, 127 for a slow machine
and 503 for fast machines.  Since a bigger alias table is going to work
faster, you'd want a big table for slow machines!  A big machine is going
to have the MIPS to overcome the hash collisions.  So, if you have a
clunky machine make really big hash tables to speed things up. :-)  I
think the unstated implication (is that a tautology?) is that a slow
machine is going to have less memory resources to throw around, so a small
hash table is recommended.  Maybe Configure should say small/big instead
of slow/fast.

-- 
Chip Rosenthal  <chip@chinacat.Unicom.COM>
Unicom Systems Development, 512-482-8260 
Our motto is:  We never say, "But it works with DOS."