[net.unix] sorted /etc/passwd ?

dave@lsuc.UUCP (David Sherman) (01/25/85)

I have just added over 1000 accounts for our students, and we
now have 1526 entries in our /etc/passwd, on a Perkin-Elmer 3220
running v7. Should I consider sorting them and using a binary
search? Does anyone have versions of getpw{nam,ent,uid} and anything
else which would need changing, to implement a binary search?
(We have a source license.)

Dave Sherman
The Law Society of Upper Canada
(416) 947-3466
-- 
{utzoo pesnta nrcaero utcs}!lsuc!dave
{allegra decvax ihnp4 linus}!utcsrgv!lsuc!dave

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/29/85)

Doug Kingston implemented a binary-searched passwd index on
our PDP-11 UNIX systems and a more comprehensive user database
system for 4.2BSD.  Contact dpk@brl-tgr.ARPA

alb@Mitre-Bedford (01/31/85)

Got just such an animal here.  ~800 users were enough for
us to make this change.

The general idea:

Agree that uid's are unique, define a structure with all the
/etc/passwd information on a line in it, read the password file -
writing a structure in a database for each line.

You now have a binary indexed database, indexed by uid.
Since you know the length of the records, lseek is used
to skip to the entry desired. (very fast)

To get the uid from the name, we hashed the names/uid pairs
and put them in a seperate file, but one could also skip
throuth the db comparing name (or any field for that matter),
to the appropriate field.

You can reply directly to me if you have specific questions.

len
-----------------------------------------------------------------------
A. Leonard Brown   alb@mitre-bedford.arpa   len%tufts.csnet@csnet-relay

cudcv@wkdaisy.UUCP (Rob McMahon) (02/03/85)

> Message-ID: <330@lsuc.UUCP>
> I have just added over 1000 accounts for our students, and we
> now have 1526 entries in our /etc/passwd, on a Perkin-Elmer 3220
> running v7. Should I consider sorting them and using a binary
> search? Does anyone have versions of getpw{nam,ent,uid} and anything
> else which would need changing, to implement a binary search?
> (We have a source license.)

At Warwick we have an indexed password file.  The password file itself has
fixed length records, and there are two other files which act as indexes into
this by uid and by name.

This means that the only programs which *have* to be recompiled with the
new library routines are those which modify the password file, or that use
the shell field, since this is now terminated by a colon instead of a newline.

The fixed length records means that programs which modify the passwd file can
simply rewrite the record they are interested in instead of copying the whole
file somewhere safe, making changes on the way, and then copying or moving it
back over the original (the time taken to do this when you have over 2000
entries is no joke).

The getpwuid routine now just indexes into the uid index, gets the password
file address from there, and then seeks into the password file.

The getpwnam routine does a binary search on the name index to get the
password file address.

Each of these routines checks to see the indexes are up to date before
using them, in case the password file has been modified.  passwd, chfn, and
chsh (and one of our own called pwent which changes any field
interactively) just touch the indexes when they are done since they can modify
the password file line in situ.

makeuser inserts the new entry at the appropriate point in the indexes.
deluser just overwrites the password file entry with a 'free' record which is
spotted by the next makeuser, and adjusts the indexes appropriately.

The routines would need slight changes to run on your system, since we have
added some fields to the password file for expiry date, per-process cpu limit
login quota and disk allowance (which is obsolete now we use the 4.2 quotas).

makeuser would probably need major changes, since we have our own standards
for usercodes, login directories etc.  There are also programs to convert from
variable-length- to fixed-length-records, and for recreating the indexes
should they get out of date.

Programs such as ps and ls that create their own in-core password file whilst
reading through for the entry they want, need this code written around to gain
full speed.  Most programs benefit just from recompiling.  The difference
in speed is impessive.  Try an 'ls -l' or 'ps u' or 'f' or even just logging
in without indexes, and with 2000 entries in the passwd file !

Watch out for those programs (like lastcomm) that assume you won't have
uid's greater than 2000 or so and core dump when you do, as well.

If this sounds interesting, mail me, and I'll mail you some source, diffs and
so on.  I think there's too much to go on the net.

        Rob McMahon

                ... mcvax!ukc!ubu!daisy!cudcv

lorne@uokvax.UUCP (02/07/85)

Yes, we do have a hashed passwd file. We have had (and I don't think
this is much to brag about) up to 3000 user accounts on an 11/70. We
also run about 1000 accounts on a VAX 11/780 and use the same hashing
scheme there. This code was originally written back in the V6 days, and
has been modified to run on every version we've had since then (V7,
2.Xbsd, 4.Xbsd).

As an explanation of how this works, we have a file called u-seek,
this is a copy of the password file that allows you to seek into it
using the uid.  Another file called p-seek, is the same as the u-seek
file but for encryted passwords. To complete the set we have two other
files, u-hash and sid-hash that allow you to come up with the uid by
hashing either the login name or the Student ID number.  There is a set
of maintence programs that keep these files up to date. We also have
our own newusr and mkclass programs that create accounts and a rmusr
program that destroys accounts. We have completely re-written all the
getpwent(3) routines so that they use the above files. It also requires
changing or putting mods into every command that parses the passwd
file, a short list of these would include login, passwd, finger, ls.

Good points:
1. Since we regularly run 30-40 people at once on the 11/70 it decreases
login time dramaticly. It used to take up to 5 minutes to log in.
2. It speeds up all the programs that access the passwd file, ls in particular
was horribly slow with a large passwd file.

Bad points:
1. A person can only have one account. Specifically one login name per
uid.
2. There is alot of overhead in keeping all these files up to date. Basicly
it is much more difficult to make random changes to the passwd file.
3. We have never distributed this stuff before. If you wanted it, you
would be the first, and there are some problems associated with that.

We would be glad to help out anyone who wanted a copy of it. Send
mail to me for more information.

					Lorne Wilson
					University of Oklahoma
					...ctvax!uokvax!lorne

guy@rlgvax.UUCP (Guy Harris) (02/07/85)

> At Warwick we have an indexed password file.  The password file itself has
> fixed length records, and there are two other files which act as indexes into
> this by uid and by name.
> 
> This means that the only programs which *have* to be recompiled with the
> new library routines are those which modify the password file, or that use
> the shell field, since this is now terminated by a colon instead of a newline.

Programs which still rummage through the password file themselves, or which
use "getpw", will have to be rewritten (they should have been rewritten
*long* ago!) and recompiled.  (The System V Release 2 "ls" still does this, so
there are probably several other fossils lying around various flavors of
UNIX.)

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy