[comp.lang.perl] need advice - writing a database system

rahard@eeserv.ee.umanitoba.ca (Budi Rahardjo) (05/03/91)

Hi there,
I am in the process of writing a userid database for our computer networks.
We have approx. 17000 users (4 networks, counting only userids valid within
the last 5 years). There are approx. 30 fields for each record (userid, name,
uid, gid, e-mail, reg. date, expiry date .... etc.)
The objective here is to have unique userids and UIDs. For example, a
userid will be created only if nobody is already using it, and the database
will assign a new UID. (This raises the policy question of recycling UIDs, *sigh*)
Also this database should be accessible only by system administrators of
the 4 networks thru a client program.
I am going to write a database server and client in perl on a networked-Sun.
(actually I have a prototype up and running, but I am not satisfied)

I need suggestions :

1. Would you use a flat file or dbm file for the database ?
2. Would you slurp the whole file ? (considering the number of fields,
   and the size of the database)
3. If I use dbm file (with assoc. array)
   - I'd like to index it with different fields (ie, userid, lastname, uid).
     How do I do this ? (Use different dbm files ?)
   - When I update a record, I want to save it right away.
     Do I have to close the dbm file (dbmclose) and re-open it again ?
     ie. Is there a way I can update the dbm file without closing it ?
4. Access to the database by more than one person at the same time
   is desirable.
   How would you lock the database when you perform critical operations,
   like add a new entry, or updating a record ? (Use perl lock ? or
   just create a file, say  "lock.db", then remove it after you're done ?)
5. Security
   How would you implement security ?
   (Right now I have it set up like POP mail; after you connect, send an id,
   then send a password).
6. Comments ?


Thanks,
-- budi

Note :
- I have tried "jinx"; it's too slow, has no locking facility,
  and I don't want users to be able to change the database structure.
  I wonder if there is an enhanced jinx ....
--
Budi Rahardjo
<rahardj@ccu.umanitoba.ca>      <rahard@ee.umanitoba.ca> 
Electrical Engineering - University of Manitoba - Canada

tchrist@convex.COM (Tom Christiansen) (05/03/91)

From the keyboard of rahard@eeserv.ee.umanitoba.ca (Budi Rahardjo):
:
:Hi there,
:I am in the process of writing a userid database for our computer networks.
:
:I need suggestions :
:
:1. Would you use a flat file or dbm file for the database ?

Dbm file.  You're talking a lot of records here.

:2. Would you slurp the whole file ? (considering the number of fields,
:   and the size of the database)

No -- too expensive.

:3. If I use dbm file (with assoc. array)
:   - I'd like to index it with different fields (ie, userid, lastname, uid).
:     How do I do this ? (Use different dbm files ?)

I don't think you'd want to duplicate the records.  What you might do
instead is use an indirection scheme where primary lookups are keyed on
userid, and for the rest you first map the secondary key to the primary
key, as in

    $record = $database{$uid_to_login{$uid}};
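
A minimal sketch of that indirection, using two dbm files (one holding
the records keyed on userid, one mapping uid back to userid).  The file
paths and the colon-joined record layout are made up for illustration:

```perl
#!/usr/bin/perl
# Sketch: primary records keyed on userid, plus a secondary dbm file
# that maps uid -> userid.  Paths and record layout are assumptions.

dbmopen(%database,     "/tmp/userdb_main", 0600) || die "main: $!";
dbmopen(%uid_to_login, "/tmp/userdb_uids", 0600) || die "uids: $!";

# Adding a record has to maintain both files.
sub add_user {
    local($userid, $uid, $name) = @_;
    $database{$userid}  = join(':', $uid, $name);
    $uid_to_login{$uid} = $userid;
}

&add_user('budi', 1042, 'Budi Rahardjo');

# Primary lookup is direct; a uid lookup goes through the index first.
$record = $database{$uid_to_login{1042}};

dbmclose(%database);
dbmclose(%uid_to_login);
```

The same trick extends to a lastname index; the cost is that every
update has to touch each index file.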


:   - When I update a record, I want to save it right away.
:     Do I have to close the dbm file (dbmclose) and re-open it again ?

No.

:     ie. Is there a way I can update the dbm file without closing it ?

You can use the reset operator if you're careful.  It is documented 
to flush the database cache.  As of 3.018, its primary effect is
to generate a coredump, but as of 4.003, it appears to work just fine.

:4. Access to the database by more than one person at the same time
:   is desirable.
:   How would you lock the database when you perform critical operations,
:   like add a new entry, or updating a record ? (Use perl lock ? or
:   just create a file, say  "lock.db", then remove it after you're done ?)

If you want the granularity of the locking to be at the level of the
entire database, this will be easy.  If you want it at a record level, it
will probably not be.  Because of the way dbm works, you can't just lock a
particular region of the file.

You mentioned a networked Sun environment.  If you're using NFS, then you
*must* go through the lock daemon, meaning you'd better use fcntl() locks.

If you're not using NFS, and the locks are maintained by one process,
simpler schemes may suffice.
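
For whole-database granularity on a single machine, one such scheme is
flock() on a sentinel file (the "lock.db" name is borrowed from the
original question; the path is an assumption):

```perl
#!/usr/bin/perl
# Sketch: whole-database locking via flock() on a sentinel file.
# Works on one machine; over NFS you'd need fcntl() locks through
# the lock daemon instead.  "/tmp/lock.db" is an assumed path.

sub lock_db {
    open(LOCK, ">/tmp/lock.db") || die "lock.db: $!";
    flock(LOCK, 2);             # 2 = exclusive lock, blocks until granted
}

sub unlock_db {
    flock(LOCK, 8);             # 8 = unlock
    close(LOCK);
}

&lock_db;
# ... add the new entry or update the record here ...
&unlock_db;
```

One advantage over create-then-remove lock files: flock() locks
evaporate when the process dies, so a crashed client can't wedge the
database.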

:5. Security
:   How would you implement security ?

Depends on your level of paranoia.  Privileged ports and programs may
be enough if you're trusting of all superusers on your network.  Otherwise
you might want to use secure RPC.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/03/91)

In article <1991May03.143716.827@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
: From the keyboard of rahard@eeserv.ee.umanitoba.ca (Budi Rahardjo):
: :   - When I update a record, I want to save it right away.
: :     Do I have to close the dbm file (dbmclose) and re-open it again ?
: 
: No.
: 
: :     ie. Is there a way I can update the dbm file without closing it ?
: 
: You can use the reset operator if you're careful.  It is documented 
: to flush the database cache.  As of 3.018, its primary effect is
: to generate a coredump, but as of 4.003, it appears to work just fine.

It's a write-through cache, so the file is updated when you modify the
variable.  The only purpose of flushing the cache is on the read
side--to resynchronize with changes that someone else might have made
to the file.  It's the moral equivalent of declaring the file
"volatile".

Other than that, I agree with everything Tom said.  I might add that if
you only have a single process modifying the database, you don't have
to worry much about locking or synchronizing, as long as you do things
in an order that won't confuse the readers of the database.

You mentioned that you want to assign new uids--you might take a hint
from kernel pid assignment and only do the dirty work every now and then,
coming up with a batch of free uids to draw from efficiently.  If you
work it right you can even do the work when the machine is otherwise idle.
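
That batching idea might look like this (the uid range and the
%uid_in_use table below are invented example data):

```perl
#!/usr/bin/perl
# Sketch of batched uid allocation: the expensive scan for unused uids
# runs only occasionally (say, from cron when the machine is idle);
# handing out a uid is then just a shift off the banked list.
# %uid_in_use and the uid range are made-up example data.

sub refill_free_uids {
    local($lo, $hi, $want) = @_;
    local($uid);
    @free_uids = ();
    for ($uid = $lo; $uid <= $hi && @free_uids < $want; $uid++) {
        push(@free_uids, $uid) unless $uid_in_use{$uid};
    }
}

%uid_in_use = (1000, 1, 1001, 1, 1003, 1);   # uids already taken
&refill_free_uids(1000, 1010, 5);            # bank up to 5 free uids

$new_uid = shift(@free_uids);                # cheap: no scan needed
$uid_in_use{$new_uid} = 1;
```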

Larry

roger@mav.com (Roger Droz) (05/07/91)

In article <1991May3.093532.19393@ccu.umanitoba.ca> rahard@eeserv.ee.umanitoba.ca (Budi Rahardjo) writes:
>4. Access to the database by more than one person at the same time
>   is desirable.
>   How would you lock the database when you perform critical operations,
>   like add a new entry, or updating a record ? (Use perl lock ? or
>   just create a file, say  "lock.db", then remove it after you're done ?)

I have a very old perl ($Header:  perly.c,v 3.0.1.5 90/03/27 16:20:57
Patch level:  18), but I think my experience will apply to all but the
latest and greatest perl with the special patch for gdbm.

Gdbm permits multiple database readers, but only a single writer may
have the database open.  For that reason, I have always closed perl
dbm associative arrays before descending to human speed to ask for
input.

My old perl seems to open all dbm files for write access, if
permissions allow.  The patch that I haven't yet tried looks like it
passes "open for read-only" to gdbm, but you still have the problem
that gdbm won't allow two writers or a simultaneous writer and reader.
____________
               Roger Droz                  Domain: roger@mav.COM           
()       ()    Maverick International      UUCP: uw-beaver!gtisqr!roger
 (_______)     Mukilteo, WA 
  (     )      
   |   |       Disclaimer: "We're all mavericks here: 
   |   |                    Each of us has our own opinions,
   (___)                    and the company has yet different ones!"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (05/09/91)

In article <1991May3.165954.303@jpl-devvax.jpl.nasa.gov> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
> Other than that, I agree with everything Tom said.

I don't. Every fixed-length field in a user database should be stored in
a flat file indexed by uid. You can trivially keep a linked list of
available uids if allocating a uid has to be fast. Real name, home
directory, and so on can be stored in fixed-length fields with an
exception pointer for the occasional really long entry. This will be
much smaller, faster, and easier to use than a hashed database.

Hashing is pointless when the things being hashed are already numbers in
a small range. For mapping from username back to uid, you probably want
a hash table, but you shouldn't store records that way.
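
A sketch of that layout: with fixed-length records, record N lives at
byte offset N * RECLEN, so lookup by uid is a single seek().  The path
and field widths here are invented:

```perl
#!/usr/bin/perl
# Sketch: flat file of fixed-length records indexed by uid.
# Record N starts at byte N * $RECLEN, so uid lookup is one seek().
# The path and field widths are made up for illustration.

$RECLEN = 64;                       # 8-char login + 55-char name + "\n"

sub write_rec {
    local($uid, $login, $name) = @_;
    seek(DB, $uid * $RECLEN, 0);
    printf DB "%-8s%-55s\n", $login, $name;
}

sub read_rec {
    local($uid, $buf) = @_;
    seek(DB, $uid * $RECLEN, 0);
    read(DB, $buf, $RECLEN);
    $buf;
}

open(DB, "+>/tmp/userdb.flat") || die "userdb.flat: $!";
&write_rec(1042, 'budi', 'Budi Rahardjo');

($login = substr(&read_rec(1042), 0, 8)) =~ s/ +$//;
close(DB);
```

A name too long for its fixed field would go through the "exception
pointer" mentioned above: store an offset into an overflow file in the
fixed slot instead of the value itself.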

---Dan