[comp.archives] [mud] Re: mud data storage and retrieval methods

mjr@hussar.dco.dec.com (Marcus J. Ranum) (05/24/91)

Archive-name: games/mud/untermud/1991-05-23
Archive: decuac.dec.com:/pub/umud.tar.Z [192.5.214.1]
Original-posting-by: mjr@hussar.dco.dec.com (Marcus J. Ranum)
Original-subject: Re: mud data storage and retrieval methods
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

quasar@bellcore.com writes:
>What database and db techniques are used by popular mud's? At the moment
>I have this toy system which works on ndbm (well, it's free, and I don't
>have to worry about file systems, b-trees, and unpleasant stuff like that). 

	Ignoring the question of "popular", I'll describe how UnterMUD
does caching and database access. :)

	There are a set of "generic" database access functions defined,
namely (in terms of function):

	get object, put object, check-if-object-exists, delete object
	prepare to traverse entire DB, get next object, end DB traverse,
	sync everything.

	The server code is written to use generic cache_get() cache_put()
and so forth, using exactly the same interface, so in theory you could
link the MUD directly against the db-layer and run without a cache.

	Caching is done with a hashed cache in two tiers - the active
and the inactive tiers. It's implemented as a hashed array of linked
lists, pretty much as you'd expect. The two tiers are used to provide
not only a reduction of search time but to provide locking. In other
words, if the cache runs out of room for objects, it WILL NOT drop
something off the active tier, no matter what, since something may
be holding a pointer to that object. Everything in UnterMUD is done
with pointer-passing. If the object's not in the active tier, a search
is made through the larger inactive tier. If it's still not found, the
last object (oldest) in the inactive tier is dropped, and the new
object is loaded from the database, and put at the head of the active
tier. If the inactive tier is utterly drained, the caching layer is 
pretty well swamped, so it just allocates more cache holders. The
net effect of this is that performance won't degrade much even if
rooms get large - the server will reach a steady state size. When
Russ had TinyHELL I under UnterMUD, the server took ~512K of core,
but most of that was probably due to the fact that people walked
around in the very crowded limbo, which stretched the cache. The
cache doesn't shrink back down again, though I suppose it could.

	Whenever a call through the I/O loop (select loop) is finished,
the whole cache is reset, which is done simply by moving the active
tier to the head of the inactive tier, which effects an LRU. This
is MUCH more efficient than a locking scheme like TeenyMUD uses, since
it doesn't have to search the whole cache for locks to clear, nor does
the programmer have to mess with locks. There's a provision for syncing
stuff at intervals.

	Underneath the cache is the actual permanent DB layer, which,
as I mentioned earlier, uses exactly the same call interface as the
cache layer. There are currently 3 different DB layers that can be
plugged in, depending on what you want, each of which has different
advantages or disadvantages. They are as follows:

	1) hashed directories - one object stored per UNIX filename
	2) hashed chunks - objects stored in a large file of logical
		blocks, indexed with a dbm/ndbm database
	3) hashed gdbm file - objects stored in a gdbm file, which
		also handles all the blocking internally
	someday) RMS-based storage for VMS machines

	The actual storage routines are pretty simple, since every
object is indexed based on its object-ID (a string of some length
less than 128char) In the case of hashed directories, the object-ID
is used to generate the pathname, in the case of the dbm version,
it is used as the hash key for a small struct consisting of a size
and offset tuple. In the case of the gdbm version, it's used as the
key, and gdbm takes care of the rest. The original code for doing
this stuff evolved from UberMUD, which used my b+tree code, because
I needed to be able to do linear traversals across attributes (but
that's another story).

	This has proven (IMHO) to be a very clean and effective
architecture, since it's easy to port, and easy to add efficiency
hacks where needed. (IE: if you have Sybase on your machine, write
a db layer that calls Sybase) - because things are brought in only
as needed, and can be flushed out when they're no longer needed
the server tends to reach a steady state as far as resources consumed.

	UnterMUD source is available for anonymous FTP from
decuac.dec.com, in ~ftp/pub - the current version is not "released"
because it is still under change. Official release should be soon,
there are a few little code changes still to make first.

mjr.

-- comp.archives file verification
decuac.dec.com
-rw-r--r--  1 388      system     154363 May  4 21:34 /pub/umud.tar.Z
found untermud ok
decuac.dec.com:/pub/umud.tar.Z