[news.software.b] Enhanced dbz for SunOS 4.0

david@elroy.Jpl.Nasa.Gov (David Robinson) (06/28/89)

In attempts to speed up news processing we installed the public
domain dbz.c DBM replacement.  We noticed a significant speedup
but a runtime analysis found that a large number of system calls
were being performed to lseek() and read() 4 bytes at a time.  As
the history file grew it appeared to get worse.

Below is a SunOS 4.0 specific enhancement to dbz that mmap()'s
the history file and .pag file into main memory and accesses it
directly instead of through lseek() and read().  Our tests have shown
a decrease in system time as much as 50% with no additional page faults.

In order to enable this feature just recompile with a -DMMAP option,
the .pag file is compatible with the original dbz code.  Without the
MMAP option, the only changes that were made were to add a read_only
flag necessary for MMAP but also causes store() to fail earlier, and
lcase() [which was broken] was replaced by strcasecmp() from the
4.3BSD distribution archives on UUNET.

I would be interested in hearing and problems that are encountered, this
code has been running since Feb on a very active Sun-2.

	-David Robinson
	david@elroy.jpl.nasa.gov

P.S. This has not been tested under C news [yet], but it is based on
the same version that is in contrib/dbz of the C news distribution.
========================
/*
dbz.c  V1.5.M

Copyright 1988 Jon Zeeff (umix!b-tech!zeeff)
You can use this code in any manner, as long as you leave my name on it
and don't hold me responsible for any problems with it.

Hacked on by gdb@ninja.UUCP (David Butler); Sun Jun  5 00:27:08 CDT 1988

These routines replace dbm as used by the usenet news software
(it's not a full dbm replacement by any means).  It's fast and
simple.  It contains no AT&T code.

BSD sites will notice some savings in disk space.  Sys V sites without
dbm will notice much faster operation.

Note: .pag files created by version 1.[0-3] need to be recreated before
      using this version.

This code relies on the fact that news stores a pointer to the history
file as the dbm data.  It doesn't store another copy of the key like
dbm does so it saves disk space.  All you can do is fetch() and
store() data.

Just make news with the DBM option and link with dbz.o.
*/

/*
	6/27/89
Dbz has been modified to use the Sun mmap() that was introduced in SunOS 4.0.
The advantages of mmap()ing files into memory instead of using the standard
I/O system is the reduced system time and reduced number of system calls.

Both the raw history file and the .pag file are mapped in, the former is
read-only and the latter read-write.  Because the .pag is written and
thus potentially expanded, it needs to preallocate the amount of space
it needs (via a large hole).  The constant MAX_INDEX below adjusts this.

To implement these features you must #define MMAP.

Ths code has been running on a active full feed site for months with
no reported problems.  Feel free to report any bugs but I cannot
guarentee that the will be fixed or even recognized.
	-David Robinson
	david@elroy.jpl.nasa.gov
*/

/*
   Set this to the something several times larger than the maximum # of
   lines in a history file.  It should be a prime number.
*/
#define INDEX_SIZE 99991L

#define DEBUG if (0) printf	          /* for no debugging */
/* #define DEBUG if (1) printf		  /* for debugging */
/* #define DEBUG if (debug) printf        /* for "rn" type debugging */

/* Note: let the optimizer remove any if(0) or if(1) code above */

#include <fcntl.h>
#include <string.h>
#include <ctype.h>

#ifdef MMAP
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#endif

long lseek();

static long get_ptr();
static int put_ptr();
static void lcase();
static long hash();
static void crcinit();
static strcasecmp();
static strncasecmp();

typedef struct {
	char *dptr;
	int dsize;
} datum;

static int Data_fd;		/* points to /usr/lib/news/[n]history */
static int Index_fd = -1;	/* points to /usr/lib/news/[n]history.pag */
static int read_only;
#ifdef MMAP
	/*
	 * This should be the maximum size expected for a .pag file
	 * with only 4 bytes per history line, 1 Meg should hold ~256K
	 * of articles.
	 */
#define MAX_INDEX (1*1024*1024)
static long max_index;
static long *Index_ptr;
static char *Data_ptr;
static time_t Data_time;
static long max_data;
#endif

int 
dbminit(name)
char *name;
{
        char index_file[1024];	/* index file name */
	void crcinit();
#ifdef MMAP
	struct stat st;
#endif

	if (Index_fd >= 0) {
		DEBUG("dbminit: dbminit aready called once\n");
		return(-1);    /* init already called once */
	}

	strcpy(index_file, name);
	strcat(index_file, ".pag");

	/* if we don't have permission to open the pag file for read/write */
	/* then we may never attempt a store().  If we do, it will fail */

	if ((read_only = 0, Index_fd = open(index_file, O_RDWR)) < 0 &&
	    (read_only = 1, Index_fd = open(index_file, O_RDONLY)) < 0 &&
	    (read_only = 0, Index_fd = open(index_file, O_CREAT | O_RDWR, 0644)) < 0) 
	{
		DEBUG("dbminit: Index_file open failed\n");
		return(-1);
	}

	if ((Data_fd = open(name, O_RDONLY)) < 0) {

		/* The only time the data file does not exist is when */
		/* expire is running (as "news") and it is ok to create it */
		/* If we don't have "permission" the create will fail, too */

		if (close(creat(name, 0644)) < 0 ||
		    (Data_fd = open(name, O_RDONLY)) < 0) {
			DEBUG("dbminit: Data_file open failed\n");
			close(Index_fd);
			Index_fd = -1;
			return(-1);
		}
	}

#ifdef MMAP
	/*
	 * Use SunOS 4.0 mmap() features to map the index file into
	 * memory, mmap() has the nasty feature of causing a SEGV if
	 * try to access data out of range, especially on writes
	 * To combat this, if we open the file for write we extend it
	 * to the maximum value we expect to encounter, this is not
	 * a problem because the extra space is just holes.  We must
	 * make sure that MAX_INDEX is large enough.  If we are read
	 * only we don't worry because we won't write.  All store/fetch
	 * calls are now bounds checked to prevent running out of the
	 * mmap'd space.
	 *
	 * We also map in a read-only copy of the history file, when
	 * we are actually accessing it we do a simple bounds check
	 * and if it fails we re-stat() the history file and remap
	 * it in if it has changed.  This is a small penalty that is not
	 * often seen, normally one does not immediately fetch() an
	 * article just store()'d.
	 *
	 * This mmaping strategy can speed accesses up to as much as
	 * 50% by saving the extra lseeks and read overhead.
	 */
	if (fstat(Index_fd, &st) < 0) {
		DEBUG("dbminit: Cannot stat index file\n");
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
		return(-1);
	}
	max_index = st.st_size;
	if (st.st_size < MAX_INDEX)  {
		DEBUG("dbminit: Extending index file : %d\n", MAX_INDEX);
		max_index = MAX_INDEX;
		if (ftruncate(Index_fd, (off_t)MAX_INDEX) < 0) {
			max_index = st.st_size;
		}
	}
	Index_ptr = (long *)mmap((char *)0, (int)max_index,
				 PROT_READ|(read_only?0:PROT_WRITE),
				 MAP_SHARED, Index_fd, (off_t)0);
	if (Index_ptr == (long *)-1) {
		DEBUG("dbminit: Cannot mmap index file\n");
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
		return(-1);
	}
	/*
	 * Map in a read-only copy of the history file, we maintain a copy
	 * of the files modification time so that if we are about to 
	 * "seek" off the end we can stat the file and remap it if necessary.
	 */
	if (fstat(Data_fd, &st) < 0) {
		DEBUG("dbminit: Cannot stat data file\n");
		(void)munmap((char *)Index_ptr, (int)max_index);
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
		return(-1);
	}
	max_data = st.st_size;
	Data_time = st.st_mtime;
	Data_ptr = (char *)mmap((char *)0, (int)max_data,
				 PROT_READ, MAP_SHARED, Data_fd, (off_t)0);
	if (Data_ptr == (char *)-1) {
		DEBUG("dbminit: Cannot mmap data file\n");
		(void)munmap((char *)Index_ptr, (int)max_index);
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
		return(-1);
	}
#endif
	crcinit();	/* initialize the crc table */
	DEBUG("dbminit: succeeded\n");
	return(0);
}

int
dbmclose()
{
	if (Index_fd >= 0) {
#ifdef MMAP
		(void)munmap((char *)Index_ptr, (int)max_index);
		(void)munmap((char *)Data_ptr, (int)max_data);
#endif
		close(Index_fd);
		Index_fd = -1;
		close(Data_fd);
	}
	DEBUG("dbmclose: succeeded\n");
	return(0);
}

/* get an entry from the database */
datum
fetch(key)
datum key;
{
	long index_ptr;
	long index_size = INDEX_SIZE;
        char buffer[1024];
	static long data_ptr;
	datum output;
	long hash();
	long get_ptr();
	void lcase();

	DEBUG("fetch: (%s)\n", key.dptr);

	for (index_ptr = hash(key.dptr, key.dsize);
	    --index_size && (data_ptr = get_ptr(index_ptr)) >= 0L;
	    index_ptr = ++index_ptr % INDEX_SIZE) {

#ifdef MMAP
		char *data_buffer;
		struct stat st;
		int tmp_dsize = key.dsize;
		/*
		 * If data_ptr is beyond end of history file the current
		 * in-core is probably out of date, we should stat the
		 * file and see if it has changed, if not then this is a
		 * error in the .pag file, other wise it has to be unmapped
		 * and remapped back into core.
		 */
		if ((data_ptr + key.dsize) > max_data) {
			if (fstat(Data_fd, &st) < 0) {
				DEBUG("fetch: Cannot stat data file\n");
				break;
			}
			if ((data_ptr + key.dsize) > st.st_size) {
				/*
				 * This must be a bogus data pointer
				 * the data file is not big enough
				 */
				DEBUG("fetch: Data_ptr beyond end of file %d\n",
					 data_ptr);
				break;
			}
			(void)munmap((char *)Data_ptr, (int)max_data);
			max_data = st.st_size;
			Data_ptr = (char *)mmap((char *)0, (int)max_data,
				 PROT_READ, MAP_SHARED, Data_fd, (off_t)0);
			if (Data_ptr == (char *)-1) {
				DEBUG("fetch: HELP! Cannot remmap data file\n");
				(void)munmap((char *)Index_ptr, (int)max_index);
				close(Index_fd);
				Index_fd = -1;
				close(Data_fd);
				break;
			}
		}
			
		data_buffer = &Data_ptr[data_ptr];
		/* key should be lcase version of article id and no tab */
		if (data_buffer[key.dsize - 1] == '\t') {
			tmp_dsize--;
		}
		DEBUG("fetch: buffer (%.*s)\n", tmp_dsize, data_buffer);
		if (strncasecmp(key.dptr, data_buffer, tmp_dsize) == 0) {
#else
		lseek(Data_fd, data_ptr, 0);
		read(Data_fd, buffer, (unsigned)key.dsize);

		/* key should be lcase version of article id and no tab */
		if (buffer[key.dsize - 1] == '\t') {
			buffer[key.dsize - 1] = '\0';
		}

		DEBUG("fetch: buffer (%s)\n", buffer);
		if (strncasecmp(key.dptr, buffer, key.dsize) == 0) {
#endif
			/* we found it */
			output.dptr = (char *)&data_ptr;
			output.dsize = sizeof(long);
			DEBUG("fetch: successful\n");
			return(output);
		}
	}

	/* we didn't find it */

	output.dptr = (char *)0;
	output.dsize = 0;
	DEBUG("fetch: failed\n");
	return(output);
}

/* add an entry to the database */
store(key, data)
datum key;
datum data;
{
	/* lint complains about a possible pointer alignment problem here */
	/* it is not a problem because dptr is the first element of a */
	/* structure and should be aligned for anything */
	DEBUG("store: (%s, %ld)\n", key.dptr, *((long *)data.dptr));
	if (read_only)
		return(-1);
	return(put_ptr(hash(key.dptr, key.dsize), *((long *)data.dptr)));
}

/* get a data file pointer from the specified location in the index file */

static long
get_ptr(index_ptr)
long index_ptr;
{
	long data_ptr;

	DEBUG("get_ptr: (%ld)\n", index_ptr);

#ifdef MMAP
	if ((index_ptr * sizeof(long)) > (max_index - sizeof(long))) {
		DEBUG("get_ptr: failed index beyond end of file\n");
		return(-1L);
	}
	data_ptr = Index_ptr[index_ptr];
#else
	/* seek to where it should be */
	lseek(Index_fd, (long)(index_ptr * sizeof(long)) ,0);

	/* read it */
	if (read(Index_fd, (char *)&data_ptr, sizeof(long)) != sizeof(long) ||
	    data_ptr == 0L) {
		DEBUG("get_ptr: failed\n");
		return(-1L);
	}
#endif
	--data_ptr;	/* remove the offset we added in put_ptr */
	DEBUG("get_ptr: succeeded %ld\n", data_ptr);
	return(data_ptr);
}

/* put a data file pointer into the specified location in the index file */
/* move down further if slots are full (linear probing) */
static
put_ptr(index_ptr, data_ptr)
long index_ptr;
long data_ptr;
{
	long get_ptr();
	long index_size = INDEX_SIZE;

	/* find an empty slot */
	while (--index_size && get_ptr(index_ptr) >= 0L) {
		index_ptr = ++index_ptr % INDEX_SIZE;
	}

	if (index_size == 0L) {
		DEBUG("put_ptr: hash table overflow - failed\n");
		return(-1);
	}

#ifdef MMAP
	if ((index_ptr * sizeof(long)) > (max_index - sizeof(long))) {
		DEBUG("put_ptr: index pointer beyond end of file\n");
		return(-1);
	}
	Index_ptr[index_ptr] = ++data_ptr;
#else
	/* seek to spot */
	lseek(Index_fd, (long)(index_ptr * sizeof(long)), 0);

	++data_ptr;   /* add one so that we can use 0 as no pointer */

	/* write in data */
	if (write(Index_fd, (char *)&data_ptr, sizeof(long)) != sizeof(long)) {
		DEBUG("put_ptr: write failed\n");
		return(-1);
	}
#endif

	DEBUG("put_ptr: succeeded\n");
	return(0);
}

static void
lcase(s, n)
register char *s;
register int n;
{
	for (; n > 0; --n, ++s) {
		if (isascii(*s) && isupper(*s))
			*s = tolower(*s);
	}
}

/* This is a simplified version of the pathalias hashing function.
 * Thanks to Steve Belovin and Peter Honeyman
 *
 * hash a string into a long int.  31 bit crc (from andrew appel).
 * the crc table is computed at run time by crcinit() -- we could
 * precompute, but it takes 1 clock tick on a 750.
 *
 * This fast table calculation works only if POLY is a prime polynomial
 * in the field of integers modulo 2.  Since the coefficients of a
 * 32-bit polynomial won't fit in a 32-bit word, the high-order bit is
 * implicit.  IT MUST ALSO BE THE CASE that the coefficients of orders
 * 31 down to 25 are zero.  Happily, we have candidates, from
 * E. J.  Watson, "Primitive Polynomials (Mod 2)", Math. Comp. 16 (1962):
 *	x^32 + x^7 + x^5 + x^3 + x^2 + x^1 + x^0
 *	x^31 + x^3 + x^0
 *
 * We reverse the bits to get:
 *	111101010000000000000000000000001 but drop the last 1
 *         f   5   0   0   0   0   0   0
 *	010010000000000000000000000000001 ditto, for 31-bit crc
 *	   4   8   0   0   0   0   0   0
 */

#define POLY 0x48000000L	/* 31-bit polynomial (avoids sign problems) */

static long CrcTable[128];

static void
crcinit()
{	register int i, j;
	register long sum;

	for (i = 0; i < 128; ++i) {
		sum = 0L;
		for (j = 7 - 1; j >= 0; --j)
			if (i & (1 << j))
				sum ^= POLY >> j;
		CrcTable[i] = sum;
	}
	DEBUG("crcinit: done\n");
}

static long
hash(name, size)
register char *name;
register int size;
{
	register long sum = 0L;

	while (size--) {
		sum = (sum >> 7) ^ CrcTable[(sum ^ (*name++)) & 0x7f];
	}
	DEBUG("hash: returns (%ld)\n", sum % INDEX_SIZE);
	return(sum % INDEX_SIZE);
}

/*
 * Copyright (c) 1987 Regents of the University of California.
 * All rights reserved.  The Berkeley software License Agreement
 * specifies the terms and conditions for redistribution.
 */

#if defined(LIBC_SCCS) && !defined(lint)
static char sccsid[] = "@(#)strcasecmp.c	1.3 (Berkeley) 8/3/87";
#endif LIBC_SCCS and not lint

/*
 * This array is designed for mapping upper and lower case letter
 * together for a case independent comparison.  The mappings are
 * based upon ascii character sequences.
 */
static char charmap[] = {
	'\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007',
	'\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
	'\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
	'\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037',
	'\040', '\041', '\042', '\043', '\044', '\045', '\046', '\047',
	'\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057',
	'\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067',
	'\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077',
	'\100', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
	'\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
	'\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
	'\170', '\171', '\172', '\133', '\134', '\135', '\136', '\137',
	'\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
	'\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
	'\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
	'\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177',
	'\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207',
	'\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217',
	'\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227',
	'\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237',
	'\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247',
	'\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257',
	'\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267',
	'\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277',
	'\300', '\341', '\342', '\343', '\344', '\345', '\346', '\347',
	'\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',
	'\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',
	'\370', '\371', '\372', '\333', '\334', '\335', '\336', '\337',
	'\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347',
	'\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',
	'\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',
	'\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377',
};

static
strcasecmp(s1, s2)
	register char *s1, *s2;
{
	register char *cm = charmap;

	while (cm[*s1] == cm[*s2++])
		if (*s1++ == '\0')
			return(0);
	return(cm[*s1] - cm[*--s2]);
}

static
strncasecmp(s1, s2, n)
	register char *s1, *s2;
	register int n;
{
	register char *cm = charmap;

	while (--n >= 0 && cm[*s1] == cm[*s2++])
		if (*s1++ == '\0')
			return(0);
	return(n < 0 ? 0 : cm[*s1] - cm[*--s2]);
}
-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (06/28/89)

In article <17059@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>In attempts to speed up news processing we installed the public
>domain dbz.c DBM replacement.  We noticed a significant speedup
>but a runtime analysis found that a large number of system calls
>were being performed to lseek() and read() 4 bytes at a time.  As
>the history file grew it appeared to get worse.

Dbz used to use fread/fseek which I expect was better.  Someone took it out
and I let it go.  Maybe I should take a look at putting it back in.

>
>Below is a SunOS 4.0 specific enhancement to dbz that mmap()'s
>the history file and .pag file into main memory and accesses it
>directly instead of through lseek() and read().  Our tests have shown
>a decrease in system time as much as 50% with no additional page faults.

Sounds nice.  

>
>lcase() [which was broken] was replaced by strcasecmp() from the
>4.3BSD distribution archives on UUNET.

I believe this is a reference to using tolower() without checking what the
char is.  It was a bsd only bug. 

Mark Moraes did some patches for cnews with dbz.  The most important 
ones are removing lcase() from dbz.c and changes to make mkdbm work.  
They may make it into the next cnews patch.  


-- 
Are you making the world     |  zeeff@b-tech.ann-arbor.mi.us
a better place?              |  Ann Arbor, MI

henry@utzoo.uucp (Henry Spencer) (06/29/89)

In article <17059@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>Below is a SunOS 4.0 specific enhancement to dbz that mmap()'s
>the history file and .pag file into main memory and accesses it
>directly instead of through lseek() and read()...

One of our early C News users has modified dbz to do roughly the same
thing portably, by just keeping everything in memory and writing it out
at the end.  I want to take a hard look at his code before promising
anything, but it may be part of the next patch.  On machines with plenty
of address space and physical memory, apparently it really makes things
fly.  (C expire, in particular, is *very* dbm-bound and should benefit a
lot from this.)

>lcase() [which was broken] was replaced by strcasecmp() from the
>4.3BSD distribution archives on UUNET.

Note that if you're trying this dbz in C News, you want to comment out the
lcase/strcasecmp entirely.  C News doesn't do the case mapping.
-- 
NASA is to spaceflight as the  |     Henry Spencer at U of Toronto Zoology
US government is to freedom.   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

stevo@judy.Jpl.Nasa.Gov (Steve Groom) (06/30/89)

In article <1989Jun28.180430.2743@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <17059@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>>Below is a SunOS 4.0 specific enhancement to dbz that mmap()'s
>>the history file and .pag file into main memory and accesses it
>>directly instead of through lseek() and read()...
>
>One of our early C News users has modified dbz to do roughly the same
>thing portably, by just keeping everything in memory and writing it out
>at the end.  I want to take a hard look at his code before promising
>anything, but it may be part of the next patch.  On machines with plenty
>of address space and physical memory, apparently it really makes things
>fly.

Keeping the whole file in memory can be better than seeking around the
file and doing selective reads, but you have to have the memory for
it.  Part of the beauty of mmap'ing is that you don't need lots of
physical memory for it to be a win.  One of the reasons David and I got
into this in the first place was that we don't have a lot of physical
memory.  With mmap'ing, you just set up a memory area to correspond to
a file and let the kernel take care of the I/O.  When physical memory runs
short, instead of paging out (= more I/O) parts of an in-core file, the
kernel can just invalidate mapped pages (= no I/O) and re-read them
later.  There is also an I/O speed gain with mmap'ing, which appears to
be the result of the elimination of copying data from kernel buffers
into user space.

Steve Groom, Jet Propulsion Laboratory, Pasadena, CA 91109
Internet: stevo@elroy.jpl.nasa.gov   UUCP: {ames,cit-vax}!elroy!stevo
Disclaimer: (thick German accent) "I know noothingg! Noothingg!"

henry@utzoo.uucp (Henry Spencer) (06/30/89)

In article <17099@elroy.Jpl.Nasa.Gov> stevo@elroy.Jpl.Nasa.Gov (Steve Groom) writes:
>Keeping the whole file in memory can be better than seeking around the
>file and doing selective reads, but you have to have the memory for
>it.  Part of the beauty of mmap'ing is that you don't need lots of
>physical memory for it to be a win...

In principle the same is true of keeping the file in memory, as witness
demand paging.  Either way, only the active parts actually have to be in
memory.  In practice mmap makes it easier for dumb operating systems to
do things right, but it's a lot less portable.

Note that Geoff and I are old fogies who consider one megabyte to be one
@#$%^& of a lot of memory, and are thoroughly disgusted at the inability
of Suns to run decent multi-user timesharing with four whole meg of memory.
(When utzoo was a pdp11/44, it did all right -- not great, but all right --
with 2MB.)  When we say "plenty of memory", we don't mean gigabytes.
-- 
NASA is to spaceflight as the  |     Henry Spencer at U of Toronto Zoology
US government is to freedom.   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu