[net.unix-wizards] RAM disk vs paging + buffer cache

rcd@nbires.UUCP (Dick Dunn) (08/11/86)

Some recent discussions in unix-wizards and here at work set me to thinking
about why one would want a RAM disk as a general sort of feature.  ("RAM
disk" refers to using a fixed part of main memory as if it were a disk-like
device.)

First, about the idea of a RAM disk:  It's certainly simple to implement;
if you can understand the basics of a device driver and you have /dev/mem
or something like it, you're 90% of the way there.

A RAM disk makes sense for special problems where what you're really doing
is making use of main memory for something peculiar to your application.

What I DON'T see is why you would want to use a RAM disk (in UNIX) for
things like frequently-used files or programs.  Consider what you've got in
a paging system like 4.2:  Frequently-used programs will have their pages
dragged into memory where they will stay as long as they are in demand.
Frequently-used files will have their most used blocks left in the buffer
cache.  It seems as if what goes on with page reclamation and the buffer
cache is really "RAM disk" behavior with dynamic adaptation to what is most
needed, and that this ought to be able to out-perform any static allocation
of information to a RAM disk device.

There are a couple of holes in this line of thinking.  First, the number of
buffers is normally fixed at startup; this means that there is no way to
make a tradeoff between memory committed to i/o and to paging.  Second, the
algorithms used to manage pages and buffer cache may not, in fact, retain
commonly used information as well as if retention were done explicitly.

The latter problem is easily answered:  First, the strategies for managing
the cache and pages can always be improved once it is known that they have
particular deficiencies.  Second, there is a balance in that explicit
retention may work better at a particular time but it is not adaptive.

The former problem suggests that one avenue to explore in performance
improvement for UNIX would be to let the buffer cache change size in
response to changing need.  It just moves the analysis up one level:
Instead of LRU on disk blocks and LRU on process-image pages, why not LRU
on the whole mess.  There's a simplification lurking here--after all, the
pages of images represent disk blocks too.  Has anyone tried fiddling with
UNIX either to make the buffer cache size adaptive or to unify page
management and buffer caching?  If you made the buffer cache size adaptive
but didn't unify the paging and buffer systems, would you get into some
sort of interference between the two algorithms (e.g., due to lack or
excess of hysteresis)?
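The "LRU on the whole mess" idea can be sketched as one reclamation list whose entries are merely tagged by origin, so the coldest frame goes first whether it holds a file block or an image page.  A toy simulation in modern Python — the class and names are invented for illustration, no real kernel works this way verbatim:

```python
from collections import OrderedDict

class UnifiedLRU:
    """One LRU list over both buffer-cache blocks and process pages.
    Keys are (kind, id) pairs; reclamation ignores the kind, so whichever
    entry is coldest -- file block or image page -- is reclaimed first."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # insertion order == LRU order

    def touch(self, kind, ident):
        key = (kind, ident)
        if key in self.frames:
            self.frames.move_to_end(key)         # now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)  # reclaim coldest frame
            self.frames[key] = True

cache = UnifiedLRU(capacity=3)
cache.touch("block", 7)    # disk block from the buffer cache
cache.touch("page", 42)    # page of a process image
cache.touch("block", 9)
cache.touch("block", 7)    # re-reference keeps block 7 warm
cache.touch("page", 99)    # evicts ("page", 42), the coldest entry
print(list(cache.frames))  # [('block', 9), ('block', 7), ('page', 99)]
```

The interference question then becomes whether two independent LRU clocks, each sized by its own feedback loop, fight over the same frames in a way this single list would not.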

Consider /tmp as a case where the RAM disk might do particularly well...
what (if anything) keeps the buffer cache from performing as well as a RAM
disk in this case?  If there is some significant difference in performance
now, can it be fixed?

What generally-useful features does a RAM disk have that I didn't
consider?

Comments, please.
-- 
Dick Dunn	{hao,ucbvax,allegra}!nbires!rcd		(303)444-5710 x3086
   ...If you get confused just listen to the music play...

judah@whuxcc.UUCP (Judah Greenblatt) (08/12/86)

> Some recent discussions in unix-wizards and here at work set me to thinking
> about why one would want a RAM disk as a general sort of feature.  ("RAM
> disk" refers to using a fixed part of main memory as if it were a disk-like
> device.)

I was involved in developing ram-disk drivers for several unix systems
in an attempt to speed up a large application.  During the work we
discovered the following useful tidbits:

- file systems like /tmp, with large numbers of short-lived files,
  generate an inordinate amount of physical I/O activity.  I'm not
  sure why, but it seems that the following sequence generates several
  physical I/O operations, even when the buffer-cache is empty:
	- create a file
	- write 1 block of data
	- close it
	- open it again
	- read the data
	- close it again
	- unlink the file
   In the face of thousands of programs which perform this sequence
   hundreds of times a minute, putting /tmp in ram is a big win.
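The sequence maps onto the following system calls.  This Python sketch (hypothetical scratch path) performs the same seven steps; actually counting the physical I/Os each step triggers would require kernel instrumentation:

```python
import os

path = "/tmp/scratch.demo"          # hypothetical scratch file name

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)  # create a file
os.write(fd, b"x" * 512)            # write 1 block of data
os.close(fd)                        # close it

fd = os.open(path, os.O_RDONLY)     # open it again
data = os.read(fd, 512)             # read the data
os.close(fd)                        # close it again

os.unlink(path)                     # unlink the file
assert data == b"x" * 512           # the data round-tripped
```

The create, close, and unlink steps each touch the inode and the directory, which is one plausible source of the extra physical operations even when the data block itself never leaves the cache.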

-  The free-list is not an efficient cache of program text.
   In a machine with 20,000 free pages, no more than 2000 of which
   were ever in use at once, the 'hit ratio' of the free list was
   measured at under 80%, while the 2000 block buffers had a hit
   ratio over 95%.

-  While program text can be reclaimed from the free list, initialized
   data pages cannot.  As the page fault system in 4.2BSD bypasses
   the block buffers when faulting in initialized data pages,
   there is NO buffering of such pages, and every page fault generates
   a physical I/O operation.  When running lots of short-lived programs
   which have a lot of initialized data, it may pay to try to put
   the entire file system containing your programs into ram!

-  Note that all of the above is talking about short-lived files and
   quick-running, tiny programs.  Unix already seems to handle the
   big stuff well enough.  (Well, almost well enough.  You do keep
   your large files one per raw disk partition, don't you? :-)

-  To play these games requires a LOT of memory.  Don't even think
   of allocating less than 1 MB to a ram disk.  If you will be putting
   programs into core, expect to use 10 MB or more.  As you will also
   want 1000 or more block buffers, and probably need to support 100+
   users, you're going to need at least 32 MB of ram.

-  One thought on why you might not want to let the block buffers
   do all the work: can you imagine what a 'sync' would
   cost on a system with 20,000 block buffers?
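For scale, a rough back-of-the-envelope estimate (the dirty fraction and access time below are assumptions, not measurements):

```python
buffers = 20_000
dirty_fraction = 0.10        # assumed: 10% of buffers dirty at sync time
access_ms = 30               # assumed: ~30 ms per physical disk access
serial_flush_s = buffers * dirty_fraction * access_ms / 1000
print(serial_flush_s)        # 60.0 seconds if flushed one at a time
```

Even with generous sorting and overlap across spindles, a periodic sync over a cache that size is a visible event.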
-- 
Judah Greenblatt		"Backward into the Future!"
Bell Communications Research	uucp: {bellcore,infopro,ihnp4}!whuxcc!judah
Morristown, New Jersey, USA	arpa: whuxcc!judah@mouton.com
  * Opinions expressed are my own, and not those of the management.

daveb@rtech.UUCP (Dave Brower) (08/12/86)

In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>
>What generally-useful features does a RAM disk have that I didn't
>consider?
>
>Comments, please.
>--
>Dick Dunn

Yup, lots of real memory is the way to go.  Witness the 128M Convex C-1.
Unfortunately, it's kind of steep:  I still hear 4k/M quotes on system
main memory :-(.  It's conceivable to build a BIIG ram disk for cheap,
and be nearly vendor independent.

Imagine a 160M Eagle plug compatible ram disk with 'horribly' slow 450us
dynamic RAM used as the paging area, /tmp, and /usr/tmp.  That helps you
right where some of the biggest bottlenecks are located.

-dB

-- 
{amdahl, sun, mtxinu, cbosgd}!rtech!daveb

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/12/86)

In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>What generally-useful features does a RAM disk have that I didn't
>consider?

The way I look at it, a RAM disk is just one way to exploit cheap
semiconductor technology within the existing model of computation.
RAMs for such use will normally be slower than main memory RAM,
therefore more affordable (min $/bit).  Multi-level storage is the
rule with large systems; ideally it would be transparent (as in
Multics), but in practice "permanent storage" (disk) is different
from "temporary storage" (core).

The RAM disk on my Apple //e is useful because the Apple main
memory management is a botch.

karl@cbrma.UUCP (Karl Kleinpaste) (08/13/86)

judah@whuxcc.UUCP (Judah Greenblatt) writes some very interesting
comments on Dick Dunn's remarks on RAM discs.
>I was involved in developing ram-disk drivers for several unix systems
>in an attempt to speed up a large application.  During the work we
>discovered the following useful tidbits:
>...
>-  To play these games requires a LOT of memory.  Don't even think
>   of allocating less than 1 MB to a ram disk.  If you will be putting
>   programs into core, expect to use 10 MB or more.  As you will also
>   want 1000 or more block buffers, and probably need to support 100+
>   users, you're going to need at least 32 MB of ram.

That depends on what the system is and what it's doing.  I use
PDP-11/73s running SysV with 4Mb and (sigh) 4 RL02-equivalent drives
with quite some frequency.  The result without RAM discs was pretty
positively miserable overall system throughput.  For example,
compiling a complete kernel from scratch took roughly 45 minutes, and
that's with just me on it alone.  I added a set of small RAM discs to
it, occupying about 1Mb total, 512K of which was /tmp, 256Kb on
/usr/tmp, and that complete kernel compilation dropped to barely 30
minutes.  For my applications (which involved lots of recompilation of
lots of things), the smaller RAM discs are quite useful.
-- 
Karl Kleinpaste

rbl@nitrex.UUCP (08/13/86)

<Line-eater's lunch>

About 12 years ago, I developed a solid-state disk marketed by Monolithic
Systems Corp. of Englewood CO. (Known as EMU for 'extended memory unit'.)
One of my graduate students at Case Western Reserve did a dissertation on
UNIX performance enhancements using this disk  ---  which works much like
a RAM disk.  Turned out that one of the best strategies was to have TWO
disks (since a dual-ported version was available, one could have one unit
partitioned and two controllers).  One partition was for /usr/tmp and the
other was to hold the root.  Idea was that frequently used system programs
would have greatly reduced latency and that the user's scratch space would
be similarly sped up.

Note:  if a device driver is required, much of the inherent speed advantage
is lost.  Drivers consume milliseconds per access, not very noticeable when
disk latency is tens to hundreds of milliseconds.  When RAM is accessed,
rotational and seek latency go away and the driver delays are very noticeable.

By the way, the solid-state disk enabled us to do some real-time applications
of UNIX very nicely.  A 1 million sample/sec analog-to-digital converter was
one of these.

Robin B. Lake
Standard Oil Research and Development
4440 Warrensville Center Road
Cleveland, OH  44128
(216)-581-5976
cbosgd!nitrex!rbl
cwruecmp!nitrex!rbl

eriks@yetti.UUCP (Eriks Rugelis) (08/14/86)

In article <2979@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>>What generally-useful features does a RAM disk have that I didn't
>>consider?
>
>The way I look at it, a RAM disk is just one way to exploit cheap
>semiconductor technology within the existing model of computation.
>RAMs for such use will normally be slower than main memory RAM,
>therefore more affordable (min $/bit).  Multi-level storage is the
>rule with large systems; ideally it would be transparent (as in
>Multics), but in practice "permanent storage" (disk) is different
>from "temporary storage" (core).
>
>The RAM disk on my Apple //e is useful because the Apple main
>memory management is a botch..

i recently went to a conference/tradeshow with exactly this premise in mind
and came away with a different view of the world...  the key here is
'affordability'...  it is in some ways an unfortunate reality that there is
no market and no real supply of 'old technology' semi-conductor memory
parts...  just about as soon as they are able, semi-conductor manufacturers
ramp up production of the latest and greatest generation of memory devices...
hence they don't tend to produce lower density parts for very much longer

the bottom line is that semi-conductor disks tend to cost MORE per mega-byte
because they are made from the same parts as regular memory boards but must
also include a whole bunch of extra logic to support some sort of standard
disk interface personality

the comments that i heard mostly pointed in the direction of loading up one's
cpu with memory and working towards enhancing one's ability to cache data
in main storage and thus reduce the need to go to disk

you can afford to use a RAM disk on your apple because you don't really
need all that much of it... in my case, i don't think that anything less
than 70 to 100 mega-bytes would suit my ideal case solution (i run about
10 vms vaxen and some sun's) since vms pages from the file system i would
need to put a fair chunk of it in my ideal semi-disk configuration

does anybody out there have some real experience to comment on?  i for
one would be glad to hear about it

-- 
          Voice: Eriks Rugelis
        Ma Bell: 416/736-5257 x.2688
     Electronic: {allegra|decvax|ihnp4|linus}!utzoo!yetti!eriks.UUCP
              or eriks@yulibra.BITNET

wcs@ho95e.UUCP (#Bill_Stewart) (08/14/86)

In article <388@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>>What generally-useful features does a RAM disk have that I didn't consider?
> [.............]  It's conceivable to build a BIIG ram disk for cheap,
>and be nearly vendor independent.
In fact, it's been done.  One of the companies that sells off-brand
memory for VAXen (Dataram?) also sells a RAM disk-emulator for UNIBUS
and probably some other busses.  Prices were comparable to regular
memory, with some overhead for the initial equipment.
>Imagine a 160M Eagle plug compatible ram disk with 'horribly' slow 450us
>dynamic RAM used as the paging area, /tmp, and /usr/tmp.  That helps you
>right where some of the biggest bottlenecks are located.
What I'd like to see would be a ram disk that overflowed onto magnetic
disk - runs from ram if possible, and puts blocks onto magnetic disk in
some sort of LRU fashion if necessary.  One possible variant on this is
to allow file systems to span multiple physical media, and to use a ram
disk as one part.  This could be valuable in its own right.
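The overflow idea can be sketched as a two-tier block store with LRU spill.  A toy Python model with invented names — a real driver would do this with kernel buffers and actual I/O:

```python
from collections import OrderedDict

class OverflowRamDisk:
    """Blocks live in RAM up to a limit; the least recently used block
    spills to (simulated) magnetic disk when RAM fills."""
    def __init__(self, ram_blocks):
        self.ram_blocks = ram_blocks
        self.ram = OrderedDict()   # block number -> data, in LRU order
        self.disk = {}             # overflow store

    def write(self, blkno, data):
        self.ram[blkno] = data
        self.ram.move_to_end(blkno)
        if len(self.ram) > self.ram_blocks:
            cold, colddata = self.ram.popitem(last=False)
            self.disk[cold] = colddata       # spill the coldest block

    def read(self, blkno):
        if blkno in self.ram:                # fast path: served from RAM
            self.ram.move_to_end(blkno)
            return self.ram[blkno]
        data = self.disk.pop(blkno)          # slow path: fetch and promote
        self.write(blkno, data)
        return data

d = OverflowRamDisk(ram_blocks=2)
d.write(0, b"aa"); d.write(1, b"bb"); d.write(2, b"cc")  # block 0 spills
assert 0 in d.disk and 0 not in d.ram
assert d.read(0) == b"aa"                    # promoted back into RAM
```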
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

dan@rsch.wisc.edu (Daniel M. Frank) (08/14/86)

In article <381@nitrex.UUCP> rbl@nitrex.UUCP ( Dr. Robin Lake ) writes:
>
>Note:  if a device driver is required, much of the inherent speed advantage
>is lost.  Drivers consume milliseconds per access, not very noticeable when
>disk latency is tens to hundreds of milliseconds.  When RAM is accessed,
>rotational and seek latency go away and the driver delays are very noticeable.

   Ok, I give up.  Did you hack the file system or buffer cache code to
access the RAM directly?  If not, how did you avoid using device drivers?


------

      Dan Frank

	  Q: What's the difference between an Apple Macintosh
	     and an Etch-A-Sketch?

	  A: You don't have to shake the Mac to clear the screen.

geoff@desint.UUCP (Geoff Kuenning) (08/14/86)

In article <240@whuxcc.UUCP> judah@whuxcc.UUCP (Judah Greenblatt) writes:

>   I'm not
>   sure why, but it seems that the following sequence generates several
>   physical I/O operations, even when the buffer-cache is empty:
> 	- create a file
> 	- write 1 block of data
> 	- close it

This is an unfortunate side effect of the file system reliability
enhancements that were done for System V (III?).  This is the unfortunate
reality of reliability--it trades off against performance.  In this case,
whenever an "important" change is made to an i-node, it is written to
disk immediately.  I believe this also applies to directories.  It has
a negative effect on performance in several ways, but most users seem
to feel the reliability is worth it.
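The trade-off can be felt from user code: opening with O_SYNC makes each write wait for the physical I/O to complete, much as the synchronous i-node updates do inside the kernel.  A small sketch (hypothetical path; O_SYNC is the POSIX flag, not the System V internals):

```python
import os

path = "/tmp/sync.demo"      # hypothetical scratch path

# Normal write: the data lands in the buffer cache and reaches the
# disk some time later, when the kernel gets around to it.
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
os.write(fd, b"buffered")
os.close(fd)

# O_SYNC write: the call returns only after the physical I/O has
# completed, trading performance for reliability.
fd = os.open(path, os.O_WRONLY | os.O_SYNC)
written = os.write(fd, b"synched!")
os.close(fd)
os.unlink(path)
print(written)   # 8
```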

> -  One thought on why you might not want to let the block buffers
>    do all the work: can you imagine what a 'sync' would
>    cost on a system with 20,000 block buffers?

Also, can you imagine what it would be like to crash WITHOUT sync-ing on a
system with 20,000 block buffers?  Even with the reliability enhancements?
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

watson@convex.UUCP (08/15/86)

I believe the buffer cache is the way to go. I have made extensive
measurements on the Convex system, and our cache hit rate is
over 95% in the normal case. Often the cache hit rate is 99% for
times in the tens of seconds or more.

We dedicate 10% of memory to the buffer cache, so often we run
with caches in the range of 8 Mb (although this is user
configurable).  We do little physical I/O at
all on very large time sharing systems. Having such a large cache,
and multiple physical I/O paths, allows us to use larger block
file systems (blocksize 64kb, fragsize 8kb), striped across
multiple disks, to achieve I/O source sink rates on the order of
two megabytes per second or more. Of course the cache is not
useful for random reads. 

Conceptually, I agree that it would be nice if the file system buffer
cache and paging system cache were one. You could have one set
of kernel I/O routines, instead of two. You could dynamically
use pages in the most optimal way, for file buffering, or text
buffering. The problem is that you introduce serious coupling
between the I/O system and the VM system, which right now
are relatively uncoupled. You need to be very careful about
locking to avoid deadlocks between the I/O system and VM.
For example: you can't do I/O because there aren't any buffer
cache pages to be had, but you can't get pages because all
the processes in the VM system are locked doing I/O.
The other problem is you want to avoid copying the data if
possible. Now the Convex C-1 can copy large blocks of data
at 20 Mb/s. Nevertheless, we try to avoid the copying if at
all possible.

I haven't kept any statistics on text hit ratio.

One of the biggest problems we are currently facing is with
kernel NFS... I think it's nice to have stateless servers,
but you must effectively disable the buffer cache for writing.
The penalty isn't too bad on a Sun class machine, but seems
really gross on a C-1 class machine. I am currently investigating
this issue. Anyone with ideas on this issue I'd enjoy hearing
from.

Those of you who want to discuss I/O and performance issues specifically
can mail to me at ihnp4!convex!watson - I don't normally read
net articles.

Tom Watson
Convex Computers
Operating System Developer

wcs@ho95e.UUCP (#Bill_Stewart) (08/18/86)

In article <244@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>In article <240@whuxcc.UUCP> judah@whuxcc.UUCP (Judah Greenblatt) writes:
>>generates physical I/O operations, even when the buffer-cache is empty:
>> 	- create a file
>> 	- write 1 block of data
>> 	- close it
>This is an unfortunate side effect of the file system reliability
>enhancements that were done for System V (III?).  This is the unfortunate
>reality of reliability--it trades off against performance.  In this case,
>..........,  but most users seem
>to feel the reliability is worth it.
>> -  One thought on why you might not want to let the block buffers
>>    do all the work: can you imagine what a 'sync' would
>>    cost on a system with 20,000 block buffers?
>Also, can you imagine what it would be like to crash WITHOUT sync-ing on a
>system with 20,000 block buffers?  

Is there any way to tell the system "Don't bother syncing /tmp"?  On
my systems, I let fsck rebuild /tmp anyway; I'd rather trade the
reliability of /tmp for speed and only sync for garbage collection.
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

mangler@cit-vax.Caltech.Edu (System Mangler) (08/18/86)

In article <514@opus.nbires.UUCP>, rcd@nbires.UUCP (Dick Dunn) writes:
> Instead of LRU on disk blocks and LRU on process-image pages, why not LRU
> on the whole mess.  There's a simplification lurking here--after all, the
> pages of images represent disk blocks too.  Has anyone tried fiddling with
> UNIX either to make the buffer cache size adaptive or to unify page
> management and buffer caching?

Didn't J. F. Rieser (spelling?) long ago implement a paging Unix
that used the buffer cache as a page pool?  Or was this just an
idea floating around that didn't get implemented?

stuart@BMS-AT.UUCP (Stuart D. Gathman) (08/30/86)

In article <27300012@convex>, watson@convex.UUCP writes:

> Conceptually, I agree that it would be nice if the file system buffer
> cache and paging system cache were one. You could have one set
> of kernel I/O routines, instead of two. You could dynamically
> use pages in the most optimal way, for file buffering, or text
> buffering. The problem is that you introduce serious coupling

The Mach kernel does exactly that!  Files can be simply memory mapped
and 'read' and 'write' emulated.  I am tremendously impressed by the little
I read in Unix Review concerning Mach/1 (partly because I had thought of
all the stuff except the network transparency and threads years ago
and couldn't get anyone interested in doing Unix 'right').  
Mach creates a machine dependent layer that handles 

	a) virtual memory
	b) tasks (like unix processes)
	c) threads (separate CPU states within a task for closely
	   coupled parallel processing).
	d) ports (like SysV message queues with only one receiver.  Any
	   size message can be sent.  Big messages are handled by 
	   twiddling registers in the memory management hardware).
	e) device drivers

The only basic operations are 'send' and 'receive' to ports!  The messages
(although arbitrary from the kernel's point of view) contain headers and
are typed so that data formats can be automatically translated when
required.  The ports in effect behave like objects in Smalltalk.
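As described, a port is essentially a typed message queue with any number of senders and exactly one receiver.  A minimal user-level sketch in Python (names invented; Mach's real ports live in the kernel and carry capabilities):

```python
from queue import Queue

class Port:
    """Single-receiver message queue.  Each message carries a type tag in
    its header, so a server could translate data formats for a foreign CPU."""
    def __init__(self):
        self._q = Queue()

    def send(self, msg_type, body):
        self._q.put((msg_type, body))     # any number of senders may enqueue

    def receive(self):
        return self._q.get()              # exactly one receiver dequeues

p = Port()
p.send("write_request", b"hello")
p.send("read_request", 128)
assert p.receive() == ("write_request", b"hello")   # FIFO delivery
```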

Memory is logically copied (for forks, messages, etc.) via "copy on write".
The memory is simply mapped into a task's address space as read-only.
When a write is attempted, an actual copy is made.
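Copy-on-write can be observed from user space with a private file mapping.  This sketch relies on POSIX mmap semantics, not Mach's actual implementation:

```python
import mmap, os

path = "/tmp/cow.demo"                       # hypothetical scratch path
with open(path, "wb") as f:
    f.write(b"original")

with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 8, flags=mmap.MAP_PRIVATE)
    m[:8] = b"modified"                      # fault makes a private copy
    assert m[:8] == b"modified"              # our copy sees the write...
    m.close()

with open(path, "rb") as f:
    assert f.read() == b"original"           # ...the file never changes
os.unlink(path)
```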

'pagein' and 'pageout' ports can be specified when allocating memory.
This is how a file can be memory mapped and why programs don't have to
be 'loaded' in the unix sense (which effectively copies the program data
from the executable file to the swap area).

Ports can be attached to file systems, network servers, SysV emulators,
and whatever running in user state.  You can page your virtual memory
to a filesystem on a remote machine transparently!  The network acts
as one giant parallel machine.  (This is the part I didn't see before.)
The only non-transparent effect is that particular code and data will mean
different things to different processors.  (I.e. 68k code will not execute
as desired on a 286).  Because of the typing standard for messages, data
can be automatically translated by the server for a different CPU.  This
means setting up a file system on a separate CPU is trivial!

This kernel is also more portable because the machine dependent portion
is so small.  To port your system (complete with SysV and Berkeley 
unix environments plus new parallel stuff), you need to change the compiler's
code generator to handle the CPU, change the kernel to handle the memory
management, and change the device drivers to handle the I/O.  Unix stuff
like pipes, message queues, semaphores, sockets, streams, raw devices,
you name it, can be emulated _efficiently_ and portably using 'ports'.

These concepts require CPU's with

	a) large address spaces. (for memory mapped files, not strictly rqd)
	b) memory management with fault recovery. (for copy on write)

8-bits are out.
80286 would work, but segments complicate efficient code generation.
80386 is in.
68010, 68020 is in.
68000 is out (no page fault recovery).
S/1 is out (no page fault recovery).

The concepts in this system make the computer network envisioned in
'Tron' a practical reality.
-- 
Stuart D. Gathman	<..!seismo!{vrdxhq|dgis}!BMS-AT!stuart>