rcd@nbires.UUCP (Dick Dunn) (08/11/86)
Some recent discussions in unix-wizards and here at work set me to thinking
about why one would want a RAM disk as a general sort of feature.  ("RAM
disk" refers to using a fixed part of main memory as if it were a disk-like
device.)

First, about the idea of a RAM disk:  It's certainly simple to implement;
if you can understand the basics of a device driver and you have /dev/mem
or something like it, you're 90% of the way there.  (A minimal sketch of
such a driver follows this message.)  A RAM disk makes sense for special
problems where what you're really doing is making use of main memory for
something peculiar to your application.

What I DON'T see is why you would want to use a RAM disk (in UNIX) for
things like frequently-used files or programs.  Consider what you've got
in a paging system like 4.2:  Frequently-used programs will have their
pages dragged into memory, where they will stay as long as they are in
demand.  Frequently-used files will have their most-used blocks left in
the buffer cache.  It seems as if what goes on with page reclamation and
the buffer cache is really "RAM disk" behavior with dynamic adaptation to
what is most needed, and that this ought to be able to out-perform any
static allocation of information to a RAM disk device.

There are a couple of holes in this line of thinking.  First, the number
of buffers is normally fixed at startup; this means that there is no way
to make a tradeoff between memory committed to I/O and to paging.  Second,
the algorithms used to manage pages and the buffer cache may not, in fact,
retain commonly-used information as well as if retention were done
explicitly.

The latter problem is easily answered:  First, the strategies for managing
the cache and pages can always be improved once it is known that they have
particular deficiencies.  Second, there is a balance in that explicit
retention may work better at a particular time, but it is not adaptive.

The former problem suggests that one avenue to explore in performance
improvement for UNIX would be to let the buffer cache change size in
response to changing need.  It just moves the analysis up one level:
Instead of LRU on disk blocks and LRU on process-image pages, why not LRU
on the whole mess?  There's a simplification lurking here--after all, the
pages of images represent disk blocks too.

Has anyone tried fiddling with UNIX either to make the buffer cache size
adaptive or to unify page management and buffer caching?  If you made the
buffer cache size adaptive but didn't unify the paging and buffer systems,
would you get into some sort of interference between the two algorithms
(e.g., due to lack or excess of hysteresis)?

Consider /tmp as a case where the RAM disk might do particularly well...
what (if anything) keeps the buffer cache from performing as well as a RAM
disk in this case?  If there is some significant difference in performance
now, can it be fixed?  What generally-useful features does a RAM disk have
that I didn't consider?

Comments, please.
-- 
Dick Dunn    {hao,ucbvax,allegra}!nbires!rcd    (303)444-5710 x3086
   ...If you get confused just listen to the music play...
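For concreteness, here is a minimal sketch of the core of such a driver.
The names (rd_read, rd_write, RD_NBLK, RD_BSIZE) are made up, and real
driver entry points and buf handling vary from kernel to kernel; the
point is just that the entire "I/O" path reduces to a memory copy:

    #include <string.h>

    #define RD_BSIZE 1024                /* block size */
    #define RD_NBLK  1024                /* 1 Mb "disk" */

    static char rd_mem[RD_NBLK][RD_BSIZE];   /* the whole device */

    /* copy block blkno out to buf; -1 on a bad block number */
    int rd_read(int blkno, char *buf)
    {
        if (blkno < 0 || blkno >= RD_NBLK)
            return -1;
        memcpy(buf, rd_mem[blkno], RD_BSIZE);
        return 0;
    }

    /* copy buf into block blkno */
    int rd_write(int blkno, const char *buf)
    {
        if (blkno < 0 || blkno >= RD_NBLK)
            return -1;
        memcpy(rd_mem[blkno], buf, RD_BSIZE);
        return 0;
    }

The rest of a real driver--configuration, the strategy routine, tying
rd_mem to reserved physical memory rather than kernel data space--is
routine plumbing, which is why the "90% of the way there" claim above
is fair.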
judah@whuxcc.UUCP (Judah Greenblatt) (08/12/86)
> Some recent discussions in unix-wizards and here at work set me to thinking
> about why one would want a RAM disk as a general sort of feature.  ("RAM
> disk" refers to using a fixed part of main memory as if it were a disk-like
> device.)

I was involved in developing ram-disk drivers for several unix systems
in an attempt to speed up a large application.  During the work we
discovered the following useful tidbits:

- File systems like /tmp, with large numbers of short-lived files,
  generate an inordinate amount of physical I/O activity.  I'm not
  sure why, but it seems that the following sequence generates several
  physical I/O operations, even when the buffer-cache is empty:
	- create a file
	- write 1 block of data
	- close it
	- open it again
	- read the data
	- close it again
	- unlink the file
  (This sequence appears as a small C test program after this message.)
  In the face of thousands of programs which perform this sequence
  hundreds of times a minute, putting /tmp in ram is a big win.

- The free-list is not an efficient cache of program text.  In a
  machine with 20,000 free pages, no more than 2000 of which were ever
  in use at once, the 'hit ratio' of the free list was measured at
  under 80%, while the 2000 block buffers had a hit ratio over 95%.

- While program text can be reclaimed from the free list, initialized
  data pages cannot.  As the page-fault system in 4.2BSD bypasses the
  block buffers when faulting in initialized data pages, there is NO
  buffering of such pages, and every page fault generates a physical
  I/O operation.  When running lots of short-lived programs which have
  a lot of initialized data, it may pay to try to put the entire file
  system containing your programs into ram!

- Note that all of the above is talking about short-lived files and
  quick-running, tiny programs.  Unix already seems to handle the big
  stuff well enough.  (Well, almost well enough.  You do keep your
  large files one per raw disk partition, don't you? :-)

- To play these games requires a LOT of memory.  Don't even think of
  allocating less than 1 MB to a ram disk.  If you will be putting
  programs into core, expect to use 10 MB or more.  As you will also
  want 1000 or more block buffers, and probably need to support 100+
  users, you're going to need at least 32 MB of ram.

- One thought on why you might not want to let the block buffers do
  all the work:  can you imagine what a 'sync' would cost on a system
  with 20,000 block buffers?
-- 
Judah Greenblatt                "Backward into the Future!"
Bell Communications Research    uucp: {bellcore,infopro,ihnp4}!whuxcc!judah
Morristown, New Jersey, USA     arpa: whuxcc!judah@mouton.com
* Opinions expressed are my own, and not those of the management.
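Here is the seven-step sequence above rendered as a small C test
program (a sketch; the path name and block size are arbitrary).  Run it
in a loop while watching something like iostat, and the physical writes
appear even though the data never needs to outlive the program:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[1024] = "scratch data";
        int fd;

        /* create a file, write 1 block of data, close it */
        fd = open("/tmp/scratch", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0) { perror("create"); return 1; }
        write(fd, buf, sizeof buf);
        close(fd);

        /* open it again, read the data, close it again */
        fd = open("/tmp/scratch", O_RDONLY);
        if (fd < 0) { perror("reopen"); return 1; }
        read(fd, buf, sizeof buf);
        close(fd);

        /* unlink the file */
        unlink("/tmp/scratch");
        return 0;
    }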
daveb@rtech.UUCP (Dave Brower) (08/12/86)
In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>
>What generally-useful features does a RAM disk have that I didn't
>consider?
>
>Comments, please.
>-- 
>Dick Dunn

Yup, lots of real memory is the way to go.  Witness the 128M Convex
C-1.  Unfortunately, it's kind of steep:  I still hear quotes of $4K
per megabyte on system main memory :-(.

It's conceivable to build a BIIG ram disk for cheap, and be nearly
vendor independent.  Imagine a 160M Eagle plug-compatible ram disk
with 'horribly' slow 450us dynamic RAM used as the paging area, /tmp,
and /usr/tmp.  That helps you right where some of the biggest
bottlenecks are located.

-dB
-- 
{amdahl, sun, mtxinu, cbosgd}!rtech!daveb
gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/12/86)
In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>What generally-useful features does a RAM disk have that I didn't
>consider?

The way I look at it, a RAM disk is just one way to exploit cheap
semiconductor technology within the existing model of computation.
RAMs for such use will normally be slower than main-memory RAM,
therefore more affordable (min $/bit).  Multi-level storage is the
rule with large systems; ideally it would be transparent (as in
Multics), but in practice "permanent storage" (disk) is different
from "temporary storage" (core).

The RAM disk on my Apple //e is useful because the Apple main
memory management is a botch.
karl@cbrma.UUCP (Karl Kleinpaste) (08/13/86)
judah@whuxcc.UUCP (Judah Greenblatt) writes some very interesting
comments on Dick Dunn's remarks on RAM discs.
>I was involved in developing ram-disk drivers for several unix systems
>in an attempt to speed up a large application.  During the work we
>discovered the following useful tidbits:
>...
>- To play these games requires a LOT of memory.  Don't even think
>  of allocating less than 1 MB to a ram disk.  If you will be putting
>  programs into core, expect to use 10 MB or more.  As you will also
>  want 1000 or more block buffers, and probably need to support 100+
>  users, you're going to need at least 32 MB of ram.

That depends on what the system is and what it's doing.  I use
PDP-11/73s running SysV with 4Mb and (sigh) 4 RL02-equivalent drives
with quite some frequency.  The result without RAM discs was pretty
positively miserable overall system throughput.  For example, compiling
a complete kernel from scratch took roughly 45 minutes, and that's with
just me on it alone.  I added a set of small RAM discs to it, occupying
about 1Mb total, 512Kb of which was /tmp and 256Kb /usr/tmp, and that
complete kernel compilation dropped to barely 30 minutes.  For my
applications (which involved lots of recompilation of lots of things),
the smaller RAM discs are quite useful.
-- 
Karl Kleinpaste
rbl@nitrex.UUCP (08/13/86)
<Line-eater's lunch>

About 12 years ago, I developed a solid-state disk marketed by
Monolithic Systems Corp. of Englewood CO.  (Known as EMU, for 'extended
memory unit'.)  One of my graduate students at Case Western Reserve did
a dissertation on UNIX performance enhancements using this disk --- which
works much like a RAM disk.

Turned out that one of the best strategies was to have TWO disks (since
a dual-ported version was available, one could have one unit partitioned
and two controllers).  One partition was for /usr/tmp and the other was
to hold the root.  Idea was that frequently-used system programs would
have greatly reduced latency and that the user's scratch space would be
similarly sped up.

Note: if a device driver is required, much of the inherent speed
advantage is lost.  Drivers consume milliseconds per access, not very
noticeable when disk latency is tens to hundreds of milliseconds.  When
RAM is accessed, rotational and seek latency go away and the driver
delays are very noticeable:  a millisecond of driver path is a few
percent of a 30-millisecond disk access, but it swamps a transfer that
the memory itself completes in microseconds.

By the way, the solid-state disk enabled us to do some real-time
applications of UNIX very nicely.  A 1 million sample/sec
analog-to-digital converter was one of these.

Robin B. Lake
Standard Oil Research and Development
4440 Warrensville Center Road
Cleveland, OH  44128
(216)-581-5976
cbosgd!nitrex!rbl
cwruecmp!nitrex!rbl
eriks@yetti.UUCP (Eriks Rugelis) (08/14/86)
In article <2979@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>>What generally-useful features does a RAM disk have that I didn't
>>consider?
>
>The way I look at it, a RAM disk is just one way to exploit cheap
>semiconductor technology within the existing model of computation.
>RAMs for such use will normally be slower than main-memory RAM,
>therefore more affordable (min $/bit).  Multi-level storage is the
>rule with large systems; ideally it would be transparent (as in
>Multics), but in practice "permanent storage" (disk) is different
>from "temporary storage" (core).
>
>The RAM disk on my Apple //e is useful because the Apple main
>memory management is a botch.

i recently went to a conference/tradeshow with exactly this premise in
mind and came away with a different view of the world...  the key here
is 'affordability'...

it is in some ways an unfortunate reality that there is no market and
no real supply of 'old technology' semi-conductor memory parts...  just
about as soon as they are able, semi-conductor manufacturers ramp up
production of the latest and greatest generation of memory devices...
hence they don't tend to produce lower-density parts for very much
longer.

the bottom line is that semi-conductor disks tend to cost MORE per
mega-byte, because they are made from the same parts as regular memory
boards but must also include a whole bunch of extra logic to support
some sort of standard disk-interface personality.

the comments that i heard mostly pointed in the direction of loading up
one's cpu with memory and working towards enhancing one's ability to
cache data in main storage, and thus reduce the need to go to disk.

you can afford to use a RAM disk on your apple because you don't really
need all that much of it...  in my case, i don't think that anything
less than 70 to 100 mega-bytes would suit my ideal-case solution (i run
about 10 vms vaxen and some suns).  since vms pages from the file
system, i would need to put a fair chunk of it in my ideal semi-disk
configuration.

does anybody out there have some real experience to comment on?  i for
one would be glad to hear about it.
-- 
Voice:      Eriks Rugelis
Ma Bell:    416/736-5257 x.2688
Electronic: {allegra|decvax|ihnp4|linus}!utzoo!yetti!eriks.UUCP
            or eriks@yulibra.BITNET
wcs@ho95e.UUCP (#Bill_Stewart) (08/14/86)
In article <388@rtech.UUCP> daveb@rtech.UUCP (Dave Brower) writes:
>In article <514@opus.nbires.UUCP> rcd@nbires.UUCP (Dick Dunn) writes:
>>What generally-useful features does a RAM disk have that I didn't consider?
> [.............] It's conceivable to build a BIIG ram disk for cheap,
>and be nearly vendor independent.

In fact, it's been done.  One of the companies that sells off-brand
memory for VAXen (Dataram?) also sells a RAM disk-emulator for UNIBUS
and probably some other busses.  Prices were comparable to regular
memory, with some overhead for the initial equipment.

>Imagine a 160M Eagle plug-compatible ram disk with 'horribly' slow 450us
>dynamic RAM used as the paging area, /tmp, and /usr/tmp.  That helps you
>right where some of the biggest bottlenecks are located.

What I'd like to see would be a ram disk that overflowed onto magnetic
disk - runs from ram if possible, and puts blocks onto magnetic disk in
some sort of LRU fashion if necessary.  (A sketch of the lookup path
for such a device follows this message.)  One possible variant on this
is to allow file systems to span multiple physical media, and to use a
ram disk as one part.  This could be valuable in its own right.
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
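To make the overflow idea concrete, here is a sketch of the read path
such a device might have.  Every name here is invented, and a real
driver would also need a writeback policy, locking, and a faster lookup
than a linear scan; the structure is what matters:  serve hits from
RAM, and on a miss spill the least-recently-used resident block to the
magnetic disk:

    #include <string.h>

    #define BSIZE 1024
    #define NRAM   512          /* blocks resident in RAM */

    struct rblk {
        int  blkno;             /* disk block cached here, -1 if none */
        long lastuse;           /* timestamp for LRU */
        char data[BSIZE];
    };

    static struct rblk ram[NRAM];
    static long ticks;

    /* stand-ins for the real magnetic-disk driver calls */
    static void disk_read(int blkno, char *buf) { /* real read here */ }
    static void disk_write(int blkno, const char *buf) { /* real write */ }

    /* mark every RAM slot empty; call once at attach time */
    void rd_init(void)
    {
        int i;
        for (i = 0; i < NRAM; i++)
            ram[i].blkno = -1;
    }

    /* return the in-RAM copy of block blkno, faulting it in if needed */
    char *rd_getblk(int blkno)
    {
        int i, lru = 0;

        for (i = 0; i < NRAM; i++)
            if (ram[i].blkno == blkno) {        /* hit: serve from RAM */
                ram[i].lastuse = ++ticks;
                return ram[i].data;
            }

        for (i = 1; i < NRAM; i++)              /* miss: find LRU victim */
            if (ram[i].lastuse < ram[lru].lastuse)
                lru = i;

        if (ram[lru].blkno != -1)               /* spill victim to disk */
            disk_write(ram[lru].blkno, ram[lru].data);
        disk_read(blkno, ram[lru].data);        /* bring in wanted block */
        ram[lru].blkno = blkno;
        ram[lru].lastuse = ++ticks;
        return ram[lru].data;
    }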
dan@rsch.wisc.edu (Daniel M. Frank) (08/14/86)
In article <381@nitrex.UUCP> rbl@nitrex.UUCP (Dr. Robin Lake) writes:
>
>Note: if a device driver is required, much of the inherent speed advantage
>is lost.  Drivers consume milliseconds per access, not very noticeable when
>disk latency is tens to hundreds of milliseconds.  When RAM is accessed,
>rotational and seek latency go away and the driver delays are very
>noticeable.

Ok, I give up.  Did you hack the file system or buffer cache code to
access the RAM directly?  If not, how did you avoid using device
drivers?
------
Dan Frank

   Q: What's the difference between an Apple Macintosh
      and an Etch-A-Sketch?
   A: You don't have to shake the Mac to clear the screen.
geoff@desint.UUCP (Geoff Kuenning) (08/14/86)
In article <240@whuxcc.UUCP> judah@whuxcc.UUCP (Judah Greenblatt) writes:
> I'm not
> sure why, but it seems that the following sequence generates several
> physical I/O operations, even when the buffer-cache is empty:
> 	- create a file
> 	- write 1 block of data
> 	- close it

This is an unfortunate side effect of the file-system reliability
enhancements that were done for System V (III?).  It's the basic
reality of reliability--it trades off against performance.  In this
case, whenever an "important" change is made to an i-node, it is
written to disk immediately.  I believe this also applies to
directories.  It has a negative effect on performance in several ways,
but most users seem to feel the reliability is worth it.

> - One thought on why you might not want to let the block buffers
>   do all the work:  can you imagine what a 'sync' would
>   cost on a system with 20,000 block buffers?

Also, can you imagine what it would be like to crash WITHOUT sync-ing
on a system with 20,000 block buffers?  Even with the reliability
enhancements?
-- 
Geoff Kuenning
{hplabs,ihnp4}!trwrb!desint!geoff
watson@convex.UUCP (08/15/86)
I believe the buffer cache is the way to go.  I have made extensive
measurements on the Convex system, and our cache hit rate is over 95%
in the normal case.  Often the cache hit rate is 99% for times in the
tens of seconds or more.  We dedicate 10% of memory to the buffer
cache, so often we run with caches in the range of 8 Mb (although this
is user-configurable).  We do little physical I/O at all on very large
time-sharing systems.

Having such a large cache, and multiple physical I/O paths, allows us
to use larger-block file systems (blocksize 64Kb, fragsize 8Kb),
striped across multiple disks, to achieve I/O source-sink rates on the
order of two megabytes per second or more.  (A sketch of the striping
arithmetic follows this message.)  Of course the cache is not useful
for random reads.

Conceptually, I agree that it would be nice if the file system buffer
cache and paging system cache were one.  You could have one set of
kernel I/O routines, instead of two.  You could dynamically use pages
in the optimal way, for file buffering or text buffering.  The problem
is that you introduce serious coupling between the I/O system and the
VM system, which right now are relatively uncoupled.  You need to be
very careful about locking to avoid deadlocks between the I/O system
and VM.  For example:  you can't do I/O because there aren't any
buffer-cache pages to be had, but you can't get pages because all the
processes in the VM system are locked doing I/O.  The other problem is
that you want to avoid copying the data if possible.  The Convex C-1
can copy large blocks of data at 20 Mb/s; nevertheless, we try to
avoid the copying if at all possible.

I haven't kept any statistics on text hit ratio.

One of the biggest problems we are currently facing is with kernel
NFS...  I think it's nice to have stateless servers, but you must
effectively disable the buffer cache for writing.  The penalty isn't
too bad on a Sun-class machine, but seems really gross on a C-1-class
machine.  I am currently investigating this issue, and I'd enjoy
hearing from anyone with ideas on it.

Those of you who want to discuss I/O and performance issues
specifically can mail to me at ihnp4!convex!watson - I don't normally
read net articles.

Tom Watson
Convex Computers
Operating System Developer
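The striping arithmetic itself is simple.  A sketch (illustrative only
-- the disk count is made up, and the actual Convex layout may differ):
logical block b of a file system striped round-robin across NDISK
drives lives at physical block b / NDISK on drive b % NDISK:

    #define NDISK 4

    struct stripe { int disk; long pblk; };

    /* map a logical file-system block to (drive, physical block) */
    struct stripe stripe_map(long lblk)
    {
        struct stripe s;

        s.disk = (int)(lblk % NDISK);   /* which drive */
        s.pblk = lblk / NDISK;          /* block within that drive */
        return s;
    }

Because consecutive logical blocks land on different drives, a large
sequential transfer keeps all the spindles busy at once, which is where
source-sink rates like the two megabytes per second quoted above come
from.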
wcs@ho95e.UUCP (#Bill_Stewart) (08/18/86)
In article <244@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
>In article <240@whuxcc.UUCP> judah@whuxcc.UUCP (Judah Greenblatt) writes:
>>generates physical I/O operations, even when the buffer-cache is empty:
>>	- create a file
>>	- write 1 block of data
>>	- close it
>This is an unfortunate side effect of the file-system reliability
>enhancements that were done for System V (III?).  It's the basic
>reality of reliability--it trades off against performance.  In this case,
>.........., but most users seem
>to feel the reliability is worth it.
>> - One thought on why you might not want to let the block buffers
>>   do all the work:  can you imagine what a 'sync' would
>>   cost on a system with 20,000 block buffers?
>Also, can you imagine what it would be like to crash WITHOUT sync-ing on a
>system with 20,000 block buffers?

Is there any way to tell the system "Don't bother syncing /tmp"?  On my
systems I let fsck rebuild /tmp anyway; I'd rather trade the
reliability of /tmp for speed, and only sync for garbage collection.
(A sketch of what such a per-device exemption might look like follows
this message.)
-- 
# Bill Stewart, AT&T Bell Labs 2G-202, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
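No stock kernel offers that knob, but the change is easy to picture.
A hypothetical sketch of the buffer-flush loop with a per-device
exemption (every name here is invented, and a real kernel keeps its
buffers on hash chains and has locking to worry about, none of which
is shown):

    #define NBUF 1000

    struct buf {
        int b_dev;          /* device this buffer belongs to */
        int b_dirty;        /* modified since last written? */
        /* ... data, chains, flags ... */
    };

    static struct buf buffers[NBUF];
    static int nosync_dev = -1;     /* exempted device, e.g. /tmp's */

    static void bwrite(struct buf *bp) { /* queue a physical write */ }

    /* flush all dirty buffers except those on the exempted device */
    void sync_buffers(void)
    {
        int i;

        for (i = 0; i < NBUF; i++)
            if (buffers[i].b_dirty && buffers[i].b_dev != nosync_dev) {
                bwrite(&buffers[i]);
                buffers[i].b_dirty = 0;
            }
    }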
mangler@cit-vax.Caltech.Edu (System Mangler) (08/18/86)
In article <514@opus.nbires.UUCP>, rcd@nbires.UUCP (Dick Dunn) writes:
> Instead of LRU on disk blocks and LRU on process-image pages, why not LRU
> on the whole mess?  There's a simplification lurking here--after all, the
> pages of images represent disk blocks too.  Has anyone tried fiddling with
> UNIX either to make the buffer cache size adaptive or to unify page
> management and buffer caching?

Didn't J. F. Reiser (spelling?) long ago implement a paging Unix that
used the buffer cache as a page pool?  Or was this just an idea
floating around that didn't get implemented?
stuart@BMS-AT.UUCP (Stuart D. Gathman) (08/30/86)
In article <27300012@convex>, watson@convex.UUCP writes:
> Conceptually, I agree that it would be nice if the file system buffer
> cache and paging system cache were one.  You could have one set
> of kernel I/O routines, instead of two.  You could dynamically
> use pages in the optimal way, for file buffering or text
> buffering.  The problem is that you introduce serious coupling

The Mach kernel does exactly that!  Files can be simply memory-mapped
and 'read' and 'write' emulated.  (A sketch of such an emulation
follows this message.)  I am tremendously impressed by the little I
read in Unix Review concerning Mach/1 (partly because I had thought of
all the stuff except the network transparency and threads years ago
and couldn't get anyone interested in doing Unix 'right').

Mach creates a machine-dependent layer that handles

a) virtual memory
b) tasks (like unix processes)
c) threads (separate CPU states within a task, for closely coupled
   parallel processing)
d) ports (like SysV message queues with only one receiver.  Any size
   message can be sent.  Big messages are handled by twiddling
   registers in the memory-management hardware.)
e) device drivers

The only basic operations are 'send' and 'receive' on ports!  The
messages (although arbitrary from the kernel's point of view) contain
headers and are typed, so that data formats can be automatically
translated when required.  The ports in effect behave like objects in
Smalltalk.

Memory is logically copied (for forks, messages, etc.) via "copy on
write":  the memory is simply mapped into a task's address space as
read-only, and when a write is attempted, an actual copy is made.

'pagein' and 'pageout' ports can be specified when allocating memory.
This is how a file can be memory-mapped, and why programs don't have
to be 'loaded' in the unix sense (which effectively copies the program
data from the executable file to the swap area).  Ports can be
attached to file systems, network servers, SysV emulators, and
whatever else, running in user state.  You can page your virtual
memory to a filesystem on a remote machine transparently!

The network acts as one giant parallel machine.  (This is the part I
didn't see before.)  The only non-transparent effect is that
particular code and data will mean different things to different
processors (i.e., 68k code will not execute as desired on a 286).
Because of the typing standard for messages, data can be automatically
translated by the server for a different CPU.  This means setting up a
file system on a separate CPU is trivial!

This kernel is also more portable, because the machine-dependent
portion is so small.  To port your system (complete with SysV and
Berkeley unix environments plus new parallel stuff), you need to
change the compiler's code generator to handle the CPU, change the
kernel to handle the memory management, and change the device drivers
to handle the I/O.

Unix stuff like pipes, message queues, semaphores, sockets, streams,
raw devices, you name it, can be emulated _efficiently_ and portably
using 'ports'.

These concepts require CPUs with

a) large address spaces (for memory-mapped files; not strictly required)
b) memory management with fault recovery (for copy-on-write)

8-bits are out.  The 80286 would work, but segments complicate
efficient code generation.  The 80386 is in.  The 68010 and 68020 are
in.  The 68000 is out (no page-fault recovery).  The S/1 is out (no
page-fault recovery).

The concepts in this system make the computer network envisioned in
'Tron' a practical reality.
-- 
Stuart D. Gathman	<..!seismo!{vrdxhq|dgis}!BMS-AT!stuart>
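Here is a sketch of what 'read' emulation over a memory-mapped file
looks like.  It uses a 4.2BSD-style mmap() interface rather than Mach's
actual primitives (the pagein/pageout ports described above are not
shown), so take it as an illustration of the idea, not of the Mach API:

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    struct mfile {
        char  *base;        /* start of the mapping */
        off_t  size;        /* length of the file */
        off_t  off;         /* emulated file offset */
    };

    /* map a whole file; after this there is no per-read kernel I/O path */
    int mopen(struct mfile *mf, const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0)
            return -1;
        mf->base = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        mf->size = st.st_size;
        mf->off = 0;
        close(fd);
        return mf->base == MAP_FAILED ? -1 : 0;
    }

    /* 'read' is just a copy out of the mapping; untouched pages fault
       in on first reference--exactly the pagein-port behavior above */
    ssize_t mread(struct mfile *mf, void *buf, size_t n)
    {
        if ((off_t)n > mf->size - mf->off)
            n = (size_t)(mf->size - mf->off);
        memcpy(buf, mf->base + mf->off, n);
        mf->off += n;
        return n;
    }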