hgc%ardc@sri-unix.UUCP (01/05/84)
From: H.G.Corneilson (MISD WAD) <hgc@ardc>

We are looking into getting a "semi-conductor disk" for our VAX 11/780 running BSD 4.1c UNIX (soon to be 4.2). We plan to use it mostly as a fast swap area. We would appreciate any pointers to such hardware that has been installed on a VAX running BSD 4.1c UNIX, and any problems, hints, etc. that have been encountered. Thanks in advance.

--------
Howard G. Corneilson (MISD WAD) <hgc@ardc>
mike%brl-vgr@sri-unix.UUCP (01/05/84)
From: Mike Muuss <mike@brl-vgr>

Howard -

BRL is presently operating two VAXen with the new (>= 4.1c) filesystem using a "solid-state disk" device. In both cases the device is a DATARAM BULK MOS with 8 Mbytes of memory.

Discussion: Using this sort of device for swapping/paging is RIDICULOUS. Your money is much better spent merely adding more memory to your VAX. 8 Mbytes is about right for a 780, 6 Mbytes for a 750. With that quantity of memory, paging is a pretty low-level activity unless you are running ultra-huge problems.

Using this device for /tmp can be somewhat worthwhile, because things like ED and CC do lots of SEQUENTIAL I/O on largish files in /tmp; hence no amount of physical memory devoted to the UNIX buffer cache is likely to be able to cache the whole thing. However, the current cost/performance tradeoffs must be carefully examined. My present feeling is that it would be FAR better to invest another ~$14K in another 9766-style disk and use JUST the first few Mbytes for /tmp, RATHER than spending ~$25K for a BULK MOS. Also, you can hang your 9766 off the SBI or CMI, whereas the BULK MOS units I am familiar with can usually only be attached to the UNIBUS (at least on VAXen; there are 11/70 Cachebus devices available).

Now, if the machine were a PDP-11 instead of a VAX, then things would be entirely different, and I would whole-heartedly recommend getting a BULK MOS unit. Every 11/70 at BRL has at least one such device, one has two BULK memory units, and they make a *tremendous* performance improvement. But VAXen can take oodles of physical memory. Why hide it behind an interface bus?

	-Mike
kermit%brl-vgr@sri-unix.UUCP (01/06/84)
From: Chuck Kennedy <kermit@brl-vgr>

I have to agree with both Mike and Ron concerning their thoughts about using a semi-conductor disk. In particular, if you plan on using the bulk MOS for paging, it is a waste of money. You are better off buying more memory for your VAX. Why go through extra I/O operations to page to a fast bulk MOS (on the UNIBUS) when you can go directly to memory (on the SBI or CMI)? Especially when it is cheaper to buy more VAX memory!! Of course this argument can only go on for so long before you have to page (but the new DEC memory controllers for the 780 can handle up to 16MB, and you can have two controllers for a total of 32MB of real memory). Again, from experience, 8MB on a 780 seems to be about the right mix for most jobs. For paging purposes, it still makes sense to use a 9766 hooked through the SBI or the CMI because of the higher bandwidth as well as the cost comparison.

Mike's arguments about using the bulk MOS as /tmp echo my own exactly. I bought the bulk MOS for BRL-VGR (a VAX/780) hoping that it might speed things up (especially considering the dramatic improvements they made on our 11/70s). I wish now I could trade the bulk MOS for a couple of 9766 drives. Having the extra disk arms really buys a lot of performance.

Cheers,
	-Chuck Kennedy <kermit @ brl>

P.S. One of these days VGR will turn into a Purdue-style dual-VAX, which should almost double the speed of the machine for a mere $60K. The only holdup is that DEC is being so cussedly SLOWWWWW in delivering a card cage to mount the backplane (you'd think they could just rip one off the assembly line). They only took a year to deliver the CPU spares kit used to build the dual-VAX.
root@zehntel.UUCP (01/07/84)
#R:sri-arpa:-1511100:zinfandel:21300002:000:710
zinfandel!berry    Jan  6 09:24:00 1984

This may not help the VAX that wants a semiconductor RAM swapping area, but it may help others with 68000s. I wrote a device driver that would grab all the RAM on the Multibus (TM) in one of our 68000 machines. It had about 2Mb on a private bus, and left RAM for other devices (tape, Ethernet, disk buffers, etc.) alone. We also had a 169Mb Fuji winchester. Well, I mkfs'd it, mounted it as /tmp, and did some tests. The Fuji disk is so <deleted> fast that C compiles were SLOWER with the RAM disk!!

Berry Kercheval
Zehntel Inc. (ihnp4!zehntel!zinfandel!berry)
(415)932-6900

(Multibus is a trademark of Intel Corp.)
(Winchester is a trademark of the Winchester Rifle Co., but that's irrelevant here...)
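For the curious, the heart of such a RAM-disk driver is nothing more than a transfer routine that copies blocks between the caller's buffer and the grabbed memory -- no seeks, no rotational latency. Here is a rough user-level sketch in C (the sizes are chosen to match Berry's 2Mb board, and rd_init, rd_rw, and the constants are illustrative names, not his actual driver):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLKSIZE 512            /* one "disk" block */
#define NBLKS   4096           /* 4096 * 512 bytes = 2Mb, like the board above */

static char *rd_base;          /* stands in for the grabbed Multibus RAM */

/* Allocate the backing store; a real driver would map the bus memory. */
static int rd_init(void)
{
    rd_base = calloc(NBLKS, BLKSIZE);
    return rd_base != NULL;
}

/* The whole "device": a bounds check and a block copy, no I/O at all. */
static int rd_rw(int write, unsigned blkno, char *buf)
{
    if (blkno >= NBLKS)
        return -1;             /* past end of "device" */
    if (write)
        memcpy(rd_base + (size_t)blkno * BLKSIZE, buf, BLKSIZE);
    else
        memcpy(buf, rd_base + (size_t)blkno * BLKSIZE, BLKSIZE);
    return 0;
}
```

Which is also part of why it can lose to a fast winchester: every block still makes a round trip through the processor, while the Fuji's controller DMAs data in behind the CPU's back.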
tjt@kobold.UUCP (01/10/84)
The reason *why* system performance using a good disk is better than using main memory as a pseudo-disk is that a DMA disk controller can transfer data at the same time your processor is doing something useful (i.e. executing user code, or some more directly useful part of the system code, such as traversing a path name for a file open). Using main memory as a pseudo-disk requires that your processor copy data back and forth. On the other hand, a semi-conductor disk also has a controller that will copy data at the same time a program is executing.

Mike Muuss <mike@brl-vgr> points out that buying more main memory is a better (and cheaper?) way of improving your swapping/paging performance, since it greatly decreases the amount of paging you do rather than making it slightly faster. Mike goes on to point out that using a semiconductor disk may make sense for temporary files (i.e. /tmp) since "... no amount of physical memory devoted to the UNIX buffer cache is likely to be able to cache the whole thing."

While this is true of the way memory and I/O buffers are currently managed in UNIX (i.e. 4BSD), it is not necessarily true. There is no good reason why the system should not take over all of physical memory for I/O buffering if there is no other demand for the memory. This is easier in a system such as Multics, which already has a uniform view of memory, but should not be terribly difficult to add to UNIX. Basically, all that is required is a common data structure to keep track of what physical memory is used for, rather than reserving one pool of physical memory for I/O buffers and another for programs. One way of doing this would be to add a page type (e.g. CIO) to the cmap structure. The pageout demon could free such a page by calling bwrite. You may still want to limit the maximum number of pages used by I/O buffers, though, if you didn't want to deal with dynamically allocating and freeing struct buf's.
--
Tom Teixeira, Massachusetts Computer Corporation.  Westford MA
...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200
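Tom's cmap suggestion might look schematically like this in C. This is a toy model, not the real 4BSD cmap: CIO is the hypothetical new page type he proposes, and bwrite_frame and reclaim are invented names standing in for bwrite and the pageout path.

```c
#include <assert.h>

/* Page types a unified frame table might track.  CFREE is free memory,
 * CTEXT/CDATA belong to processes; CIO is the hypothetical addition. */
enum ptype { CFREE, CTEXT, CDATA, CIO };

#define NFRAMES 16

struct frame {
    enum ptype type;
    int dirty;                 /* CIO frame holds unwritten buffer data? */
};

static struct frame core[NFRAMES];  /* one common map for all of memory */
static int writes_done;

/* Stand-in for bwrite(): flush the buffer backing this frame to disk. */
static void bwrite_frame(struct frame *f)
{
    writes_done++;
    f->dirty = 0;
}

/* Pageout demon's view: a CIO frame can be reclaimed like any other
 * page; it just needs a bwrite first instead of a swap-out. */
static int reclaim(struct frame *f)
{
    if (f->type == CIO) {
        if (f->dirty)
            bwrite_frame(f);
        f->type = CFREE;
        return 1;
    }
    return 0;                  /* process pages are handled elsewhere */
}
```

The point of the single core[] table is exactly the one made above: with all page types in one structure, I/O buffers compete for frames with programs instead of living in a fixed, separately reserved pool.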
rpw3@fortune.UUCP (01/11/84)
#R:sri-arpa:-1511100:fortune:26900017:000:1588
fortune!rpw3    Jan 11 04:01:00 1984

Tom Teixeira notes that disks are faster than memory because the processor can do something else... Well, that is certainly true of PDP-11s and VAXen, but it just ain't so for many of the modern micro-based systems. The Motorola 68000 in particular can use up as many memory cycles as you can give it (use the fastest 64K RAMs on the market with the fastest 68K, and the 68K will still be waiting on the RAMs). Under these conditions, each DMA memory cycle costs you one CPU memory cycle. In fact, if your bus happens to be a little slow switching between bus masters (such as a Multibus), each DMA cycle can cost you several CPU memory cycles. I have seen systems (not ours) where the CPU would be better off block-moving data in/out to a passive (non-DMA) dual-ported RAM on the controller card than it would be letting DMA steal (sic) cycles.

Likewise, v7 UNIX swaps instead of shuffling memory when it gets internal fragmentation, under the assumptions (1) that DMA-ing out to disk and back is a net savings in CPU cycles over a block move (true! for the PDP-11), and (2) that there is something else going on to use those cycles (the other 15 users). These assumptions should be re-examined on 1-4 user micro-based systems, especially those with sloooowwww-access winchester disks.

Not criticizing any of the previous commentators; just noting again the Murphy/tanstaafl corollary: "Things aren't always what they seem."

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
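Rob's cycle accounting can be reduced to a toy formula. All the numbers below are illustrative assumptions about a fully memory-bound CPU on a slow-arbitrating bus, not measurements of any particular machine:

```c
#include <assert.h>

/* CPU cycles lost while nwords arrive by DMA: on a memory-saturated
 * CPU, every DMA memory cycle is a CPU cycle lost, plus 'sw' extra
 * cycles per word if the bus is slow switching between masters. */
static long dma_cost(long nwords, long sw)
{
    return nwords * (1 + sw);
}

/* CPU cycles to block-move nwords from a passive dual-ported RAM:
 * one read plus one write per word, plus 'ovh' loop overhead per word. */
static long copy_cost(long nwords, long ovh)
{
    return nwords * (2 + ovh);
}
```

With instant master switching (sw = 0), DMA wins easily. But charge, say, three wasted cycles per word for bus arbitration against a tight copy loop costing one extra cycle per word, and the "dumb" programmed block move from the dual-ported RAM comes out ahead -- which is Rob's "not ours" system above.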
tjt@kobold.UUCP (01/11/84)
Rob Warnock points out that many CPU chips (the Motorola 68000 in particular) are memory bound, i.e. most CPU cycles are devoted to accessing memory, and the CPU cycle time is as fast as the cycle time of the fastest 64K RAMs. This is certainly an important consideration for some systems. I guess the break-even point is where the time lost to switching bus masters is greater than the CPU time required to copy data in a loop, exclusive of the time used to actually move the data.

At the very best, the overhead of *copying* the data is at least 100%, since *two* memory cycles are required -- one to read the data from the dual-ported RAM and one to write the data to where you want it. In addition, there is the overhead of keeping count of how many bytes (words) to transfer, incrementing pointers, and looping. This results in about 50% more overhead. On a 68000 you can unroll your loops, or on a 68010 you can make a two-instruction loop using the DBcc instruction. Using the DBcc instruction takes 22 ticks to move 4 bytes of data, which would require 8 ticks if you could just write the data there in the first place. You can do just as well on a 68000 by unrolling your loop by e.g. a factor of eight. Each move instruction then takes 20 ticks, plus 8 ticks to update the counter and 10 ticks for the branch, but these 18 ticks of overhead get distributed over 8 move instructions to give 20 + 18/8 = 22.25 ticks. Using a DBcc instruction would take even less time, but would require more time to set up the counter initially.

It should be noted that there are architectural solutions to the problem of a memory-bound CPU that can be taken at the system level. In particular, using a cache between the CPU and the memory bus is a well-known technique for constructing a memory system whose average speed is nearly as fast as the fastest semiconductor RAM you care to use, but whose average cost is only slightly more than cheaper (but slower) RAM. In addition, using a cache will result in fewer memory cycles on the bus, so there is less contention between the CPU and a DMA controller. It is also possible to increase the memory bandwidth available to the cache by using a larger word size on a private memory bus. Once again, it is necessary to consider an entire system and tailor the software and peripherals to a particular CPU design since, as Rob notes: "Things aren't always what they seem."
--
Tom Teixeira, Massachusetts Computer Corporation.  Westford MA
...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200
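Tom's factor-of-eight unroll, transcribed from the 68000 idea into C, looks like the sketch below. This is the compiler-visible shape of the trick only; on a real 68000 the body would be eight MOVE.L instructions ahead of a single counter update and branch, giving the 20 + 18/8 = 22.25 ticks per move figured above.

```c
#include <assert.h>
#include <stddef.h>

/* Word copy unrolled by 8: the count/branch overhead is paid once per
 * eight moves instead of once per move.  A tail loop handles counts
 * that are not a multiple of 8. */
static void copy8(long *dst, const long *src, size_t n)
{
    while (n >= 8) {
        dst[0] = src[0]; dst[1] = src[1];
        dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5];
        dst[6] = src[6]; dst[7] = src[7];
        dst += 8; src += 8; n -= 8;
    }
    while (n) {                /* leftover 0-7 words */
        *dst++ = *src++;
        n--;
    }
}
```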