[net.unix] Semi-conductor Disk for VAX

hgc%ardc@sri-unix.UUCP (01/05/84)

From:      H.G.Corneilson (MISD WAD) <hgc@ardc>

We are looking into getting a "semi-conductor disk" for our VAX 11/780 running
BSD 4.1c UNIX (soon to be 4.2).  We plan to use it mostly for a fast
swap area.  We would appreciate any pointers to such hardware that has
been installed on a VAX running BSD 4.1c Unix.  We would also appreciate
any problems, hints, etc. that have been encountered.  Thanks in advance.
 --------
 Howard G. Corneilson (MISD WAD) <hgc@ardc>

mike%brl-vgr@sri-unix.UUCP (01/05/84)

From:      Mike Muuss <mike@brl-vgr>

Howard -

BRL is presently operating two VAXen with the new (>= 4.1c) filesystem
using a "solid-state disk" device.  In both cases the device is
a DATARAM BULK MOS, with 8 Mbytes of memory.

Discussion:

Using this sort of device for swapping/paging is RIDICULOUS.
Your money is much better spent merely adding more memory to
your VAX.  8 Mbytes is about right for a 780, 6 Mbytes for a 750.
With that quantity of memory, paging is a pretty low-level activity
unless you are running ultra-huge problems.

Using this device for /tmp can be somewhat worthwhile, because
things like ED and CC do lots of SEQUENTIAL I/O on largish files
in /tmp;  hence no amount of physical memory devoted to the UNIX
buffer cache is likely to be able to cache the whole thing.

However, the current cost/performance tradeoffs must be carefully
examined.  My present feeling is that it would be FAR better to
invest another ~$14K in another 9766-style disk, and JUST use
the first few Mbytes for /tmp, RATHER than spending ~$25K for
a BULK MOS.  Also, you can hang your 9766 off the SBI or CMI,
whereas the BULK MOS units I am familiar with can usually only
be attached to the UNIBUS (at least on VAXen; there are 11/70
Cachebus devices available).

Now, if the machine were a PDP-11 instead of a VAX, then things
would be entirely different, and I would whole-heartedly recommend
getting a BULK MOS unit.  Every 11/70 at BRL has at least one such
device, and one has two BULK memory units, and they make a *tremendous*
performance improvement.  But VAXes can take oodles of physical memory.
Why hide it behind an interface bus?

		-Mike

kermit%brl-vgr@sri-unix.UUCP (01/06/84)

From:      Chuck Kennedy <kermit@brl-vgr>

I have to agree with both Mike and Ron concerning their thoughts
about using a semi-conductor disk.  In particular, if you plan on
using the bulk MOS for paging, it is a waste of money.  You are
better off buying more memory for your VAX.  Why go through
extra I/O operations to page to a fast Bulk MOS (on the Unibus)
when you can go directly to memory (on the SBI or CMI)?  Especially
when it is cheaper to buy more VAX memory!!  Of course this argument
can only go on for so long before you have to page (but the new
DEC memory controllers for the 780 can handle up to 16MB and you
can have two controllers for a total of 32MB of real memory).
Again, from experience, 8MB on a 780 seems to be about the right
mix for most jobs.  For paging purposes, it still makes sense to
use a 9766 hooked through the SBI or the CMI because of the higher
bandwidth as well as the cost comparison.

Mike's arguments about using the bulk MOS as /tmp echo my own
exactly.  I bought the bulk MOS for BRL-VGR (a VAX/780) hoping
that it might speed things up (especially considering the dramatic
improvements they made on our 11/70s).  I wish now I could trade
the bulk MOS for a couple of 9766 drives.  Having the extra disk
arms really buys a lot of performance.
					Cheers,
						-Chuck Kennedy
						<kermit @ brl>


P.S.  One of these days VGR will turn into a Purdue style dual-VAX
which should almost double the speed of the machine for a mere $60k.
The only holdup is DEC is being so cussedly SLOWWWWW in delivering
a card cage to mount the backplane (you'd think they could just
rip one off the assembly line).  They only took a year to deliver
the CPU spares kit used to build the dual-VAX.

root@zehntel.UUCP (01/07/84)

zinfandel!berry    Jan  6 09:24:00 1984

This may not help the VAX that wants a semiconductor
RAM swapping area, but it may help others with 68000's.
I wrote a device driver that would grab all the ram on
the Multibus (TM) in one of our 68000 machines.  It 
had about 2Mb on a private bus, and left ram for other
devices (tape, ethernet, disk buffers, etc) alone.
We also had a 169Mb Fuji winchester.  Well, I mkfs'd it,
mounted it as /tmp and did some tests.  The Fuji disk
is so <deleted fast> that C compiles were SLOWER with
the RAM disk!!


Berry Kercheval		Zehntel Inc.
(ihnp4!zehntel!zinfandel!berry)
(415)932-6900

(Multibus is a trademark of intel Corp.)
(Winchester is a trademark of the Winchester Rifle Co.,
but that's irrelevant here...)

tjt@kobold.UUCP (01/10/84)

The reason *why* system performance using a good disk is better than
using main memory as a pseudo-disk is that a DMA disk controller can
transfer data at the same time your processor is doing something useful
(i.e. executing user code, or some more directly useful part of the
system code, such as traversing a path name for a file open).  Using
main memory as a pseudo-disk requires that your processor copy data
back and forth.

On the other hand, a semi-conductor disk also has a controller that
will copy data at the same time a program is executing.  Mike Muuss
<mike@brl-vgr> points out that buying more main memory is a better (and
cheaper?) way of improving your swapping/paging performance since it
greatly decreases the amount of paging you do rather than making it
slightly faster.

Mike goes on to point out that using a semiconductor disk may make
sense for temporary files (i.e. /tmp) since "... no amount of physical
memory devoted to the UNIX buffer cache is likely to be able to cache
the whole thing."  While this is true the way memory and I/O buffers
are currently managed in UNIX (i.e. 4BSD), it is not necessarily true.
There is no good reason why the system should not take over all of
physical memory for I/O buffering if there is no other demand for the
memory.  This is easier in a system such as Multics which already has a
uniform view of memory, but should not be terribly difficult to add to
UNIX.  Basically, all that is required is to use a common data
structure to keep track of what physical memory is used for, rather
than reserving one pool of physical memory for I/O buffers, and another
for programs.  One way of doing this would be to add a page type (e.g.
CIO) to the cmap structure.  The pageout daemon could free this page by
calling bwrite.  You may still want to limit the maximum number of pages
used by I/O buffers, though, if you didn't want to deal with dynamically
allocating and freeing struct buf's.
-- 
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200

rpw3@fortune.UUCP (01/11/84)

fortune!rpw3    Jan 11 04:01:00 1984

Tom Teixeira notes that a DMA disk can beat a RAM pseudo-disk because
the processor can do something else during the transfer...

Well that is certainly true of PDP-11's and VAXen, but it just ain't
so for many of the modern micro-based systems. The Motorola 68000
in particular can use up as many memory cycles as you can give it
(use the fastest 64k RAMs on the market with the fastest 68k and the
68k will be waiting on the RAMs). Under these conditions, each DMA
memory cycle costs you one CPU memory cycle. In fact, if your bus
happens to be a little slow switching between bus masters (such as a
Multibus), each DMA cycle can cost you several CPU memory cycles.

I have seen systems (not ours) where the CPU would be better off
block moving data in/out to a passive (non-DMA) dual-ported RAM
on the controller card than it would be letting DMA steal (sic)
cycles.

Likewise, v.7 UNIX swaps instead of shuffling memory when it gets
internal fragmentation, under the assumptions that (1) DMA-ing out
to disk and back is a net savings in CPU cycles over block move
(true! for PDP-11), and (2) that there is something else going on
to use those cycles (the other 15 users). These assumptions should
be re-examined on 1-4 user micro-based systems, especially those with
sloooowwww access winchester disks.

Not criticizing any of the previous commentators; just noting again
the Murphy/tanstaafl corollary, "Things aren't always what they seem".

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065

tjt@kobold.UUCP (01/11/84)

Rob Warnock points out that many CPU chips (the Motorola 68000
in particular) are memory bound, i.e. most CPU cycles are devoted to
accessing memory, and the CPU cycle time is as fast as the cycle
time of the fastest 64K RAMs.

This is certainly an important consideration for some systems.  I guess
the break-even point is where the time lost to switching bus masters is
greater than CPU time required to copy data in a loop, exclusive of the
time used to actually move data.  At the very best, the overhead of
*copying* the data is at least 100% since *two* memory cycles are
required -- one to read the data from the dual-ported RAM and one to
write the data to where you want it.  In addition, there is the
overhead of keeping count of how many bytes (words) to transfer,
incrementing pointers, and looping.  This results in about 50% more
overhead.

On a 68000 you can unroll your loops, or on a 68010 you can make a
two-instruction loop using the DBcc instruction.  Using the DBcc
instruction takes 22 ticks to move 4 bytes of data which would require
8 ticks if you could just write it there in the first place.  You can
do just as well on a 68000 by unrolling your loop by e.g. a factor of
eight.  Each move instruction then takes 20 ticks plus 8 ticks to
update the counter and 10 ticks for the branch, but these 18 ticks of
overhead get distributed over 8 move instructions to give 20 + 18/8 =
22.25 ticks.  Using a DBcc instruction would take even less time, but
would require more time to set up the counter initially.

It should be noted that there are architectural solutions to the
problem of a memory-bound CPU that can be taken at the system level.
In particular, using a cache between the CPU and memory bus is a well
known technique for constructing a memory system whose average speed is
nearly as fast as the fastest semiconductor RAM you care to use but
whose average cost is only slightly more than cheaper (but slower) RAM.
In addition, using a cache will result in fewer memory cycles on the
bus so there is less contention between the CPU and a DMA controller.
It is also possible to increase memory bandwidth available to the cache
by using a larger wordsize on a private memory bus.

Once again, it is necessary to consider an entire system and tailor the
software and peripherals to a particular CPU design since, as Rob notes:

	"Things aren't always what they seem".
-- 
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax,ucbcad,tektronix}!masscomp!tjt   (617) 692-6200