[comp.arch] Smart IO controllers - my thoughts

lm@slovax.Eng.Sun.COM (Larry McVoy) (03/21/91)

I've followed the argument about smart controllers with considerable interest.
Since I work in the file system area, I spend a certain amount of time thinking
about increasing performance.  Performance is a funny thing, however: it can
mean different things to different people.  When I think about performance,
I consider the following:

    Throughput: if I ask the system to run one device continuously, how
    	fast is it?  The right answer is "as fast as the disk's platter
	speed".

    Latency: if I ask the system for just one byte, how long before I
	get it? Similarly (and *much* more importantly), if I ask the
	system to write one byte and tell me when it is done, how long
	does that take?

    Capacity: how many disks can the system support?  This is a twofold
	question: (A) how many disks can I physically connect and (B)
	how many disks can I actually keep busy?  (For example, I can
	tie 7 IBM SCSI drives to a 5MB/sec cable, but only 3 of them
	(3 x 1.5MB/sec = 4.5MB/sec, plus some overhead) can be kept
	busy at a time under sequential access; see the sketch below.)
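
To make that arithmetic concrete, here is a back-of-the-envelope sketch
in C.  The 5MB/sec and 1.5MB/sec figures are the ones from the example
above; everything else is just for illustration:

    /*
     * Back-of-the-envelope capacity check: given a bus rate and a
     * per-drive sustained rate, how many drives can actually be
     * kept streaming at once?
     */
    #include <stdio.h>

    int
    main(void)
    {
        double bus_rate   = 5.0;    /* SCSI-1 cable, MB/sec */
        double drive_rate = 1.5;    /* one drive, sequential, MB/sec */
        int    targets    = 7;      /* SCSI-1 limit on one cable */
        int    busy;

        busy = (int)(bus_rate / drive_rate);    /* 3 */
        if (busy > targets)
                busy = targets;

        printf("%d of %d drives kept busy (%.1f of %.1f MB/sec)\n",
            busy, targets, busy * drive_rate, bus_rate);
        return 0;
    }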

I'm interested in making all of these better.  I'd like to see a
machine that could talk to hundreds of drives, be able to write small
packets (NFS) very quickly, be able to read from all the drives at
once, and have enough CPU power to do all this without choking.  In
other words, I'm looking at how to build mainframe I/O on Sun-class
machines.

When you think about solving this problem, smart I/O is about the only
way to go, for many reasons.  Smart I/O does *not* mean putting the
file system on the controller, at least not the BSD FFS (I have seen a
PC file system, for the QNX OS, living on a controller - worked great
but it was a small, simple FS).  You need to measure your systems and
see where the time goes.  On our systems, a lot goes into the SCSI
driver.  The logical conclusion is to put the driver out on the 
controller and present a simple read/write/ioctl-like interface for 
Unix to talk to.  Someone mentioned that interrupts shouldn't cost that
much.  Bzzzt!  Wrong.  Your average brain dead SCSI drive does 8-12
interrupts / transfer.  The interrupt handler is not just incrementing
a counter - it's running the SCSI cable through a state machine (hey,
I didn't invent this).  We're talking ~250 usec/interrupt on a 68020.
Offloading the main CPU is a *good* idea.
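
To put numbers on that: at 8-12 interrupts per transfer and ~250 usec
each, you burn 2-3 msec of CPU per I/O.  A trivial sketch in C (the
8-12 and 250 usec figures are from above; the 100 I/Os per second load
is an assumption, purely for illustration):

    /*
     * Rough cost of interrupt-per-phase SCSI on the host CPU.
     * 10 interrupts * 250 usec = 2.5 msec of CPU per transfer;
     * at 100 I/Os per second that is a quarter of a 68020 gone
     * to running the SCSI state machine alone.
     */
    #include <stdio.h>

    int
    main(void)
    {
        int    intrs_per_xfer = 10;      /* midpoint of 8-12 */
        double usec_per_intr  = 250.0;   /* 68020 figure from above */
        double ios_per_sec    = 100.0;   /* assumed load */
        double cpu_used;

        cpu_used = intrs_per_xfer * usec_per_intr * ios_per_sec / 1e6;
        printf("%.1f msec of CPU per transfer, %.0f%% of the CPU at "
            "%.0f I/Os per second\n",
            intrs_per_xfer * usec_per_intr / 1000.0,
            cpu_used * 100.0, ios_per_sec);
        return 0;
    }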

Let's look at some of the arguments against smart IO:

"It's too slow, it won't keep up with my newer CPU's".

	This is a good argument, one that has been true in the past, mainly
	due to designs that underestimate the processing power required.
	The way around this is as follows: note that the CPU side of the card is
	always increasing in speed, but the disk side of the card is fixed
	(SCSI-1 == 5MB/sec and 7 drives).  If the card is designed such that
	it can handle the maximum load that the slower of the two interfaces
	can provide, you win.  It will *never* be too slow as long as you
	continue to use those interfaces.

"It's too expensive".

	This is not a bad argument either.  Smart I/O processors don't
	make a lot of sense for workstations (who's going to shell out
	$6K for a controller for a $10K workstation?).  On the other
	hand, look at servers: you'll typically have 7 drives on one
	card, each drive somewhere around $5K, so we're up to $35K in
	drives.  So we're not looking so bad compared to the drive cost,
	but what about the CPU cost?  Couldn't we just add more CPUs to
	handle the extra drives?  I think it is a close call, but usually
	CPU modules are more expensive.  Furthermore, people buy those
	CPUs to do work - not to handle I/O interrupts.

"It'll get out of date, I can't maintain it".

	Another good one.  This is why we apply the KISS principle.  In
	other words, make it talk a simple protocol, like the LANCE 
	protocol.  The Unix device driver for a smart I/O card should,
	for the normal code paths, be very few lines of code.  The ones
	I've written have been "put the control block there, interrupt
	the card, and go away, expecting an interrupt when done".
	Just to show you I'm not stupid, I'd design in an EEPROM for the
	on-card code, so the firmware can be updated as it gets out of
	date...
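
For the curious, here is roughly what that normal path looks like.  All
the names below (the control block layout, the doorbell register, the
helper routines) are invented for this sketch - they are not from any
real card:

    #include <sys/param.h>
    #include <sys/buf.h>

    /* Hypothetical control block, shared with the card. */
    struct ctl_block {
        int     cb_op;          /* CB_READ or CB_WRITE */
        daddr_t cb_blkno;       /* starting disk block */
        caddr_t cb_addr;        /* DMA address of the data */
        long    cb_count;       /* bytes to move */
        int     cb_status;      /* filled in by the card */
    };
    #define CB_READ  1
    #define CB_WRITE 2
    #define CB_GO    1

    extern struct ctl_block *next_free_cb(void);    /* ring in shared mem */
    extern struct ctl_block *next_done_cb(void);
    extern struct buf       *cb_to_buf(struct ctl_block *);
    extern volatile int     *card_doorbell;         /* card register */
    extern void              biodone(struct buf *);

    /* Normal path: fill in a control block, poke the card, go away. */
    int
    smartio_strategy(struct buf *bp)
    {
        struct ctl_block *cb = next_free_cb();

        cb->cb_op    = (bp->b_flags & B_READ) ? CB_READ : CB_WRITE;
        cb->cb_blkno = bp->b_blkno;
        cb->cb_addr  = bp->b_un.b_addr;
        cb->cb_count = bp->b_bcount;

        *card_doorbell = CB_GO;         /* interrupt the card... */
        return (0);                     /* ...and expect an interrupt back */
    }

    /* One interrupt per completed I/O, not per bus phase. */
    void
    smartio_intr(void)
    {
        biodone(cb_to_buf(next_done_cb()));
    }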


I'm getting too sleepy to think of any more reasons against, so here are
a couple for:

Leverage: build one of these things and make it work for every machine you
	sell.  Disks are really slow about getting faster, so the life of
	a well designed card is probably 3-5 years.

Scaling: if each card has the MIPS to do the I/O, adding more cards adds
	more MIPS.  Your system will scale up gracefully.

Capacity: you'll always have more I/O slots than CPU slots.  If you go the
	MP route, you'll run out of MIPS before you run out of I/O slots.
	This is called an unbalanced machine.

Price: if you did a nice job leveraging your card, you are shipping a lot
	of them, probably across several CPU designs.  That means volume
	and volume means you got the cost of the ASICs down to something
	reasonable.

To sum up: I think many people have been burned by poorly designed,
under powered "smart ass" I/O interfaces.   The fact that bad designs
exist does not preclude the possibility of good designs.  In fact, we
have but to turn to the mainframe people to see how this is done.  I/O
channels have been around for a long time and they are coming to Unix.
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

af@cs.cmu.edu (Alessandro Forin) (03/22/91)

[My first attempt at posting this apparently failed, sorry if duplicated]

I share Larry's comments, and since I am not a hardware person I'd like
to hear from the experts on the following, very concrete example.
Larry mentions SCSI, so that's my example.  No blessing or flaming of
any particular manufacturer is intended in the following; please don't
get me wrong.

When I wrote my SCSI driver for Mach I stumbled onto the specs for the
NCR 53C700 "Script" controller, and I liked it so much, even though it
wasn't the chip I had to fight with, that I modeled my chip-dependent
code (for both the NCR53c94 and the DEC SII 7061) after it.  Both of the
latter chips have the problem Larry mentions of generating too many
interrupts, although the 53c94 behaves better.  Sounds like a general
problem, but not one that would arise with the Script chip.
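
To make the contrast concrete, here is a sketch.  The opcode names are
invented - they are not the 53C700's actual SCRIPTS instruction set -
but they show the shape of the idea: the host compiles the whole
transaction into a little program, the chip walks the bus phases itself,
and the host takes one interrupt instead of 8-12:

    /*
     * Illustration only.  A dumb chip interrupts the host at every
     * phase change:
     *
     *   select -> intr, msg out -> intr, command -> intr,
     *   data -> intr, status -> intr, msg in -> intr   (8-12 total)
     *
     * A Script-style chip runs the sequence below on its own:
     */
    enum script_op {
        OP_SELECT,              /* pick the target */
        OP_MOVE_MSG_OUT,        /* identify message */
        OP_MOVE_CMD,            /* the SCSI command block */
        OP_MOVE_DATA,           /* the actual data transfer */
        OP_MOVE_STATUS,
        OP_MOVE_MSG_IN,
        OP_INT_DONE             /* the ONE host interrupt */
    };

    struct script_insn {
        enum script_op  op;
        char           *addr;   /* buffer for this phase */
        long            count;  /* bytes to move */
    };

    extern char msg_out[1], cdb[10], data_buf[8192], status_byte[1],
                msg_in[1];

    /* One read, compiled to a program the chip executes by itself: */
    struct script_insn one_read[] = {
        { OP_SELECT,       0,           0    },
        { OP_MOVE_MSG_OUT, msg_out,     1    },
        { OP_MOVE_CMD,     cdb,         10   },
        { OP_MOVE_DATA,    data_buf,    8192 },
        { OP_MOVE_STATUS,  status_byte, 1    },
        { OP_MOVE_MSG_IN,  msg_in,      1    },
        { OP_INT_DONE,     0,           0    },
    };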

So the question is: how much would it cost to re-design the current
DECstation 5000 SCSI board (it uses the NCR53c94) to use the Script?

In your estimates, I do not believe you can assume much pin-compatibility
between the "old" SCSI chip and the "new" SCSI chip.
So I guess I am asking:
	- how much does it take to design a SCSI board from scratch?
	- how 'bout a redesign?
	- what is the cost of the parts in it; does "the chip" account
	  for most of it or not?  [This would answer the question of
	  using "expensive CPUs" as I/O processors also]
	- what's the price tag on the 53C700?
	- where does a chip like this "lose"?
	- how bad is the problem of coupling a DMA controller to an
	  off-the-shelf SCSI chip?
	- do hardware engineers like to spend their time working on
	  designing these boards or not?
	- how could they make the Adaptec board affordable to PC owners?

I realize these might not be the right questions/all the questions,
but hey, it's a start.
sandro-