lm@slovax.Eng.Sun.COM (Larry McVoy) (03/21/91)
I've followed the argument about smart controllers with considerable interest. Since I work in the file system area, I spend a certain amount of time thinking about increasing performance. Performance is a funny thing, though: it can mean different things to different people. When I think about performance, I consider the following:

    Throughput: if I ask the system to run one device continuously,
    how fast is it?  The right answer is "as fast as the disk's
    platter speed".

    Latency: if I ask the system for just one byte, how long before
    I get it?  Similarly (and *much* more importantly), if I ask the
    system to write one byte and tell me when it is done, how long
    does that take?

    Capacity: how many disks can the system support?  This is a
    twofold question: (a) how many disks can I physically connect,
    and (b) how many disks can I actually keep busy?  (For example,
    I can tie 7 IBM SCSI drives to a 5MB/sec cable, but only 3
    (3 x 1.5MB = 4.5MB, plus some overhead) can be kept busy at a
    time under sequential access.)

I'm interested in making all of these better. I'd like to see a machine that could talk to hundreds of drives, write small packets (NFS) very quickly, read from all the drives at once, and have enough CPU power to do all this without choking. In other words, I'm looking at how to build mainframe I/O on Sun class machines.

When you think about solving this problem, smart I/O is about the only way to go, for many reasons. Smart I/O does *not* mean putting the file system on the controller, at least not the BSD FFS (I have seen a PC file system, for the QNX OS, living on a controller - it worked great, but it was a small, simple FS). You need to measure your systems and see where the time goes. On our systems, a lot goes into the SCSI driver. The logical conclusion is to put the driver out on the controller and present a simple read/write/ioctl-like interface for Unix to talk to.

Someone mentioned that interrupts shouldn't cost that much.
Bzzzt! Wrong. Your average brain-dead SCSI drive does 8-12 interrupts per transfer. The interrupt handler is not just incrementing a counter - it's running the SCSI cable through a state machine (hey, I didn't invent this). We're talking ~250 usec/interrupt on a 68020. Offloading the main CPU is a *good* idea.

Let's look at some of the arguments against smart I/O:

"It's too slow, it won't keep up with my newer CPUs". This is a good argument, one that has been true in the past, mainly due to designs that underestimated the processing power required. The way around it is this: note that the CPU side of the card keeps increasing in speed, but the disk side of the card is fixed (SCSI-1 == 5MB/sec and 7 drives). If the card is designed so that it can handle the maximum load the slower of the two interfaces can provide, you win. It will *never* be too slow as long as you continue to use those interfaces.

"It's too expensive". This is not a bad argument either. Smart I/O processors don't make a lot of sense for workstations (who's going to shell out $6K for a controller for a $10K workstation?). On the other hand, look at servers: you'll typically have 7 drives on one card, each drive somewhere around $5K, so we're up to $35K in drives. So we're not looking so bad compared to the drive cost, but what about the CPU cost? Couldn't we just add more CPUs to handle the extra drives? I think it is a close call, but CPU modules are usually more expensive. Furthermore, people buy those CPUs to do work - not to handle I/O interrupts.

"It'll get out of date, I can't maintain it". Another good one. This is why we apply the KISS principle. In other words, make it talk a simple protocol, like the LANCE protocol. The Unix device driver for a smart I/O card should, for the normal code paths, be very few lines of code. The ones I've written have been "put the control block there, interrupt the card, and go away, expecting an interrupt when done".
Just to show you I'm not stupid, I'd design in an EEPROM for the on-card code... I'm getting too sleepy to think of any more arguments against, so here are a couple for:

Leverage: build one of these things and make it work for every machine you sell. Disks are really slow about getting fast, so the life of a well designed card is probably 3-5 years.

Scaling: if each card has the MIPS to do its I/O, adding more cards adds more MIPS. Your system will scale up gracefully.

Capacity: you'll always have more I/O slots than CPU slots. If you go the MP route, you'll run out of MIPS before you run out of I/O slots. This is called an unbalanced machine.

Price: if you did a nice job leveraging your card, you are shipping a lot of them, probably across several CPU designs. That means volume, and volume means you can get the cost of the ASICs down to something reasonable.

To sum up: I think many people have been burned by poorly designed, underpowered "smart ass" I/O interfaces. The fact that bad designs exist does not preclude the possibility of good designs. In fact, we have but to turn to the mainframe people to see how this is done. I/O channels have been around for a long time, and they are coming to Unix.

---
Larry McVoy, Sun Microsystems     (415) 336-7627
...!sun!lm or lm@sun.com
af@cs.cmu.edu (Alessandro Forin) (03/22/91)
[My first attempt at posting this apparently failed, sorry if duplicated]

I share Larry's comments, and since I am not a hardware person I'd like to hear the experts on the following, very concrete example. Larry mentions SCSI, so that's my example. No blessing or flaming of any particular manufacturer is intended in the following, please get me right.

When I wrote my SCSI driver for Mach I stumbled into the specs for the NCR 53C700 "Script" controller, and I liked it so much, even if it wasn't the chip I had to fight with, that I modeled my chip-dependent code (for both the NCR 53c94 and the DEC SII 7061) after it. Both of the latter chips have the problem Larry mentions of generating too many interrupts, although the 53c94 behaves better. It sounds like a general problem, but not one that would arise with the Script chip. So the question is: how much would it cost to redesign the current DECstation 5000 SCSI board (it uses the NCR 53c94) to use the Script? In your thinking I do not believe you can assume much pin compatibility between the "old" SCSI chip and the "new" SCSI chip. So I guess I am asking:

- how much does it take to design a SCSI board from scratch?
- how 'bout a redesign?
- what is the cost of the parts in it; does "the chip" account for most
  of it or not?  [This would also answer the question of using
  "expensive CPUs" as I/O processors]
- what's the price tag on the 53C700?
- where does a chip like this "lose"?
- how bad is the problem of coupling a DMA controller to an
  off-the-shelf SCSI chip?
- do hardware engineers like to spend their time designing these
  boards or not?
- how could they make the ADAPTEC board affordable to PC owners?

I realize these might not be the right questions/all the questions, but hey, it's a start.

sandro-