[comp.protocols.nfs] Incremental sync

mo@messy.bellcore.com (Michael O'Dell) (03/08/91)

Line-eater fodder.

"laser-guided line-eater" fodder.
 
Incremental sync() is a good idea in some ways, but not all.

When George Goble did the famous dual-headed VAX with 32 megabytes of
memory, one of the first things noticed was that once every 30 seconds,
things got very slow as update flushed out memory.  One of the things
he did was to put a clock-hand style thing in the kernel so the equivalent
of update could push out the pages in a slow, steady stream, instead of
the gigantic clumps of dirty disk blocks.
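
For concreteness, here is a minimal sketch of that kind of clock-hand
flusher; the names (npages, page_is_dirty(), start_async_write()) are
invented placeholders, not Goble's actual code:

/* Called from a periodic timer instead of one big sync() every 30
 * seconds: push at most FLUSH_BATCH dirty pages per tick, so dirty
 * pages drizzle out rather than leaving in one giant clump. */
extern unsigned npages;                   /* pageable frames in the machine */
extern int  page_is_dirty(unsigned pfn);  /* placeholder kernel hooks */
extern void start_async_write(unsigned pfn);

#define FLUSH_BATCH 16                    /* pages pushed per tick */

static unsigned hand;                     /* sweeps over memory circularly */

void
trickle_sync(void)
{
    unsigned pushed = 0, scanned = 0;

    while (pushed < FLUSH_BATCH && scanned < npages) {
        if (page_is_dirty(hand)) {
            start_async_write(hand);      /* queue it; don't wait for it */
            pushed++;
        }
        hand = (hand + 1) % npages;
        scanned++;
    }
}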

However, the assumption that there is any disk idle time is basically wrong.
On large, heavily-loaded systems, the disks run pretty constantly.
This isn't to say that drizzling things out at a controlled rate rather
than in big lumps isn't useful, but sometimes it doesn't help, either.
One real problem is that the mapped file may have semantics which require
the user program to not terminate until the write to disk has finished
because the program wants to be quite certain the data is out there.
And if the references to the file were quite random (like a large database
hash table index), then there's a good chance that incremental page pushing
did NOT clean some substantial fraction of the dirty pages, still producing
the impulse load on the disk queue.

One important observation is that no VM system can run faster than it can
get clean pages - if there is an adequate supply in memory, then fine.
Otherwise, how quickly you can turn the pages to disk is the limiting factor
for overall throughput (jobs completed per hour, or conversely, time for
a single large VM-bound job).  This factor most directly affects the
level of multiprogramming a given system can sustain, assuming the workload
isn't all just small jobs which trivially fit in memory and do no substantial
I/O. (IBM MVS I/O systems are often tuned to maximize the sustainable
paging rate without thrashing. Think about it.)

One serious elevator anomaly in fast machines is as follows.  Assume several
processes trying to read different files, each one broken into several
reasonably large chunks (could easily be extents, so that doesn't fix it).
Further, assume that one process's file is broken into several more chunks
than the others with these smaller chunks spread over the same distance
as the other files.  Finally, assume the machine is enough faster than the
disks that each process can complete its processing BEFORE the heads
can seek past the next request.  So, using the standard elevator which
resorts the queue on every request, the process with the large number
of small extents (fortuitously laid out in request order) will completely
monopolize the disk!  Because the standard elevator will let you "get back
into the boat" even after it's left, the process gets its data, and
spins out another request before the head finishes another request in the
same neighborhood, so it zooms to the front of the elevator queue and
gets to do another read.  The poor processes with their blocks toward the
end of the run starve to death by comparison. (Note we are assuming a
one-way elevator; it's worse with a 2-way.)  What do you do?  One scheme
used successfully is "generation scheduling".  This collects an elevator-load
of requests, sorts them, and then services them WITHOUT admitting
anyone to the car along the way.  This is a way of ensuring fairness.
It also turns out that this scheme can be modified to alleviate SOME of
the problems with the big memory flush problem.  Getting the details
right is complex, but the general approach is as follows.  There is a
"generation counter" which increments at some rate like 2-5x a request
service time.  Each request is marked with the generation when it
arrives. Further, you use a 2-dimensional queue, sorting the subqueues
by generation and within the subqueue keeping the original FIFO arrival order.
(There is some discussion about sorting the sub-queues; the jury is still out.)
You now load the elevator car across the subqueues, always getting at least
one request from every generation pending, more with some weighting
function like queue length. [You can implement "must do in this order"
requests by including a special generation queue which always loads the
car first and is not sorted in the car.]   A further enhancement is
to add priorities to requests.  For instance, LOW, MEDIUM, and HIGH,
plus FIFO looks good.  LOW is page turning when the pager isn't frantic
for clean pages, MEDIUM for normal reads and writes which you want to
complete soon, and HIGH for things like directory searches in namei()
and some metadata updates, and FIFO for critical metadata updates.
Couple this with the full generational scheduling described above
and you will go a long way toward making the low-level stuff stable
in the face of large (and truly unavoidable) impulse loads....
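
For concreteness, here is a rough sketch of the generation queue and car
loading in C; the names, the NGEN limit, and the weighting are invented
for illustration and are certainly not the Prisma code:

#include <stdlib.h>

/* Requests are stamped with the generation current when they arrive;
 * a "car" is loaded across all pending generations, sorted once, and
 * then serviced to completion without admitting newcomers, so a fast
 * process cannot keep jumping back into the car after it has left. */
struct req {
    struct req *next;
    long        blkno;          /* block number, for elevator sorting */
    int         gen;            /* generation stamp at arrival */
};

#define NGEN 8                  /* generations kept distinct (wraps after that) */

static struct req *genq[NGEN];  /* one FIFO (arrival order) per generation */
static int curgen;              /* advanced by a timer tied to the service time */

void
enqueue(struct req *r)
{
    struct req **pp = &genq[curgen % NGEN];

    r->gen = curgen;
    r->next = NULL;
    while (*pp)                 /* append: keep FIFO arrival order */
        pp = &(*pp)->next;
    *pp = r;
}

static int
by_blkno(const void *a, const void *b)
{
    long d = (*(struct req *const *)a)->blkno -
             (*(struct req *const *)b)->blkno;
    return d < 0 ? -1 : d > 0;
}

/* Load one car: take at least one request from every non-empty
 * generation (a real version would take more, weighted by queue
 * length), then sort the car into one-way elevator order.  The caller
 * services all n entries before calling load_car() again. */
int
load_car(struct req *car[], int carsize)
{
    int n = 0, g;

    for (g = 0; g < NGEN && n < carsize; g++) {
        if (genq[g] != NULL) {
            car[n++] = genq[g];
            genq[g] = genq[g]->next;
        }
    }
    qsort(car, n, sizeof car[0], by_blkno);
    return n;
}

Priorities (LOW, MEDIUM, HIGH, plus a FIFO class) would just become
additional queues consulted when loading the car, with the FIFO queue
loaded first and left unsorted.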

Oh yes - much of the thinking about this low-level scheduling stuff
came from Bill Fuller, Brian Berliner, and Tony Schene, then of Prisma.

	-Mike O'Dell

torek@elf.ee.lbl.gov (Chris Torek) (03/09/91)

In article <1991Mar8.142031.9098@bellcore.bellcore.com>
mo@messy.bellcore.com (Michael O'Dell) writes:
>[much about disk queue sorting]

Of course, certain disk controllers (whose names I shall not name)
have a tendency to swallow up a whole bunch of I/O requests and do
them in their own order, rather than listen to / allow you to adjust
yours....

Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

torek@elf.ee.lbl.gov (Chris Torek) (03/09/91)

In article <10773@dog.ee.lbl.gov> I wrote:
>Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,

It seems people know exactly what I mean here... I got several replies
to this in the span of a few hours.  A quote from one of them:

>I think it was Rob Pike who once pointed out there is a real
>difference between "smart" and "smart-ass" controllers, and that
>in his estimation (one which I agree with completely), most
>controllers which claim to be smart, are in fact, in the other group.

This nails it down pretty well.  So: we should call it the `SO SAD
Coalition', where `SO SAD' stands for `Stamp Out Smart-Ass Devices'.

Note that there is nothing wrong with (truly) intelligent controllers,
provided that they do not sacrifice something important to attain this
intelligence.  Important things that tend to get sacrificed include
both speed and flexibility; and in fact, speed and flexibility can get
in each other's way, particularly with programmable devices.

(For instance, many SCSI controllers take several milliseconds---not
microseconds, *milli*seconds---to react to a command.  At 3600 RPM, 3
milliseconds is almost 1/5 of a disk revolution.  This is a serious
delay.  Another example is certain Ethernet chips, where the FIFOs are
just a bit too short, and when a collision occurs, they goof up the
recovery because they cannot `back up' their DMA, so they simply
restart with garbage.)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

kinch@no31sun.csd.uwo.ca (Dave Kinchlea) (03/10/91)

In article <10773@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:


   Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,
   -- 
   In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
   Berkeley, CA		Domain:	torek@ee.lbl.gov

Actually I have been having quite the opposite thoughts lately. It seems to me
that it would be highly advantageous (in the general case) to take all of
the filesystem information out of the kernel and give it to the I/O controller.

I don't just mean what requests are satisfied first (although this would
be one of its tasks) but one which also supports an abstract filesystem.
This would take out a lot of logic from the kernel; it needn't spend time
in namei() et al.  Let an intelligent controller do that.

Am I missing something important here, other than the fact that no operating
system I am aware of has a concept of an abstract filesystem (except at 
the user level). There is still some logic needed re virtual memory and 
possible page-outs etc but I think it could work. Any comments?

cheers kinch
Dave Kinchlea, cs grad student at UWO (that's in London, Ontario)

sef@uunet.UU.NET ( comp.std.unix) (03/10/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
>Actually I have been having quite the opposite thoughts lately. It seems to me
>that it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.

Ye gods, no!!!!

What kind of filesystem are you going to use?  V7?  BSD?  MS-DOS?  OS/2 HPFS?
MFS (Mike's FileSystem; something a friend of mine has been playing with)?
CD-ROM-type? (I forget the standards number that it goes by.)  What kind of
characters are you going to allow/disallow in filenames?  How about file
separators?  How are you going to implement mount points?

The point of all that ranting and raving is:  no matter what filesystem you
choose, someone will come up with a better one.  Back to the original
thread:  no matter how intelligent you make your disk controller, someone is
going to want to bypass it.

-- 

Sean Eric Fagan, moderator, comp.std.unix.

mash@mips.com (John Mashey) (03/10/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
...
>Actually I have been having quite the opposite thoughts lately. It seems to me
>that it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.

>I don't just mean what requests are satisfied first (although this would 
>be one of its tasks) but one which also supports an abstract filesystem. 
>This would take out a lot of logic from the kernel; it needn't spend time
>in namei() et al.  Let an intelligent controller do that.
>
>Am I missing something important here, other than the fact that no operating
>system I am aware of has a concept of an abstract filesystem (except at 
>the user level). There is still some logic needed re virtual memory and 
>possible page-outs etc but I think it could work. Any comments?

It's been done; I doubt that I'd do it again, personally.
I think there are only a few design points where this makes sense.
Let's see the reasons why this often doesn't make sense, based
on the standard sort of system partitioning analysis:
	1) Where is the data?
	2) Where are the processors and how fast are they?
	3) How frequent and large are the data movements needed
	between storage(s) and processor(s)?
So, consider some of the common cases:

1) 1 CPU, main memory, disk controller(s) with no significant local
storage (other than track caches, for example).
	This is pretty straightforward:
	CPU does all of the work.
		If what you're doing is a lot of file thrashing, that
		uses close to 100% of the CPU; if there isn't much,
		then almost 100% of the CPU can be used for other things.
	Buffer cache usage is pooled with other memory usage,
	(assuming any of the current UNIXes that do dynamic buffer
	cache sizing that adapts to the mix of buffer cache versus
	other uses.)

2) CPU, main memory, disk controller that runs the filesystem.
Two cases of interest:
	2a) Disk controller CPU is substantially slower than main CPU.
	Now, compared to the previous case:
	The CPU gains back some of the time spent doing filesystem stuff.
	It ends up spending some of its time synchronizing with the
	disk controller CPU, because the interactions with the controller
	are much more frequent than if only disk I/Os need interactions.
	(This can be seen from sar-type statistics, if you compare
	#'s of logical operations/nameis/stats, etc, with physical
	I/Os.)  For example, every namei goes to the controller,
	every stat, fstat.  (I mention these because
	at least some of them sometimes do very little movement of data,
	compared with the overhead of setting them up.)
	The controller CPU must also have access to the memory maps
	used by the CPU (after all, read/write system calls easily
	cross page boundaries)  and either the CPU must do substantial
	checking before handing over the request, or else the controller
	CPU needs to request the main CPU to diddle page table entries
	for it at appropriate times, initiate page-ins, etc.
	You need to build a different memory system to avoid
	unnecessary interference, compared to the previous case.
		(in the previous case, USUALLY, most of the accesses from
		controller <-> main memory are for moving data to/from
		disk (or track cache).  There are some accesses to I/O
		control information, but this should be a small fraction
		of the amount of data transferred.  Hence, one can use
		block transfers fairly often, which minimizes memory
		interference.)
	For this case, the disk controller must frequently rummage around
	in main memory.  Maybe these accesses will fit block
	transfers, maybe not....If they don't, both CPUs will suffer.

	FINALLY, and this is the worst part, if the disk controller CPU
	is substantially SLOWER than the main CPU, all of this can easily
	run slower than the main CPU by itself, because
	you must wait until the controller completes some action.
	(I have seen this happen more than once in real life.)
SO:
	2b) Use a disk controller that is as fast as the main CPU.
	This may be better, although the cost has now gone up
	substantially, for better memory busses, caches for the
	controller CPU, probably, etc.
BUT NOW:
	You will need to make this work with more than 1 disk controller,
	(which will not be trivial).  Even with 1 CPU and 1 controller,
	you have something that's looking (from a hardware viewpoint)
	more and more like a symmetric multiprocessor:

3a) Symmetric multiprocessor: shared memory, more than 1 CPU that
can run the file system, and I/O controllers that look like the
simple ones of case 1 above.
So, suppose to avoid some of the problems of memory bandwidth and such,
you say:
	3b) The disk controller(s) have their own private memory(s).
Now, if you study this case, you discover that you impact the main
CPU(s) worse, because you end up with multiple transfers from
the controller memory into main memory (again, look at sar
data to convince yourself of this).
	Even worse, with multiple controllers, you now have a fixed
	amount of buffer cache per controller, which generally
	does not work as well as 1 large memory pool that can hold
	whatever is currently being used, given equal total amounts
	of memory.
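
As a toy illustration of that last point (everything here is invented,
not a model of any real system), one can simulate the same total number
of LRU frames pooled versus split across two controllers, with a
workload that leans on one disk more than the other; the pooled cache
should come out with a modestly higher hit rate:

#include <stdio.h>
#include <stdlib.h>

#define FRAMES   64             /* total cache frames in either setup */
#define BLOCKS   256            /* distinct blocks per disk */
#define ACCESSES 100000

struct lru { long blk[FRAMES]; int n, cap; };

/* Linear-scan LRU: returns 1 on hit, 0 on miss; most recent at blk[0]. */
static int
lookup(struct lru *c, long blk)
{
    int i;

    for (i = 0; i < c->n; i++)
        if (c->blk[i] == blk) {
            for (; i > 0; i--)
                c->blk[i] = c->blk[i - 1];
            c->blk[0] = blk;
            return 1;
        }
    if (c->n < c->cap)
        c->n++;                 /* otherwise the last (LRU) entry is dropped */
    for (i = c->n - 1; i > 0; i--)
        c->blk[i] = c->blk[i - 1];
    c->blk[0] = blk;
    return 0;
}

int
main(void)
{
    struct lru pool = { {0}, 0, FRAMES };
    struct lru split[2] = { { {0}, 0, FRAMES / 2 }, { {0}, 0, FRAMES / 2 } };
    long pool_hits = 0, split_hits = 0;
    int i;

    for (i = 0; i < ACCESSES; i++) {
        int  disk = (rand() % 10 < 8) ? 0 : 1;      /* 80/20 skew */
        long blk  = rand() % BLOCKS;

        pool_hits  += lookup(&pool, disk * BLOCKS + blk);
        split_hits += lookup(&split[disk], blk);
    }
    printf("pooled hit rate %.3f, split hit rate %.3f\n",
           (double)pool_hits / ACCESSES, (double)split_hits / ACCESSES);
    return 0;
}

On the skewed toy workload the shared pool gives more of its frames to
the busier disk automatically, which is the whole point.
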
I think the bottom line is: the only times I've ever seen this kind of
thing work were when the main CPU was out of gas, and one could
more easily make a smart I/O processor help offload it,
and the nature of the restructuring was that:
	a) The controller CPU could convert numerous small interactions
	with a device, into much less frequent, but larger interactions
	with the CPU's main memory.
	b) The CPU/controller interaction was fairly minimal,
	or the controller provides a more efficient performance
	model without changing the programming model, especially
	for operating systems that don't already have disk caching
	built-in so strongly as UNIX.  (For instance, mainframe
	disk caching controllers, including battery-backed up memory
	might fit here.)

However, as a bottom line for UNIX systems, it often seems that
you start with dumb controllers, but if you keep making them smarter,
they rapidly get to being symmetric multis .... with dumb controllers
again...
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

moss@cs.umass.edu (Eliot Moss) (03/10/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:

   In article <10773@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
      Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,

   Actually I have been having quite the opposite thoughts lately. It seems to me
   that it would be highly advantageous (in the general case) to take all of
   the filesystem information out of the kernel and give it to the I/O controller.

   I don't just mean what requests are satisfied first (although this would 
   be one of its tasks) but one which also supports an abstract filesystem. 
   This would take out a lot of logic from the kernel; it needn't spend time
   in namei() et al.  Let an intelligent controller do that.

   Am I missing something important here, other than the fact that no operating
   system I am aware of has a concept of an abstract filesystem (except at 
   the user level). There is still some logic needed re virtual memory and 
   possible page-outs etc but I think it could work. Any comments?

Yes, I think you are missing something important. To get good performance with
databases, object stores, and other I/O intensive applications, the
application needs much more control over I/O device usage, both in the sense
of space allocation (placement) and scheduling. Un*x systems have been very
poor in this area; traditional mainframe systems have typically done better.
In my opinion, the root of the difficulty is that Un*x is designed and tuned
for the behavior of certain kinds of time sharing usage, and does not provide
the features and policies needed for these other kinds of things. Moving all
the behavior off into a less controllable I/O device would seem to me to be a
step backwards.
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

peter@ficc.ferranti.com (peter da silva) (03/11/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca>, kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
> Am I missing something important here, other than the fact that no operating
> system I am aware of has a concept of an abstract filesystem (except at 
> the user level).

AmigaDOS supports an abstract filesystem implemented as a task, and it works
quite well. The DOS itself is just a library that figures out what filesystem
handler to send requests to for each call. Everything after the leading : in
the filename is up to the FS in question. Putting the filesystem in a
peripheral should be a piece of cake.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/11/91)

In article <10773@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

| Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,

  I think I heard this argument before when operating systems started
buffering i/o... I guess what you're looking for is MS-DOS, where you're
sure how things work (slowly).

  As long as there's a working sync or other means for my process to do
ordered writes in that less than one percent of the time when I care, I
am delighted to have things done in the fastest possible way the rest
of the time. The only time I ever care is when doing something like
database or T.P. where order counts in case of error. If I'm doing a
compile, or save out of an editor, or writing a report, as long as what
I read comes back as the same data in the same order, I really don't
care about write order (or byte order, bit order, etc) on the disk.

  There are cases when order is important, but as long as those rare
cases are satisfied, any smarts which improve performance are welcome on
my system.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (03/12/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
>In article <10773@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>
>
>   Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,
>   -- 
>   In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
>   Berkeley, CA		Domain:	torek@ee.lbl.gov
>
>Actually I have been having quite the opposite thoughts lately. It seems to me
>that it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.
>
>I don't just mean what requests are satisfied first (although this would 
>be one of its tasks) but one which also supports an abstract filesystem. 
>This would take out a lot of logic from the kernel; it needn't spend time
>in namei() et al.  Let an intelligent controller do that.
>
>Am I missing something important here, other than the fact that no operating
>system I am aware of has a concept of an abstract filesystem (except at 
>the user level). There is still some logic needed re virtual memory and 
>possible page-outs etc but I think it could work. Any comments?
>
>cheers kinch
>Dave Kinchlea, cs grad student at UWO (that's in London, Ontario)

I see some problems with transaction processing systems which rely on
being able to absolutely control the timing of disk writes. Some (the
more efficient ones only need do this for their logs/journals) while others
want to flush out all changes made by a transaction and ensure that
it all got there before sending a terminal reply or dispensing the
ATM cash. There may be more problems with the more efficient systems
because although they don't insist on flushing out all database changes
to disk on termination of each transaction, they RELY ON NOT HAVING ANY
UNCOMMITTED (UNFINISHED) CHANGES WRITTEN TO DISK. That is, if the system
crashed, then an advanced transaction system would expect to see NONE
of the changes made by any incomplete transactions from before the crash.
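
For concreteness, the ordering discipline such a system wants looks
roughly like this (plain POSIX file calls and an invented function name;
as noted below, the real systems usually end up on raw devices doing
their own buffering):

#include <sys/types.h>
#include <unistd.h>

/* Write-ahead rule: the log record describing a change must be on
 * stable storage before the changed page may go out, and the reply to
 * the terminal (or the ATM cash) waits for the commit record. */
int
commit_one_page(int logfd, int datafd,
                const char *logrec, size_t loglen,
                const char *page, size_t pagelen, off_t pageoff)
{
    /* 1. Log record first, forced to the platter. */
    if (write(logfd, logrec, loglen) != (ssize_t)loglen)
        return -1;
    if (fsync(logfd) != 0)
        return -1;

    /* 2. Only now may the changed data page be written; if we crash
     *    between steps 1 and 2, recovery works from the log. */
    if (lseek(datafd, pageoff, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(datafd, page, pagelen) != (ssize_t)pagelen)
        return -1;

    /* 3. The caller sends the terminal reply only after step 1. */
    return 0;
}

The stronger requirement above, that uncommitted changes never reach
disk at all, is exactly what pushes these systems toward doing their
own buffering rather than trusting a cache that may flush pages
whenever it likes.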
 
If a file system cannot accommodate this kind of use then the transaction
system implementors will again be forced into using raw I/O - to
avoid the file system. 

Alas, RAW I/O is still the answer for most database/transaction systems.
They keep their own set of buffers and file structures. It need not be
so if the file system incorporates the semantic needs of transaction/database
systems.
-- 
  Dick Wilmot  | I declaim that Amdahl might disclaim any of my claims.
                 (408) 746-6108

moss@cs.umass.edu (Eliot Moss) (03/12/91)

In article <3236@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

     As long as there's a working sync or other means for my process to do
   ordered writes in that less than one percent of the time when I care, I
   am delighted to have things done in the fastest possible way the rest
   of the time. The only time I ever care is when doing something like
   database or T.P. where order counts in case of error. If I'm doing a
   compile, or save out of an editor, or writing a report, as long as what
   I read comes back as the same data in the same order, I really don't
   care about write order (or byte order, bit order, etc) on the disk.

There is a problem with this reasoning, which is the assumption that the
design and tuning worked out by the manufacturer is anything like what *you*
need for *your* application mix. And in many cases the *design* is limiting
and there is no way to tune it the way you need. What I am getting at is not
just the semantic requirement that certain things be written (or not written)
to disk at certain points or in certain orders, but performance requirements
related to allocation and I/O scheduling. Just because something works nicely for
a time sharing mix does not mean it's reasonable for databases, etc.
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

torek@elf.ee.lbl.gov (Chris Torek) (03/12/91)

>In article <10773@dog.ee.lbl.gov> I wrote:
>>Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,

In article <3236@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com
(bill davidsen) writes:
>... any smarts which improve performance are welcome on my system.

But that is just the point: these `Smart' (or, as more aptly named,
`Smart Ass') devices do NOT improve performance, and GET IN YOUR WAY
when you try to improve it yourself.  All they offer is convenience,
and often it is a sham anyway (the controller does y and z for you,
but first you have to set it up with sequence a,b,c,d,e,f,g,h,i,j
each time).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

henry@zoo.toronto.edu (Henry Spencer) (03/13/91)

In article <3236@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>| Sometimes I think we need a Coalition to Stamp Out `Smart' I/O Devices,
>  There are cases when order is important, but as long as those rare
>cases are satisfied, any smarts which improve performance are welcome on
>my system.

The trouble is, what if the "smarts" *don't* improve performance, on your
workload?

The only sensible place to put smarts is in host software, where it can
be changed to match the workload and to keep up with new developments.
"Smart" hardware almost always makes policy decisions, which is a mistake.
The money spent on "smartening" the hardware is usually better spent
on making the main CPU faster so that you can get the same performance
with smart *software*... especially since a faster CPU is good for a
lot of other things too.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (03/13/91)

In article <KINCH.91Mar9170121@no31sun.csd.uwo.ca> kinch@no31sun.csd.uwo.ca (Dave Kinchlea) writes:
>... it would be highly advantageous (in the general case) to take all of
>the filesystem information out of the kernel and give it to the I/O controller.

Which filesystem?  System V's?  That's what you'd get, you know...

That aside, the most probable result is that your big expensive main host
CPU, which could undoubtedly run that code a lot faster, will spend all
its time waiting for the dumb little I/O controller to run the filesystem.
This is not a cost-effective use of hardware resources.

[Incidentally, is there some reason why twits (or readers written by twits)
keep saying "Followup-To: comp.protocols.nfs", when this topic is only
marginally related to NFS and highly related to architecture?  It's quite
annoying to have to keep manually fixing this.]
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/13/91)

In article <1991Mar12.194704.17859@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

| The only sensible place to put smarts is in host software, where it can
| be changed to match the workload and to keep up with new developments.
| "Smart" hardware almost always makes policy decisions, which is a mistake.
| The money spent on "smartening" the hardware is usually better spent
| on making the main CPU faster so that you can get the same performance
| with smart *software*... especially since a faster CPU is good for a
| lot of other things too.

  And in a perfect world the faster CPU to provide the boost will cost
the same as the smart controller. Unfortunately I don't live there, and
I suspect most readers don't either.

  The incremental price to get i/o improvement is *in most cases* much
smaller to upgrade a controller (add cache, whatever) than to upgrade
the CPU speed and all the support circuits that implies. For a
multiprocessor system this becomes even more true.

  There's also an issue of reliability. For any hardware or software
failure other than power failure, the smart controller seems more likely
to complete moving the data from cache to disk than the kernel is to move
it from disk buffers to disk. That's a gut feeling; the smart controller
may have a higher failure rate than the dumb controller, but it seems
likely that the smaller hardware and software content of a controller
will make it more reliable than the CPU, memory, and any other
controllers which could do something to make the o/s crash.

  The interesting thing is that in systems with multiple CPUs, if one
CPU is handling all the interrupts, it has a tendency to become an
extremely expensive smart controller. Yes, it can do more for the CPU
bound jobs, but is that cost effective for any load other than heavy
CPU? 

  I see no reason why an expensive CPU should be used to handle scatter
gather, remap sectors, log errors and issue retries, etc. It doesn't
take much in the way of smarts to do this stuff. It certainly doesn't
take floating point or vector hardware, processor cache, or an MMU
which supports paging.

  A CPU designed for embedded use can have a small interrupt controller
and some parallel i/o built into the chip. This lowers chip count,
connections, and latency, which means smaller, less expensive, and more
reliable devices. And i/o buffers can be made from cheap slow memory
and still stay ahead of the physical devices.

  Moving the filesystem to another CPU isn't really a "smart controller";
it's "distributed processing", more or less. That's certainly not what I
mean by smart controller, at any rate, so maybe the term is being used
loosely. I'm all in favor of having the decisions made by the o/s, but
when it comes time to actually move data from memory to disk, I'll find
something better for my CPU to do than keep an eye on the process.

  If I can issue an i/o and tell when it's done, and if the controller
is configured to ensure that data don't sit in the cache for more than
time X (you define X), then I don't see any problem providing ordered
writes as needed for data security, and good performance as needed by
loaded and i/o bound machines. That's what I mean by a smart controller
and that's what I think is optimal for both performance and cost
effectiveness.
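
A small sketch of that contract, with invented driver-level names
(submit_io()/await_io()): the controller may cache and reorder freely,
but the host can impose an ordering edge by waiting for one request's
completion, where "complete" means on the media or in power-protected
cache, before issuing the request that depends on it:

struct ioreq {
    long  blkno;
    void *buf;
    int   is_write;
};

extern void submit_io(struct ioreq *r);   /* placeholder driver hooks */
extern void await_io(struct ioreq *r);

void
write_in_order(struct ioreq *first, struct ioreq *second)
{
    submit_io(first);
    await_io(first);            /* barrier: 'first' is now safe */
    submit_io(second);          /* may be reordered with anything else */
}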
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/13/91)

In article <1991Mar12.202238.19586@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

| That aside, the most probable result is that your big expensive main host
| CPU, which could undoubtedly run that code a lot faster, will spend all
| its time waiting for the dumb little I/O controller to run the filesystem.
| This is not a cost-effective use of hardware resources.

  This is the heart of the matter, and I agree completely. What I can't
see is how anyone can feel that the main CPU should be wasted in error
logging and retries, bad sector mapping, and handling multiple interrupts.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (03/13/91)

>The only sensible place to put smarts is in host software, where it can
>be changed to match the workload and to keep up with new developments.
>"Smart" hardware almost always makes policy decisions, which is a mistake.
>The money spent on "smartening" the hardware is usually better spent
>on making the main CPU faster so that you can get the same performance
>with smart *software*... especially since a faster CPU is good for a


I don't know about "usually" - depends on how you define "smart".  You 
can't get much main CPU for the couple of dollars more it costs to 
have smart(er) serial ports (which can provide significant performance 
increases).  

Same with smart keyboards, smart graphics controllers, smart terminals, etc.

Smart hardware is usually quite effective for small simple jobs.

-- 
Jon Zeeff (NIC handle JZ)	 zeeff@b-tech.ann-arbor.mi.us

pcg@test.aber.ac.uk (Piercarlo Antonio Grandi) (03/14/91)

On 12 Mar 91 22:36:00 GMT, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) said:

davidsen> In article <1991Mar12.202238.19586@zoo.toronto.edu>
davidsen> henry@zoo.toronto.edu (Henry Spencer) writes:

henry> That aside, the most probable result is that your big expensive
henry> main host CPU, which could undoubtedly run that code a lot
henry> faster, will spend all its time waiting for the dumb little I/O
henry> controller to run the filesystem.  This is not a cost-effective
henry> use of hardware resources.

This is the low level performance side of the "down with smart devices"
argument. The more important one, as you well know, but let's restate
it, is that smart devices have their own "smart" policies that are not
necessarily (euphemism) anywhere near being flexible and efficient
enough. In other words, system performance should be treated as a whole;
it cannot be achieved by building assumptions into each component of the
system.

The extreme example are those caching controllers that have the
structure of a DOS volume built into their optimization patterns...

davidsen> What I can't see is how anyone can feel that the main CPU
davidsen> should be wasted in error logging and retries, bad sector
davidsen> mapping, and handling multiple interrupts.

Ahhhh, but then who should handle them? The CPU on the controller, of
course. The real *performance* question then is not about smart
controller vs. dumb controller.

Any architecture with asynchronous IO is a de facto multiprocessor; the
question is whether some of the CPUs in a multiprocessors should be
*specialized* or not, and which is the optimal power to assign to each
CPU if they are specialized.

You speak of "main" CPU, thus *assuming* that you have one main CPU and
some "smart" slave processors. The alternative is really multiple main
CPUs, whose function floats.

As to the specific examples you make, diagnostics (error logging and
retries, bad sector mapping) should all be done by software in the
"main" CPU OS anyhow, as of all things surely assumptions on error
recovery strategies should not be embedded in the drive, because
different OSes may well have very different fault models and fault
recovery policies.

Handling command chaining (multiple interrupts) can indeed be performed
fairly efficiently by the main CPU in well designed OS kernels that
offer lightweight interrupt handling and threading.

Unfortunately industry standard OS kernels are very poorly written, so
much so that for example on a 6 MIPS 386 running System V it is faster
to have command chaining handled by the 8085 on the 1542 SCSI
controller.

As to me, I'd rather have multiple powerful CPUs on an equal footing
doing programmed IO on very stupid devices than to have smart
controllers, which seems to be the idea behind Henry Spencer's thinking.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@aber.ac.uk

henry@zoo.toronto.edu (Henry Spencer) (03/14/91)

In article <3254@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>  This is the heart of the matter, and I agree completely. What I can't
>see is how anyone can feel that the main CPU should be wasted in error
>logging and retries, bad sector mapping, and handling multiple interrupts.

How often do your disks get errors or bad sectors?  How much CPU time is
*actually spent* on doing this?  Betcha it's just about zero.  You lose
some infinitesimal fraction of your CPU, and in return you gain a vast
improvement in *how* such problems can be handled, because the software
on the main CPU has a much better idea of the context of the error and
has more resources available to resolve it.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (03/14/91)

In article <ZJT-JQ+@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
> ...You
>can't get much main CPU for the couple of dollars more it costs to 
>have smart(er) serial ports (which can provide significant performance 
>increases).  

What do you mean by "smart(er)"?  If you just mean throwing in some FIFOs
to ease latency requirements and make it possible to move more than one byte
per interrupt, I agree.  I was assuming that the base for discussion
was dumb i/o devices, not brain-damaged ones.  If you mean DMA, that
does *not* cost a mere "couple of dollars more" if it's the first DMA
device on your system (or, for that matter, if it's the second), and
it can actually hurt performance.  (As a case in point, the DMA on the
LANCE Ethernet chip ties up your memory far longer than data transfers
by a modern CPU would.)

>Same with smart keyboards, smart graphics controllers, smart terminals, etc.

I'm not at all sure what you mean by "smart keyboards"; if you mean having
a keyboard encoder chip to do the actual key-scanning, that does not require
any form of "smartness" -- see comments above on dumb vs. brain-damaged.
Keyboards did that long before keyboards had micros in them.  The micros
replaced dedicated keyboard encoders because they were cheaper and a bit
more flexible, not because they added useful "smartness".

"Smart" graphics controllers are useful only if they actually bring
specialized hardware resources into the graphics operations.  All too
many "smart" graphics controllers are slower and less flexible than doing
it yourself in software.  Just *talking* to them to tell them what you
want to do can take more time than doing it yourself.  (This is a
particularly common vice of "smart" devices.)

"Smart" terminals are useful only if they are programmable.

>Smart hardware is usually quite effective for small simple jobs.

Small simple jobs don't need smart hardware.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

jesup@cbmvax.commodore.com (Randell Jesup) (03/14/91)

In article <3254@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <1991Mar12.202238.19586@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>| That aside, the most probable result is that your big expensive main host
>| CPU, which could undoubtedly run that code a lot faster, will spend all
>| its time waiting for the dumb little I/O controller to run the filesystem.
>| This is not a cost-effective use of hardware resources.
>
>  This is the heart of the matter, and I agree completely. What I can't
>see is how anyone can feel that the main CPU should be wasted in error
>logging and retries, bad sector mapping, and handling multiple interrupts.

	I agree also that FS code should be kept in the main CPU.  Device-
handling code, though, should be pushed off as much as possible into
smart devices or auxiliary processors.  A good modern example of this is
the NCR 53c700/710.  These scsi chips are essentially scsi-processors.  They
can take a major amount of interrupt and bus-twiddling code off of the main
processor, they can handle gather/scatter, they can bus-master, they can
process queues of requests, etc.  They only interrupt the main processor
on IO completion or on nasty errors.

	Perhaps my 100 mips super-mega-pipelined processor might be able to
execute some of the code faster.  But it has to talk to an IO chip that has
a maximum access speed far slower than the processor; it has to handle a
bunch of interrupts, it requires more instructions to deal with things like
state transitions, etc., etc.  Meanwhile, it could be happily executing some
user process trying to do something while a smart IO device like the 53c710
is handling a series of IO requests.  IO is far less influenced by processor
speed than many things - interrupt speed and the number of interrupts are
often more important (assuming some level of DMA in hardware).

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

glew@pdx007.intel.com (Andy Glew) (03/14/91)

>| That aside, the most probable result is that your big expensive main host
>| CPU, which could undoubtedly run that code a lot faster, will spend all
>| its time waiting for the dumb little I/O controller to run the filesystem.
>| This is not a cost-effective use of hardware resources.
>
>  This is the heart of the matter, and I agree completely. What I can't
>see is how anyone can feel that the main CPU should be wasted in error
>logging and retries, bad sector mapping, and handling multiple interrupts.
                      ^^^^^^^^^^^^^^^^^^
Warning - soapbox:

Have you ever had an I/O benchmark ruined by a bad sector in the middle of a file?
Or, worse, real-life performance on an application?
Prefetch is set up right, the disk is tuned, you read in one sector after another,
sequentially sucking up the track, and then there's a remapped sector:
now you're out of sequence, ouch!  You can spend two rotations handling a bad sector
remapping if it happens to be in exactly the wrong place... ouch!

Track remapping can be even worse: the head is stepping fairly smoothly down the middle
of the disk, when wham! it has to go to the end of the disk to find the remapped track
- which probably requires a recalibration, and another long seek and recalibration to get
back to where you left off.

If errors are at all likely to occur on the disk, the filesystem should be able to 
handle them - not by remapping, but by just stepping around them.
--
---

Andy Glew, glew@ichips.intel.com
Intel Corp., M/S JF1-19, 5200 NE Elam Young Parkway, 
Hillsboro, Oregon 97124-6497

This is a private posting; it does not indicate opinions or positions
of Intel Corp.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/15/91)

In article <PCG.91Mar13180706@aberdb.test.aber.ac.uk> pcg@test.aber.ac.uk (Piercarlo Antonio Grandi) writes:

| You speak of "main" CPU, thus *assuming* that you have one main CPU and
| some "smart" slave processors. The alternative is really multiple main
| CPUs, whose function floats.

  The CPU with the expensive cache, float, and maybe vector capability.
As opposed to an 8 bit CPU with integrated interrupt controller, some
parallel i/o, etc. And one or many, I can usually find better use for
their capabilities than manipulating status bytes.

| As to the specific examples you make, diagnostics (error logging and
| retries, bad sector mapping) should all be done by software in the
| "main" CPU OS anyhow, as of all things surely assumptions on error
| recovery strategies should not be embedded in the drive, because
| different OSes may well have very different fault models and fault
| recovery policies.

  What have decisions got to do with implementation? The CPU running the
o/s can decide how many retries (including none if there's ever a disk
that doesn't need one now and then), and what to do with the count of
retries returned by the smart controller. But to have the retries
actually done by the CPU which can do more? To what gain?

| Handling command chaining (multiple interrupts) can indeed be performed
| fairly efficiently by the main CPU in well designed OS kernels that
| offer lightweight interrupt handling and threading.

  Remember the fastest way to do something is to avoid having to do it.
Every interrupt will require a context switch in and out of the
interrupt handler. The only real low cost way to do this is to have a
set of dedicated interrupt registers (like the 2nd register set of the
Z80), and I bet no one will suggest that a CPU should dedicate area to a
set of registers just to avoid a smart controller.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (Peter da Silva) (03/15/91)

In article <1991Mar13.194527.28164@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> some infinitesimal fraction of your CPU, and in return you gain a vast
> improvement in *how* such problems can be handled, because the software
> on the main CPU has a much better idea of the context of the error and
> has more resources available to resolve it.

Of course, in practice this becomes "print an error message on the console,
and return an error indication to the process requesting the action. If the
kernel requested it, panic".
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/15/91)

Here we're talking about putting file systems in smart processors. How about
putting other stuff there?

	Erase and kill processing.	(some PC smart cards do this,
					 as did the old Berkeley Bussiplexer)
	Window management.		(all the way from NeWS servers
					 with Postscript in the terminal,
					 down through X terminals and Blits,
					 to the 82786 graphics chip)
	Network processing.		(Intel, at least, is big on doing
					 lots of this in cards, to the point
					 where the small memory on the cards
					 becomes a problem... they do tend
					 to handle high network loads nicely)
	Tape handling.			(Epoch-1 "infinite storage" server,
					 etc...)

What else? The Intel 520 has multiple 80186 and 80286 CPUs on its smart
CPU cards, and seems to do quite an impressive job for a dinky little
CISC based machine.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

henry@zoo.toronto.edu (Henry Spencer) (03/16/91)

In article <3265@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>Every interrupt will require a context switch in and out of the
>interrupt handler. The only real low cost way to do this is to have a
>set of dedicated interrupt registers (like the 2nd register set of the
>Z80), and I bet no one will suggest that a CPU should dedicate area to a
>set of registers just to avoid a smart controller.

Nonsense.  If the handling of the interrupt is sufficiently trivial,
several modern CPUs -- e.g. the 29k -- can do it without a full context
switch, by having a small number of registers dedicated to it.  This is
a very cost-effective use of silicon, adding a small amount to the CPU
to avoid the hassle and complexity of smart controllers.

Efficient fielding of simple interrupts (ones that require no decision
making) is, in any case, a solved problem even for older CPUs.  It just
takes some work and some thought.  Blindly taking a context switch for
such trivial servicing is a design mistake.
-- 
"But this *is* the simplified version   | Henry Spencer @ U of Toronto Zoology
for the general public."     -S. Harris |  henry@zoo.toronto.edu  utzoo!henry

suitti@ima.isc.com (Stephen Uitti) (03/19/91)

In article <3253@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <1991Mar12.194704.17859@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>
>| The only sensible place to put smarts is in host software, where it can
>| be changed to match the workload and to keep up with new developments.
>| "Smart" hardware almost always makes policy decisions, which is a mistake.
>| The money spent on "smartening" the hardware is usually better spent
>| on making the main CPU faster so that you can get the same performance
>| with smart *software*... especially since a faster CPU is good for a
>| lot of other things too.

Why not be able to load the "smart controller" with whatever code
you want?  At system boot time, or whenever, just send the code
you want to the controller.  If the controller CPU and
architecture are reasonably easy to work with, it should be able
to keep up with the OS.  For that matter, it could be usable by
more than one OS - and optimized for each.  You really have a
programmable I/O processor.

Let's say you put much of the filesystem into such a controller.
For UNIX, the host would still have to have the FS code, for
other controllers & network filesystems (although, the smart
controller might talk to the network itself - offloading the CPU further).

Anyway, once done, you could compare performances & see what
you'd gained, if anything.  You might find, for example, that I/O
throughput is lower, but more main CPU bandwidth is available.
Some sites may be so I/O bandwidth limited, that they would be
better off with a less smart controller.  No problem - just
download simpler code & tell the OS to treat it as such.  You'd
still have a controller with lots of RAM for buffer caching -
possibly allowing the host buffer caches to be reduced.

Maybe what you'd find is that I/O bandwidth is up, but latency is
longer.  Maybe what you'd find is that everything is faster.
Maybe you'd find that in order to make the controller better than
the host CPU, you need to use a better CPU than the host.  Who
knows?  Maybe you only need a z80 class CPU with good DMA & lots
of RAM.

Of course, if you find you need the same CPU that the host uses,
you may as well make it part of a symmetric multiprocessor -
unless you want to guarantee minimum I/O latency, or something.

You'd certainly have the flexibility to tune the system for database
applications.

What target applications are we talking about?  "General computing"
is pretty meaningless.  The applications I've seen are:
	Text editing.
	Publishing (requires graphics)
	Software development
	Database processing of various kinds
	CAD/CAM
	Data crunching (mad scientist)
One could identify systems aimed at each of these, and various
combinations.  It is quite likely that the smartness of the disk
controller will be more or less relevant for each application.

On a Mac, a draw program can easily become screen-bound.  The disks
may be quite slow, and not be problematic to the user.  Floppies, at
10 KB/sec may be fast enough (small files, few accesses) for a single
user.  A smart disk controller might be a complete waste of money.

>  There's also an issue of reliability.
> [handwaving deleted].

Reliability can be achieved through redundant hardware and
redundant software.  Software can run diagnostics, generate error
correction codes, perform redundant checks.  In hardware,
redundant power for controller memory [and disks] can improve the
chances that an error will not happen.  Multiple disks - mirror
disks, can improve head crash tolerance.  One system uses over 32
disks - each disk takes on one bit of a 32 bit word, plus drives
for error correction codes.  Any drive can fail with no loss of
data.  Bandwidth is also improved.  Reliability is costly.  Users
who can live with less of it, often do.

>  If I can issue an i/o and tell when it's done, and if the controller
>is configured to insure that data don't sit in the cache for more than
>time X (you define X), then I don't see any problem providing ordered
>writes as needed for data security, and good performance as needed by
>loaded and i/o bound machines. That's what I mean by a smart controller
>and that's what I think is optimal for both performance and cost
>effectiveness.

Let's say you have a controller that has its own buffers, which are
backed up in some way in case of power-fail.  Let's say that the
policy is to tell the CPU that the data is on disk as soon as it
is in the buffer.  Who cares when the data makes it to disk?  Who
cares what order it happens in?  The operating system knows what
data order was required, and sends the requests in that order.  The
buffers + disk reflect what was desired, and if there is a main
power failure, new requests are not processed (obviously, power is
out).  When the system comes back up, the buffers can be flushed to
disk.  For that matter, the buffers may still be valid for cache.
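
A sketch of what that controller-side policy might look like, with
invented names throughout (media_write() stands in for the real disk
operation): the host is told "done" as soon as a request sits in
battery-backed buffer RAM, the buffer drains whenever the media is
idle, and whatever is left is replayed, still in arrival order, after a
restart:

#include <string.h>

#define NBUF    256
#define BLKSIZE 512

struct wreq {                   /* lives in battery-backed RAM */
    long blkno;
    char data[BLKSIZE];
};

static struct wreq nvbuf[NBUF]; /* ring buffer, power-protected */
static int head, tail;          /* indices also kept in NVRAM */

extern void media_write(long blkno, const char *data);  /* placeholder */

int
accept_write(long blkno, const char *data)
{
    int next = (tail + 1) % NBUF;

    if (next == head)
        return -1;              /* buffer full: host must wait */
    nvbuf[tail].blkno = blkno;
    memcpy(nvbuf[tail].data, data, BLKSIZE);
    tail = next;
    return 0;                   /* host sees completion right here */
}

void
drain_one(void)                 /* run whenever the media is idle */
{
    if (head != tail) {
        media_write(nvbuf[head].blkno, nvbuf[head].data);
        head = (head + 1) % NBUF;
    }
}

void
replay_after_power_up(void)     /* at restart, push out what remains */
{
    while (head != tail)
        drain_one();
}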

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (03/26/91)

In article <1991Mar15.165124.18039@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <3265@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> >Every interrupt will require a context switch in and out of the
> >interrupt handler. The only real low cost way to do this is to have a
...
> Nonsense.  If the handling of the interrupt is sufficiently trivial,
> several modern CPUs -- e.g. the 29k -- can do it without a full context
> switch, by having a small number of registers dedicated to it.  This is

It doesn't even have to be modern!

Perhaps not as elegant as what you are referring to, but the 8 bit MC6809
used to have a FIRQ (Fast Interrupt Request) in which only a very few
registers were saved.  This let you take the interrupt, store one or
two registers explicitly, do your thing and get back out quickly.



-- 
  Terry Ingoldsby                ingoldsb%ctycal@cpsc.ucalgary.ca
  Land Information Services                 or
  The City of Calgary       ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

mlord@bwdls58.bnr.ca (Mark Lord) (03/27/91)

In article <PCG.91Mar13180706@aberdb.test.aber.ac.uk> pcg@test.aber.ac.uk (Piercarlo Antonio Grandi) writes:
<
<As to me, I'd rather have multiple powerful CPUs on an equal footing
<doing programmed IO on very stupid devices than to have smart
<controllers, which seems to be the idea behind Henry Spencer's thinking.

I vote for smart device controllers, to which the O/S can download
software.  This gives the O/S complete control, and still allows it
to optimally offload tedious tasks.  Sort of like channel processors
on mainframes..  What?  You mean it's already been done?

mike@sojurn.UUCP (Mike Sangrey) (04/21/91)

In article <ZU=9R=8@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Here we're talking about putting file systems in smart processors. How about
>putting other stuff there?
>

The Unisys A-series computers even put process task switching (if I
remember rightly) on separate hardware.

-- 
   |   UUCP-stuff:  devon!sojurn!mike     |  "It muddles me rather"     |
   |   Slow-stuff:  832 Strasburg Rd.     |             Winnie the Pooh |
   |                Paradise, Pa.  17562  |    with apologies to        |
   |   Fast-stuff:  (717) 442-8959        |             A. A. Milne     |