pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/11/90)
On 5 Dec 90 14:44:45 GMT, jcburt@ipsun.larc.nasa.gov (John Burton) said:

jcburt> In article <PCG.90Dec4160737@odin.cs.aber.ac.uk>
jcburt> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

pst> You're comparing CPU performance to I/O performance. [ ... ] Back
pst> when there were REAL(tm) computers like 780, a lot of time and
pst> energy went into designing efficient I/O from the CPU bus to the
pst> electrons going to the disk or tty. [ ... ] Sure OS's and apps have
pst> gotten bloated, but when you put a chip like the MIPS R3000 on a
pst> machine barely more advanced than an IBM-AT you end up with a toy
pst> that can think fast but can't do anything.

pcg> No, no, no, no, no, no, no. The IO bandwidth of a typical 386 is
pcg> equivalent or better than that of any UNIBUS based machine, and, in
pcg> practical terms, equivalent to that of MASSBUS based ones. You can get
pcg> observable raw disc data rates of 600-900KB/s and observable filesystem
pcg> bandwidths of 300-500KB/s under SVR3.2 (with suitable controllers and a
pcg> FFS of some sort). This is way better than a PDP-11.

jcburt> True, a typical 386 machine has good I/O bandwidth, but
jcburt> bandwidth isn't everything. The majority of 386 machines have an
jcburt> ISA bus which is a very simple bus controlled by the cpu. When
jcburt> performing I/O, the cpu blocks itself and turns control of the
jcburt> bus to the I/O device.

This is not quite true. Actually, it is not true at all. You seem to be
describing synchronous programmed IO, which is not used in most ISA
peripherals. Most ISA peripherals are interrupt driven, and some even use
DMA, and the CPU can work between interrupts. Definitely.

jcburt> Machines that were originally designed as a multi-user platform
jcburt> usually were set up so that the I/O could be performed without
jcburt> the direct control (or blocking) of the cpu.
jcburt> The system bus was designed so that multiple operations could
jcburt> occur more or less independent of the cpu (multi-tasking
jcburt> hardware design).

This is entirely true of the ISA bus and of any PC system around as well.
Hey, they even have DMA (well, read on). However, I can easily see that
your misconceptions have roots in three problems with typical ISA
machines: one particular to the design of a PC clone, and two particular
to the most common disk controller design for such machines.

For a very ugly reason, the DMA chips that perform DMA under CPU control
are nearly useless for high speed transfers, and on some designs the
braindamage is bad enough that the few slow DMA channels available cannot
even be shared. But there is no such restriction for DMA driven by a
peripheral board itself, rather than by the CPU, and some (rare) boards
have bus mastering ability and their own DMA onboard.

Since DMA using the CPU controlled DMA channels is so bad, the standard WD
style AT controller does not use DMA. It is interrupt driven, so while the
controller is seeking or transferring data the CPU is free. When the
controller is done seeking and transferring, the CPU gets an interrupt,
and then copies the sector just read, word by word with a very fast block
move, from the controller's onboard cache to core. This is indeed done
using programmed IO, synchronously, and the CPU is busy while doing it,
but it takes relatively little time.

Finally, the common type of ISA disk controller, for other relatively ugly
reasons, is single threaded. This means that it cannot overlap seeks and
transfers to/from multiple disks. It cannot overlap multiple transfers
because of the above mentioned sector buffer; there is only one sector
buffer... In theory it could overlap seeks on two drives, or seeking on
one with transfer on another, and indeed this can be done with seek
buffering (ST506) devices using a clever (and obscene) hack.
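The per-sector copy described above is just a tight word-move loop; on a
386 it is essentially a single `rep insw` reading the controller's data
port 256 times. A minimal sketch, with the "data port" simulated by an
in-memory buffer (the names are illustrative, not from any real driver):

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BYTES 512
#define SECTOR_WORDS (SECTOR_BYTES / 2)   /* 256 16-bit words per sector */

/* Copy one sector from the controller's onboard buffer to core.
 * In a real WD-style AT driver this loop is the x86 `rep insw`
 * instruction; here the controller's data port is simulated by an
 * ordinary array. The CPU is busy for the whole loop, but at ~5 MB/sec
 * that is only ~100 usec per sector. */
void pio_copy_sector(uint16_t *core, const uint16_t *ctrl_buf)
{
    for (size_t i = 0; i < SECTOR_WORDS; i++)
        core[i] = ctrl_buf[i];
}
```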
The really big problem for multiuser operation is the lack of overlap; the
authors of the UNIX disk driver sort routine report that, with a
multithreaded controller on a PDP-11, three moving arm disks operating in
parallel give, under typical timesharing loads, the same performance as a
single fixed arm disk with the sum of their capacities. This means that
with a multithreaded disk controller, three disks, and a typical
timesharing load, the ability to move three arms in parallel is worth the
same as having a single zero seek time arm. A big, big, big win. Two disks
on a multithreaded disk controller are already a very large improvement
over a single disk for timesharing, especially if you spread the
(instantaneous) load across them by careful positioning of your
partitions.

Now back to the ISA bus. As somebody observes elsewhere, the IO
bottlenecks of a timesharing system are the terminal lines and the disk
controllers. If you use intelligent terminal controllers and intelligent
multithreaded disk controllers your timesharing performance will be
impressive, on a par with that of a VAX of the same class. Just using FIFO
based serial line controllers substantially reduces terminal IO overhead;
just using two ESDI controllers, one per disk, will give tremendous
improvements, because the two controllers will be able to seek and
transfer in parallel. If you want higher performance use a microprocessor
based intelligent serial line controller, and something like an AHA 154x
disk controller, which is multithreaded, bus mastering, and has its own
fast DMA channels.

Ah, a final note: if you really want high performance from your multiuser
ISA machine, DO NOT use the console in any way. Access to video RAM is so
abysmally slow that it can consume a large portion of your bus bandwidth.
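The arm-scheduling win above comes from the driver keeping its request
queue sorted by cylinder so the arm sweeps rather than thrashes. A minimal
sketch in the spirit of the classic UNIX disksort() routine (this is a
simplified illustration, not the real kernel code; the single-sweep policy
and field names are assumptions):

```c
#include <stddef.h>

/* A disk request queue sorted for one-directional arm sweeps:
 * requests at or beyond the current head position come first, in
 * ascending cylinder order, followed by the requests behind the
 * head (the next sweep), also ascending. */
struct buf {
    int b_cylin;          /* target cylinder of this request */
    struct buf *b_actf;   /* forward link in the drive queue */
};

void disksort(struct buf **headp, struct buf *bp, int headpos)
{
    int ahead = bp->b_cylin >= headpos;
    struct buf **pp;

    for (pp = headp; *pp != NULL; pp = &(*pp)->b_actf) {
        int cur_ahead = (*pp)->b_cylin >= headpos;
        if (ahead && !cur_ahead)
            break;    /* current-sweep requests precede next-sweep ones */
        if (ahead == cur_ahead && bp->b_cylin < (*pp)->b_cylin)
            break;    /* keep each sweep in ascending cylinder order */
    }
    bp->b_actf = *pp;
    *pp = bp;
}
```

Under load the queue stays long, so each completed request finds a nearby
successor, which is where the "zero seek time" effect comes from.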
If you want to do fast graphics on an ISA machine, buy an X terminal and a
fast Ethernet board; don't use the console, unless you get a really
expensive super intelligent video board with very fast, truly 16 bit
memory. But I think that for timesharing the X terminal solution is still
better, and not much more expensive, because it allows further overlap
between the generation of the graphics and its rendering on the screen.

In summary: to saturate an ISA bus (5 MB/sec) you need a pretty large
number of peripherals running continuously, such as more than three disks
(say 800KB/sec each) and a network board (say 600KB/sec), which brings us
to 2/3 of nominal. Things like a QIC tape (90KB/sec), 8 serial ports
(20KB/sec for eight ports simultaneously at 19200 baud), and so on are
irrelevant for bandwidth. You then have a problem with the typically high
interrupt processing overheads of 386 UNIX systems, with their often badly
written drivers, but if you use the right controllers even these are not
that important.

Let's say that a machine with 8 FIFO based serial lines, 2 discs with
< 20msec seek times attached to an AHA154x, a 386/25 noncaching
motherboard (4 MIPS, let's say), and 16 MBytes can comfortably support 8
users doing fairly heavy development work, even using things like G++ and
GNU Emacs.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
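The bus budget in the summary above can be checked with a few lines of
arithmetic, using the post's own round figures (a sketch; the constants
are the KB/sec numbers quoted in the text):

```c
/* Back-of-envelope ISA bus budget, in KB/sec, from the figures above. */
enum {
    BUS_NOMINAL = 5000,   /* ~5 MB/sec nominal ISA bandwidth */
    DISK        = 800,    /* one fast ESDI disk, sustained */
    ETHERNET    = 600,    /* network board */
    QIC_TAPE    = 90,
    SERIAL_8    = 20      /* eight 19200 baud ports together */
};

/* Aggregate continuous load: three disks, a network board, a tape,
 * and a full set of serial lines. */
int bus_load_kb(void)
{
    return 3 * DISK + ETHERNET + QIC_TAPE + SERIAL_8;
}
```

Three disks plus the network board alone account for 3000 KB/sec, about
2/3 of nominal; tape and serial traffic are noise by comparison.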
rreiner@yunexus.YorkU.CA (Richard Reiner) (12/11/90)
Thanks, Piercarlo Grandi, for your clarifying analysis of ISA bus + disk
issues. I wonder if I could ask you one or two questions.

>just using two ESDI controllers, one per each disk, will give
>tremendous improvements [because of multi-threaded operation]

What about using SCSI equipment? Do there exist SCSI host adaptors for the
ISA bus which support multi-threaded operation?

And what about track-buffering ESDI controllers? Would their advantages go
away if they were used in the setup you suggest (since you claim that one
would get effectively near-zero seek times anyway)?

--richard
dougp@ico.isc.com (Doug Pintar) (12/12/90)
In article <18871@yunexus.YorkU.CA> rreiner@yunexus.YorkU.CA (Richard Reiner) writes:
>
>Thanks, Piercarlo Grandi, for your clarifying analysis of ISA bus +
>disk issues. I wonder if I could ask you one or two questions.
>
>>just using two ESDI controllers, one per each disk, will give
>>tremendous improvements [because of multi-threaded operation]
>
>What about using SCSI equipment? Do there exist SCSI host adaptors
>for the ISA bus which support multi-threaded operation?
>
>And what about track-buffering ESDI controllers? Would their
>advantages go away if they were used in the setup you suggest (since
>you claim that one would get effectively near-zero seek times anyway)?
>
The comments below are intended to relate to ISC Unix, but most will apply
in the general case (HPDD stuff notwithstanding) -- DLP

First, the use of two ESDI controllers will swamp the system before giving
you much advantage. Remember, standard AT controllers interrupt the system
once per SECTOR. The interrupt code must then push or pull 256 16-bit
words to/from the controller. Given an ESDI raw transfer rate of 800
KB/sec (not unreasonable for large blocks), that's 1600 interrupts per
second, each with a (not real fast, due to bus delays) 256-word PIO
transfer. Try getting two of those going at once and the system drags down
REAL fast. I've tried it on a 20 MHz 386 and found at most a 50%
improvement in aggregate throughput using 2 ESDI controllers
simultaneously. At that point, you've got 100% of the CPU dedicated to
doing I/O and none to user code...

Two drives on a single AT-compatible controller will gain you something in
latency reduction, as the HPDD does some cute tricks to overlap seeks.

Bus-mastering DMA SCSI adapters, like the Adaptec 154x (ISA) or 1640
(MCA), provide MUCH better throughput. They ARE multi-threaded, and the
HPDD will try to keep commands outstanding on each drive it can use.
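The once-per-sector interrupt arithmetic above follows directly from the
transfer rate. A sketch of the calculation (treating 1 KB as 1024 bytes,
which is how the 800 KB/sec figure yields exactly 1600):

```c
/* Interrupts per second for a controller that interrupts once per
 * 512-byte sector, given a sustained transfer rate in KB/sec. */
int sector_interrupts_per_sec(int kb_per_sec)
{
    return kb_per_sec * 1024 / 512;   /* two sectors per KB */
}
```

At 800 KB/sec that is 1600 interrupts per second per controller, each
followed by a 256-word programmed-I/O copy; a second controller doubles
the rate.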
The major win is that the entire transfer is controlled by the adapter,
with host intervention only when a transfer is complete. You get lots more
USER cycles this way!

The limiting factor here is how fast you can get transfers happening
between the bus and memory. This varies from motherboard to motherboard
and is unrelated to bus speed or processor speed. You normally want to
tune the SCSI adapter to have no more than a 50% on-bus duty cycle, or you
start losing floppy bytes (and, in the worst case, refresh!). On Compaq
and Micronics motherboards, you can go at 5.7 MB/sec bursts. Some
motherboards can go at 6.7 and others will go up to 8. Your max rate will
be about half this, given the 50% bus duty cycle limit. Arbitration for
the SCSI bus can limit this even more if you've got a bunch of drives
trying to multiplex data through a slow pipe to memory. I found that I
couldn't get much over 1.7 MB/sec using 3 simultaneous SCSI drives on a
Compaq. Going to more drives actually slowed things down due to extra
connections and releases of the SCSI bus. I would imagine I'd see a big
improvement if I could get the transfer rate up to the 8 MB/sec burst
rate.

I'm still not convinced that caching controllers are a big win over a
large Unix buffer cache. I usually use 1-2 MB of cache, and a couple-MB
RAMdisk for /tmp if I have the memory available. Using system memory as a
cache is LOTS faster than going over the bus to cache on a controller, and
I trust the Unix disk updater more than some unknown algorithm used in a
controller. At least when you shut Unix down with a normal controller, you
know you can really power the system down. With some caching controllers,
there's an unknown latency time before the final 'sync' and write of the
superblock actually gets out there. Could get ugly.

As usual, should any opinion of mine be caught or killed, ISC will disavow
any knowledge of me...
Doug Pintar
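The duty-cycle ceiling above reduces to one multiplication: the usable
sustained rate is the burst rate times the allowed on-bus fraction. A
sketch using the figures from the post:

```c
/* Effective sustained transfer ceiling (KB/sec): burst rate throttled
 * by the on-bus duty cycle the adapter is tuned to (percent). Beyond
 * ~50% you start starving the floppy controller and, worst case,
 * DRAM refresh. */
int effective_kb(int burst_kb, int duty_pct)
{
    return burst_kb * duty_pct / 100;
}
```

A 5.7 MB/sec burst at a 50% duty cycle yields at most about 2.85 MB/sec
sustained, before SCSI bus arbitration takes its own cut.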
pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/13/90)
On 11 Dec 90 15:37:41 GMT, rreiner@yunexus.YorkU.CA (Richard Reiner) said:

pcg> just using two ESDI controllers, one per each disk, will give
pcg> tremendous improvements [because of multi-threaded operation]

rreiner> What about using SCSI equipment? Do there exist SCSI host adaptors
rreiner> for the ISA bus which support multi-threaded operation?

Ah yes, the common recommendation is the Adaptec Host Adapter 154xB. It
sings, it dances, it is a floor wax and a dessert topping. Not only is it
multithreaded, it does bus mastering without CPU involvement, does DMA
with its own fast DMA technology, and does scatter/gather in hardware with
command chaining. In other words, it is more of an IO coprocessor than a
crude disk controller. The ISC HPDD exploits all its wonderful aspects.
The only defect of the 1542 seems to be fairly long operation setup times,
in the millisecond range, but I don't think this is terribly important,
unless you attach solid state disks to your SCSI bus.

Other SCSI controllers (OMTI, Future Domain, WD FASST) may be as
sophisticated, but I have no certain data. The Adaptec seems to be the
most popular, and can be bought fairly cheaply from Tandy. Other drivers
may be able to exploit all its wonders (the Esix one maybe), but again I
have no details.

rreiner> And what about track-buffering ESDI controllers?

A word of caution here: I have been reminded by William Bogstad by e-mail
that there is another reason (which I had already mentioned myself long
ago in comp.unix.i386) why a multithreaded controller is preferable to two
ESDI ones. ESDI discs cannot do command chaining, which means that their
scatter/gather has to be interrupt driven by the UNIX disc driver, not by
the controller. This means that as the IO operations per second increase,
interrupt processing overhead also increases, and can become quite severe,
because disk interrupt processing is a very high overhead activity in all
386 Unixes I know (not many).
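Hardware scatter/gather with command chaining means one disk command can
cover several discontiguous memory segments, so the host takes a single
completion interrupt instead of one per piece. A hypothetical sketch of
the kind of segment list such an adapter walks on its own (the structure
and names are illustrative, not the actual 154x mailbox format):

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a scatter/gather list: a physically contiguous memory
 * segment that is part of one larger logical transfer. */
struct sg_entry {
    uint32_t addr;   /* physical address of the segment */
    uint32_t len;    /* segment length in bytes */
};

/* Total bytes moved by one chained command. The adapter traverses
 * the list itself; the host is interrupted once, at the end, instead
 * of once per segment (or, worse, once per sector). */
uint32_t sg_total_len(const struct sg_entry *sg, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sg[i].len;
    return total;
}
```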
This obscene overhead could be largely obviated, as in some PDP/VAX
drivers, with an interrupt processing fastpath, called pseudo-DMA in
software. Maybe some 386 Unix vendor has already implemented it, but I am
not aware of any. It can take several thousand instructions
(milliseconds!) for an interrupt to be processed by a 386 Unix disk
driver, and for a new block operation to be reissued. With IO operation
rates of several dozen per second on 4 MIPS processors this can represent
a significant percentage of CPU time. For very high IO loads with many
fast discs, hardware scatter/gather is very important.

rreiner> Would their advantages go away if they were used in the setup
rreiner> you suggest (since you claim that one would get effectively
rreiner> near-zero seek times anyway)?

Track buffering is not a property of ESDI controllers alone; some popular
RLL controllers also have track buffering. Track buffering reads an entire
track when you read or write a sector on that track. This is only a win if
you access several sectors in the same track consecutively; otherwise it
is a lose, because it forces you to wait for an entire revolution to read
a sector, when on average only a third to a half of a revolution would be
enough. With old style filesystems, which become fragmented fairly easily,
this is usually not a win, especially for writes; I have turned off track
buffering on my RLL controller. It is instead a definite win if you use
the various styles of Fast File System, as they usually succeed in keeping
logically consecutive sectors physically contiguous as well, and in doing
multi sector requests.

Note that track buffering only influences rotational latency, not seek
latency.
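The rotational-latency argument above is simple arithmetic: a full track
read always costs one whole revolution, while a plain random-sector read
waits only a fraction of one on average. A sketch with illustrative
numbers (3600 rpm is a typical spindle speed for drives of that era, not a
figure quoted in the post):

```c
/* Rotational cost, in microseconds, of fetching one random sector at
 * a given spindle speed. A track-buffering controller waits a full
 * revolution; a plain read waits half a revolution on average. */
int usec_per_rev(int rpm)             { return 60 * 1000000 / rpm; }
int avg_plain_read_usec(int rpm)      { return usec_per_rev(rpm) / 2; }
int track_buffered_read_usec(int rpm) { return usec_per_rev(rpm); }
```

At 3600 rpm that is roughly 16.7 ms versus 8.3 ms: track buffering doubles
the rotational cost of a cache-missing random read, which is why it only
pays when later requests hit the buffered track.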
The zero-seek-time property of three well scheduled moving arm discs must
moreover be carefully understood -- it says that you get the same number
of IO operations per second from 3 arms moving in parallel over 3 discs of
X capacity each as you get out of a single disc with fixed heads and 3X
capacity, *if* there is enough load. Note that this number of IO
operations per second is lower than the number of IO operations per second
from 3 discs with fixed arms, because three separate discs have two more
transfer channels than a single fixed arm disc of 3X capacity. Still the
speedup is impressive (but you must balance the load across the three
discs!).
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
jmm@eci386.uucp (John Macdonald) (12/18/90)
In article <PCG.90Dec12195835@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

|rreiner> Would their advantages go away if they were used in the setup
|rreiner> you suggest (since you claim that one would get effectively
|rreiner> near-zero seek times anyway)?
|
|Track buffering is not a property of ESDI controllers alone; some
|popular RLL controllers also have track buffering. [ ... ]
|
|Note that track buffering only influences rotational latency, not seek
|latency.

It is possible to set up a controller to give most of the benefit of track
buffering without any possible loss. Have the controller do the following
when attempting to read a sector: seek to the right track and start
reading sectors. When the desired sector has been read, process it (send
it to the CPU using the appropriate DMA and so on, and then interrupt the
CPU to terminate the IO); while this is being done, continue to read the
track and save each sector that is read into the track buffer. Keep a
record of which sectors have been read and which haven't.

Whenever the CPU's device driver handles the IO completion it will likely
issue another request. When a request comes to the controller, check to
see if it can be satisfied from any available buffered track.
If so, do that and don't interfere with any disk reading that is still
going on filling a track buffer. If not, terminate any ongoing track
buffer activity for a different track, seek to the desired track, and
start buffering. When the background processing is able to finish reading
a track buffer and there is still no new request that requires a real disk
access, then additional background activity can be done (complete filling
a track buffer that has been partially filled, read a new track when many
of the sectors in the previous track have been used, write out any
buffered changed sectors if write-through is not being used, etc.).

Since this procedure returns a result as soon as it is available, and
starts to process a new request as soon as it is issued, there is no loss;
there is only the potential gain of using the sectors that come under the
read head during rotational latency, of using the time between host
requests, and of using the time saved by filling a host request from a
track buffer.

Offhand, I don't know whether any particular disk controller uses this
algorithm, but I wouldn't be surprised if one did. It does require that
the controller be able to do some activities simultaneously (DMA, IO
completion, and new activity startup with the host, all at the same time
as IO processing to the disk).
--
Cure the common code...       | John Macdonald
...Ban Basic - Christine Linge|   jmm@eci386
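The bookkeeping this scheme needs (which sectors of the current track have
been captured so far) fits in a single bitmap word. A minimal sketch, with
illustrative names and an assumed limit of 32 sectors per track:

```c
#include <stdint.h>

/* Track buffer state: which track is being captured in the
 * background, and a bitmap of the sectors already saved
 * (bit i set => sector i of that track is in the buffer). */
struct track_buf {
    int track;        /* track being buffered, -1 if none */
    uint32_t have;    /* captured-sector bitmap */
};

/* Called as each sector passes under the head during background fill. */
void trackbuf_capture(struct track_buf *tb, int track, int sector)
{
    if (tb->track != track) {    /* moved to a new track: restart */
        tb->track = track;
        tb->have = 0;
    }
    tb->have |= (uint32_t)1 << sector;
}

/* Can a new host request be satisfied without a real disk access? */
int trackbuf_hit(const struct track_buf *tb, int track, int sector)
{
    return tb->track == track && ((tb->have >> sector) & 1);
}
```

On a miss the controller abandons the background fill, seeks, and starts
capturing the new track; on a hit it answers from the buffer and lets the
fill continue undisturbed.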
pcg@cs.aber.ac.uk (Piercarlo Grandi) (12/19/90)
On 11 Dec 90 22:58:39 GMT, dougp@ico.isc.com (Doug Pintar) said:

dougp> First, the use of two ESDI controllers will swamp the system
dougp> before giving you much advantage. Remember, standard AT
dougp> controllers interrupt the system once per SECTOR. The interrupt
dougp> code must then push or pull 256 16-bit words to/from the
dougp> controller.

This need not be a big problem. I have had e-mail discussions of these
issues in the last few days, and I take advantage of your posting to
dispel some myths publicly.

The interrupt latency and sector transfer times are quite small. Combined,
they amount to two or three hundred microseconds at most (100 usec
interrupt latency plus the time to transfer 512 bytes at 5MB/sec, which is
another 100 usec), depending on CPU speed and kernel design. The *real*
problem is that most (all, I think) 386 UNIX disc (and tape!) drivers are
poorly written, as they do not use pseudo-DMA, a standard technique of
PDP/VAX drivers (it is even mentioned in the 4.3BSD Leffler book). This is
described a bit later in this article.

dougp> Given an ESDI raw transfer rate of 800 KB/sec (not unreasonable
dougp> for large blocks) that's 1600 interrupts per second, each with a
dougp> (not real fast, due to bus delays) 256-word PIO transfer. Try
dougp> getting two of those going at once and the system drags down REAL
dougp> fast.

A *sustained* transfer rate of 800KB/sec, that is nearly 100% of peak
transfer rate, is extremely rare. If you are pounding really hard on the
disc you may get from each disk 300KB through the filesystem in any given
second. This translates to 600 sectors per second; you can do a sector in
200-300 microseconds, say 4 sectors per millisecond, so we have an
overhead of 150 milliseconds in every second. 15% is high, but not tragic.

dougp> I've tried it on a 20 MHz 386 and found at most a 50% improvement
dougp> in aggregate throughput using 2 ESDI controllers simultaneously.
dougp> At that point, you've got 100% of the CPU dedicated to doing I/O
dougp> and none to user code...

This is mostly because the driver is written so that each IO transaction
involves only one sector. Therefore for every sector the top half of the
driver starts the transaction, then sleeps; the bottom half gets activated
by the interrupt and wakes up the top half. The sleep/wakeup between the
top and bottom halves involves, on a busy system, two context switches,
which is already bad, and, most importantly, calls the scheduler. There is
a paper that shows that under many UNIX ports the cost of a wakeup/sleep
is not really that of the context switches, but of the scheduler calls to
decide who is going to run next, as this takes 90% of the time of a
process activation.

With pseudo-DMA the top and bottom halves of the disk driver communicate
via a queue; the top half inserts as many IO operations as it has in the
queue, marking those for whose completion it wants to be notified. The
bottom half will start the first operation in the queue, and then, when it
gets the interrupt that signals it is complete, it will immediately start
the next; then, if the just completed operation was marked for notify, it
will wake up the relevant top half (note that there can be as many
instances of the top half active as there are processes with IO
transactions outstanding, while there will be as many instances of the
bottom half as there are CPUs).

This mode of operation means that the bottom half can issue IO operations
as fast as the controller will take them, synchronously with each
interrupt; that each IO operation will have a small overhead, consisting
of just the interrupt latency and sector transfer times; and that the
wakeup/sleep and reschedules will be needed, asynchronously, only once per
IO transaction, which can well involve many IO operations. This is
simulating an intelligent controller in the driver's bottom half.
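The pseudo-DMA queue discipline can be sketched in a few lines: the bottom
half retires operations back to back at interrupt time, and only an
operation marked for notification costs a wakeup and a reschedule. A
minimal simulation (names illustrative; in a real driver issue and
completion run in the interrupt handler, and the wakeup counter stands in
for wakeup() plus the scheduler call):

```c
#include <stddef.h>

/* One queued disk operation in a pseudo-DMA style driver. */
struct ioop {
    struct ioop *next;
    int notify;              /* wake the sleeping top half when done? */
};

struct dma_queue {
    struct ioop *head;
    int completed;           /* operations finished so far */
    int wakeups;             /* sleep/wakeup + reschedule count */
};

/* Top half: enqueue an operation without sleeping once per sector. */
void ioq_enqueue(struct dma_queue *q, struct ioop *op)
{
    struct ioop **pp = &q->head;
    while (*pp)
        pp = &(*pp)->next;
    op->next = NULL;
    *pp = op;
}

/* Bottom half, on each controller interrupt: retire the head
 * operation, (implicitly) start the next, and wake the top half only
 * if the finished operation was marked for notification. */
void ioq_interrupt(struct dma_queue *q)
{
    struct ioop *op = q->head;
    if (!op)
        return;
    q->head = op->next;
    q->completed++;
    if (op->notify)
        q->wakeups++;
}
```

An 8-operation transaction with only the last operation marked thus costs
eight cheap interrupts but a single wakeup/reschedule.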
A typical IO transaction will consist of an (implied) seek command and a
list of 4-8 sectors, usually contiguous, to be transferred. A block read
via the buffer cache will typically cause two IO transactions, one for the
sectors making up the current block and one for the read ahead block.

One can also do tricks in the scheduler to reduce the cost of a
reschedule. UNIX implementations are usually badly designed in this
respect, but one could use a technique used for MUSS (SwPract&Exp, Aug
1979). The idea is to have a short term scheduler and a long term
scheduler, where UNIX normally has only a long term scheduler. The short
term scheduler manages, in a deterministic way, e.g. priority based or
FIFO, a fixed number of processes; the long term scheduler selects,
periodically, which processes are in the short term scheduler set. The
real cost of scheduling is the policy decision of which processes are
eligible for scheduling. Normally this need only be changed fairly rarely,
and periodically, not on every context change. Having a short term
scheduler means that the cost of a process switch is only marginally
higher than that of a context switch, because the short term scheduler's
job is just to find the first ready-to-run process in a fixed size list of
maybe 16 entries.

A nice extra idea found in MUSS was to make the short term scheduler use
bitmap queues for strictly priority based scheduling; queues are words,
and each bit in a word represents a different process, and a different
priority. To add a process to a queue (e.g. the ready to run queue) one
just turns on its bit, and so on. Ah, if only UNIX designers and
implementors had one tenth of the insight of the MUSS ones!

dougp> Two drives on a single AT-compatible controller will gain you
dougp> something in latency-reduction, as the HPDD does some cute tricks
dougp> to overlap seeks.

For a multiuser system, which is the scope of my posting, this is far more
important than bandwidth.
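The bitmap-queue idea makes dispatch a find-first-set over one word rather
than a list walk. A minimal sketch of a MUSS-style short term run queue
(illustrative, not actual MUSS code; 16 short-term slots, with lower bit
positions taken as higher priority):

```c
#include <stdint.h>

/* Short term run queue as a single word: bit i set means short-term
 * process i is ready to run, and bit position doubles as priority,
 * so dispatch is just a find-first-set. */
typedef uint16_t stq_t;      /* fixed set of at most 16 processes */

void stq_make_ready(stq_t *q, int proc)   { *q |= (stq_t)(1u << proc); }
void stq_make_blocked(stq_t *q, int proc) { *q &= (stq_t)~(1u << proc); }

/* Pick the highest priority ready process, or -1 if all are idle. */
int stq_dispatch(stq_t q)
{
    if (q == 0)
        return -1;
    int p = 0;
    while (!(q & (1u << p)))  /* lowest set bit = highest priority */
        p++;
    return p;
}
```

Adding or removing a process from the ready queue is one OR or AND, and
picking the next process is a few instructions, which is why a process
switch costs barely more than a bare context switch.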
Multiuser systems are seek limited more than bandwidth limited (small
timesharing multiuser systems, that is).

dougp> Bus-mastering DMA SCSI adapters, like the Adaptec 154x (ISA) or
dougp> 1640 (MCA) provide MUCH better throughput. They ARE
dougp> multi-threaded, and the HPDD will try to keep commands
dougp> outstanding on each drive it can use. The major win is that the
dougp> entire transfer is controlled by the adapter, with host
dougp> intervention only when a transfer is complete. You get lots more
dougp> USER cycles this way!

Yes, this is true in general. But there are twists to this argument. In
the pseudo-DMA technique described above, a multithreaded, hardware DMA
and scatter/gather controller is simulated by "lending" the main CPU to a
dumb controller; the bottom half of the disk driver becomes the microcode
of this "pseudo intelligent controller" and simulates the DMA and the
scatter/gather. The main CPU is usually *much* faster than the one
actually put in intelligent controllers (say a 386 vs. an 8086), so IO
rates _might_ be higher with a pseudo intelligent controller than with a
real one. On the other hand the real intelligent controller can work in
parallel with the main CPU. In IO bound systems this is of course of
little or no benefit (because there are CPU cycles to spare), unless there
are multiple intelligent controllers, which is rare.

dougp> I'm still not convinced that cacheing controllers are a big win
dougp> over a large Unix buffer cache. I usually use 1-2 MB of cache,

Ah yes! Devoting to the cache 25% of available memory seems to be a good
rule of thumb.

dougp> and a couple-MB RAMdisk for /tmp if I have the memory available.

But /tmp should not be on a RAM disk; it should be in a normal filesystem,
even if it then causes almost no IO transactions, as short lived files
under /tmp should exist only in the cache.
Unfortunately the "hardening" features of the System V filesystem mean
that even short lived files will be sync'ed out (at least the inodes), but
this can be partially obviated by tweaking tunable parameters, for example
by enlarging substantially the inode cache (almost as important as the
block cache) and by slowing down bdflush. Overall, instead of having a RAM
disk for /tmp, I would devote the core that would go to it to enlarging
the buffer and inode caches.
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (12/21/90)
In article <PCG.90Dec19145630@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
|
|The *real* problem is that most (all, I think) 386 UNIX disc (and tape!)
|drivers are poorly written, as they do not use pseudo-DMA, a standard
|technique of PDP/VAX drivers (it is even mentioned in the 4.3BSD Leffler
|book). This is described a bit later in this article.

Very probably.

|
|This is mostly because the driver is written so that each IO transaction
|involves only one sector. Therefore for every sector the top half of the
|driver starts the transaction, then sleeps, the bottom half gets
|activated by the interrupt and wakeups the top half.
|

The standard technique is for xxstrategy() to sort the I/O onto a queue of
pending I/O operations and then call xxstart(). xxstart() peels the next
I/O off the queue and instructs the controller to do the I/O. When
xxintr() is called it picks up the completed I/O and calls iodone() on the
buffer, waking up anyone who's waiting for the buffer (there may or may
not be anyone waiting). xxintr() then calls xxstart() and the process is
repeated until the queue of pending I/O's is empty.

This, of course, requires sane controllers, but it's the standard way to
do the job. More than that, it's the _textbook_ way of doing the job. Even
if you have a dumb controller, and it requires several request/interrupt
cycles, you do it at interrupt time, unless it's _really_ expensive. It's
all a trade off.

|The sleep/wakeup between the top and bottom halves involves, on a busy
|system, two context switches, which is already bad, and, most
|importantly, calls the scheduler. There is a paper that shows that under
|many UNIX ports the cost of a wakeup/sleep is not really that of the
|context switches, but of the scheduler calls to decide who is going to
|run next, as this takes 90% of the time of a process activation.

Modern UNIX systems only use one context switch. The switch to the
scheduler's context is no longer done.
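The strategy/start/intr control flow described above can be shown as a
skeleton (the xx names follow the text; the controller and iodone() are
simulated by flags so the queue discipline is visible outside a kernel,
and a real xxstrategy() would disksort() the request rather than push it):

```c
#include <stddef.h>

/* Skeleton of the textbook strategy/start/intr driver structure. */
struct xbuf {
    struct xbuf *av_forw;    /* link in the pending I/O queue */
    int done;                /* set by iodone() */
};

static struct xbuf *xq;      /* queue of pending I/O */
static struct xbuf *active;  /* I/O the controller is working on */

static void iodone(struct xbuf *bp) { bp->done = 1; } /* would wakeup() */

/* Start the next queued I/O if the controller is idle. */
static void xxstart(void)
{
    if (active || !xq)
        return;
    active = xq;             /* instruct the controller to do the I/O */
    xq = xq->av_forw;
}

/* Queue a request and kick the controller if necessary. */
void xxstrategy(struct xbuf *bp)
{
    bp->av_forw = xq;        /* simplified: push instead of disksort() */
    bp->done = 0;
    xq = bp;
    xxstart();
}

/* Controller interrupt: retire the completed I/O, start the next,
 * repeating via xxstart() until the pending queue is empty. */
void xxintr(void)
{
    if (!active)
        return;
    iodone(active);
    active = NULL;
    xxstart();
}
```

The point of the structure is that the process only sleeps in the buffer
layer waiting for iodone(); the driver itself never blocks, and each
interrupt immediately refills the controller.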
The scheduler was never called to do high level scheduling from the
dispatcher. The scheduler would run periodically and _assist_ processes in
running by swapping old processes out and deserving processes in. However,
its context was `borrowed' to do the run queue search. Its _context_ and
nothing more. The search is cheap, although the switches are usually
expensive. Modern UNIX systems search the run queue in the context of the
process that is giving up the CPU.

|Ah yes! Devoting to the cache 25% of available memory seems to be a good
|rule of thumb.

Sure.

|dougp> and a couple-MB RAMdisk for /tmp if I have the memory available.
|
|But /tmp should not be on a RAM disk, it should be in a normal
|filesystem even if actually almost never causing IO transactions as
|short lived files under /tmp should exist only in the cache.
|

Oh dear, it's RAM disk time again. Where is that revolver?

|Unfortunately the "hardening" features of the System V filesystem means
|that even short lived files will be sync'ed out (at least the inodes),
|but this can be partially obviated by tweaking tunable parameters. For
|example enlarging substantially the inode cache (almost as important as
|the block cache), and slowing down bdflush. Overall instead of having a
|RAM disk for /tmp, I would devote the core that would go to it instead
|to enlarging the buffer and inode caches.

Eh? Writing things out doesn't cause them to be thrown away.

Boyd Roberts                         boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''