as@prg.ox.ac.uk (Andrew Stevens) (02/14/91)
A question to the knowledgeable: the recent comments on speeding up Archs in big screen modes remind me of something that has puzzled me for some time about the Arch architecture.

In a basic Arch you get contention between the video subsystem and the CPU: the MEMC multiplexes the two onto the same memory subsystem. As I understand it, it even generates the video DMA addresses. So what happens in a machine with two or more MEMCs?

Can you fix things up so that RAM controlled by one MEMC supports video DMA (and runs slow), while a second bank with its own MEMC runs at full tilt? In an A540, for example, does the machine run faster when more than one RAM bank is added?

Also, does MEMC support general-purpose DMA as well as video, e.g. for Winchesters, or is all that sort of stuff (as I heard rumoured) done by the main CPU?

The latter would at least partially explain the reputedly poor throughput of Rxy0 UNIX systems when paging. A 32K page size certainly doesn't help, but it shouldn't affect throughput all that much: if the system isn't thrashing, other runnable processes should happily soak up CPU time while a page fault is serviced. However, if the CPU is *busy* during page faults, sluggish performance during paging would be no surprise at all.

Andrew

Andrew Stevens
Programming Research Group              JANET:    Andrew.Stevens@uk.ac.oxford.prg
Oxford University Computing Laboratory  INTERNET: Andrew.Stevens@prg.ox.ac.uk
11 Keble Road, Oxford, England          UUCP:     ...!uunet!mcvax!ukc!ox-prg!as
mark@acorn.co.uk (Mark Taunton) (02/15/91)
[Apologies to those who have seen this article already: I had some problems trying to cancel the first attempt, which went out with an incorrect Reply-To field.]

In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system. It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMC's?
>
>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt. In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

No. Multiple-MEMC machines (extended A440/R140, A540/R260) still have only one data bus and one address bus, so the extra MEMC provides no extra data bandwidth. However, the A540 and R260 have a faster memory subsystem (12 MHz instead of 8 MHz), which provides a useful increase in bandwidth and reduces the impact of high video data rates (as does the ARM3 cache, by reducing the processor's bus usage).

As you surmise, the first MEMC always handles the video (and other) DMA traffic. It *might* be possible, in a different design, to put the other MEMCs behind a complex data-bus buffering scheme, allowing the processor to get at its own data while VIDC is loaded from screen memory. However, you should note that (a) the current multiple-MEMC system design requires the MEMCs to be tightly synchronised at all times, so breaking this would probably involve some quite complex hardware glue, if it is possible at all, and (b) I am not a hardware designer, and haven't looked into this in any detail.

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?
>
>The latter would at least partially explain the reputedly poor
>through-put of Rxy0 UNIX systems when paging. A 32K page size certainly
>doesn't help, but it shouldn't impact through-put all that much. If the
>system isn't thrashing other runnable processes should happily soak up
>CPU time while a page-fault is fixed. However, if the CPU was *busy*
>during page-faults sluggish performance during paging would be no
>surprise at all.
>

No. In current Acorn systems MEMC handles only special DMA for video (two channels, one for main screen data and one for cursor data) and sound. There is no separate DMA hardware for any other data traffic. The built-in ST506 controller in the A4x0/R140 has its own buffering, and the processor must transfer the data under interrupt at each 256-byte sector boundary, i.e. about once every 500 microseconds during a multi-sector transfer. When I last checked, the RISC iX code to handle this on an R140 took in the region of 140 microseconds per interrupt (more in high-bandwidth screen modes), so there is a definite reduction in, but not a complete loss of, available CPU power during disc transfers.

The situation is similar on the R260, in that the data moves in and out via buffers on the SCSI expansion card. I have no figures for the actual transfer rate there, but the memory end of the transfer will of course go rather quicker than on an R140, because of the increased memory bus and processor speed, and because the transfer loop code will be in the ARM3 cache.

>Andrew
>  Andrew Stevens
>  Programming Research Group              JANET:    Andrew.Stevens@uk.ac.oxford.prg
>  Oxford University Computing Laboratory  INTERNET: Andrew.Stevens@prg.ox.ac.uk
>  11 Keble Road, Oxford, England          UUCP:     ...!uunet!mcvax!ukc!ox-prg!as

Mark Taunton
Acorn Computers Ltd
mark@acorn.co.uk
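Mark's figures can be turned into rough numbers. The Python sketch below takes the approximate 256-byte sector, ~500 µs interrupt period and ~140 µs handler cost quoted above for an R140. From these it estimates the CPU share lost during a transfer and the cost of paging in a 4K versus a 32K page. The constant and function names are hypothetical, chosen only for illustration. The model ignores seek and rotational latency and the extra handler cost in high-bandwidth screen modes, so treat the results as order-of-magnitude only.

```python
# Back-of-envelope model of CPU-driven (interrupt-per-sector) disc transfers,
# using the approximate R140/ST506 figures quoted in the thread.
SECTOR_BYTES = 256          # data moved per interrupt
INTERRUPT_PERIOD_US = 500   # approx. time between sector interrupts
HANDLER_COST_US = 140       # approx. CPU time per interrupt (more in big modes)


def cpu_fraction_busy(handler_us: float, period_us: float) -> float:
    """Fraction of CPU time consumed by the transfer interrupts."""
    return handler_us / period_us


def transfer_rate_bytes_per_s(sector_bytes: int, period_us: float) -> float:
    """Sustained multi-sector transfer rate implied by the interrupt period."""
    return sector_bytes * 1_000_000 / period_us


def page_in_cost_ms(page_bytes: int) -> tuple[float, float]:
    """(elapsed ms, CPU-busy ms) to read one page; seek/rotation ignored."""
    sectors = page_bytes // SECTOR_BYTES
    elapsed_ms = sectors * INTERRUPT_PERIOD_US / 1000
    cpu_ms = sectors * HANDLER_COST_US / 1000
    return elapsed_ms, cpu_ms


if __name__ == "__main__":
    busy = cpu_fraction_busy(HANDLER_COST_US, INTERRUPT_PERIOD_US)
    rate = transfer_rate_bytes_per_s(SECTOR_BYTES, INTERRUPT_PERIOD_US)
    print(f"CPU busy during transfer: {busy:.0%}")
    print(f"Transfer rate: {rate / 1024:.0f} KiB/s")
    for page in (4 * 1024, 32 * 1024):
        elapsed, cpu = page_in_cost_ms(page)
        print(f"{page // 1024:>2}K page: {elapsed:.2f} ms elapsed, {cpu:.2f} ms of CPU")
```

Under these assumptions, about 28% of the CPU goes on the transfer loop. Reading a 32K page keeps the disc busy for about 64 ms, and about 18 ms of that is CPU time other processes cannot use. For a 4K page the figures are about 8 ms and 2.2 ms. This bears directly on Andrew's point about paging throughput.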
nbvs@cl.cam.ac.uk (Nicko van Someren) (02/15/91)
In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system. It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMC's?

One MEMC is the master and all the others are slaves. Each MEMC knows which it is from the state of the byte/word line at the last reset.

>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt. In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

No. CPU memory accesses and video DMA go along the same data bus, so one must hold up the other even if there is more than one bank of RAM.

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?

The rumour you heard was right. MEMC provides DMA for video and sound but has no general-purpose DMA.

>Andrew

Nicko
+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
| "Go and buy an Aleph One ARM3 card and stop whining!!!"                     |
+-----------------------------------------------------------------------------+
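Nicko's point is that the CPU and video DMA share one data bus however many RAM banks are fitted. That can be put in rough numbers. The Python sketch below estimates the fraction of bus bandwidth taken by screen refresh at the 8 MHz and 12 MHz memory clocks mentioned earlier in the thread.

The model makes several assumptions of its own, not taken from the thread:
- the bus moves one 32-bit word per memory cycle,
- there is no penalty for non-sequential cycles,
- cursor and sound DMA are ignored,
- the display mode (640x256, 8 bits per pixel, 50 Hz) is hypothetical, chosen for illustration.

```python
# Rough share of a single shared memory bus consumed by video DMA.
# Illustrative assumptions: one 32-bit word per memory cycle, no penalty for
# non-sequential cycles, cursor and sound DMA ignored.
BYTES_PER_CYCLE = 4


def bus_bandwidth(mem_clock_hz: float) -> float:
    """Peak bytes/s on the shared data bus, assuming one word per cycle."""
    return mem_clock_hz * BYTES_PER_CYCLE


def video_rate(width: int, height: int, bits_per_pixel: int, refresh_hz: float) -> float:
    """Average bytes/s of screen data fetched for the given display mode."""
    return width * height * bits_per_pixel / 8 * refresh_hz


def video_share(mem_clock_hz: float, width: int, height: int,
                bits_per_pixel: int, refresh_hz: float) -> float:
    """Fraction of the bus taken by video; the remainder is left for the CPU."""
    return video_rate(width, height, bits_per_pixel, refresh_hz) / bus_bandwidth(mem_clock_hz)


if __name__ == "__main__":
    # Hypothetical mode: 640x256, 8 bits per pixel, 50 Hz refresh.
    for clock in (8e6, 12e6):
        share = video_share(clock, 640, 256, 8, 50)
        print(f"{clock / 1e6:.0f} MHz: video uses {share:.1%} of the bus")
```

The number of MEMCs appears nowhere in this calculation. With a single data bus, only the memory clock changes the video share, which falls from roughly a quarter to about a sixth in this example.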
andras@alzabo.ocunix.on.ca (Andras Kovacs) (02/15/91)
In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system. It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMC's?
>
>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt. In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

"A single MEMC will control up to 4M bytes of DRAM. A second MEMC can be built into a system to extend the maximum addressable DRAM to 8M bytes. The two MEMCs are configured as a Master and a Slave, where the Slave acts purely as a DRAM driver (all DMA operations, I/O controller interactions, etc. are handled by the Master)." (VL86C010 book from VLSI, Inc.)

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?

It seems to me that there is no general-purpose DMA in MEMC. I think the designers reasoned that, with the extra banked registers available in FIQ mode, a DMA channel can be emulated in software quite efficiently.

>The latter would at least partially explain the reputedly poor
>through-put of Rxy0 UNIX systems when paging. A 32K page size certainly
>doesn't help, but it shouldn't impact through-put all that much. If the
>system isn't thrashing other runnable processes should happily soak up
>CPU time while a page-fault is fixed. However, if the CPU was *busy*
>during page-faults sluggish performance during paging would be no
>surprise at all.

I am afraid the 32K page size is indeed a poor choice for a demand-paged virtual memory system. My understanding (from the discussions on comp.sys.arch) is that half-decent performance needs page sizes of around 4 KB.
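The 32K figure is probably not a free software choice. As I understand the MEMC design, the chip maps physical RAM through a fixed set of 128 page entries, and the page size can be 4K, 8K, 16K or 32K. With the full 4 MB per MEMC quoted above, 32K is then the smallest size that lets 128 entries cover all of memory. The 128-entry count and the list of page sizes are my assumptions about the chip, not figures from this thread. The Python sketch below just shows the arithmetic.

```python
# Why 32K pages? Sketch assuming MEMC maps memory through a fixed table of
# 128 page entries and offers page sizes of 4K, 8K, 16K and 32K.
PAGE_ENTRIES = 128
PAGE_SIZES = (4 * 1024, 8 * 1024, 16 * 1024, 32 * 1024)


def smallest_page_size(ram_bytes: int) -> int:
    """Smallest supported page size whose 128 entries can cover ram_bytes."""
    for size in PAGE_SIZES:
        if size * PAGE_ENTRIES >= ram_bytes:
            return size
    raise ValueError("more RAM than one MEMC can map")


if __name__ == "__main__":
    for mb in (0.5, 1, 2, 4):
        ram = int(mb * 1024 * 1024)
        print(f"{mb} MB per MEMC -> {smallest_page_size(ram) // 1024}K pages")
```

On these assumptions, a 4 KB page size would only cover 512 KB per MEMC. Getting the small pages Andras mentions on a 4 MB machine would need a different memory controller, not just different software.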
Also, because MEMC does not provide usage information about pages (dirty bit, number of hits), write-back and a good page-replacement strategy are a problem. Could anyone shed more light on this aspect?

>Andrew

Andras
--
Andras Kovacs                 andras@alzabo.ocunix.on.ca
Nepean, Ont.
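One general workaround for an MMU with no referenced or dirty bits is to simulate them in software. A page's mapping is removed or downgraded, so the next access (or the next write) traps, and the fault handler records the use before restoring the mapping. This is offered as a textbook technique, not as a description of what RISC iX actually does.

The Python sketch below models this with the second-chance ("clock") replacement algorithm. `Page`, `ClockReplacer` and `on_access_fault` are hypothetical names. Where the sketch just flips booleans, a real kernel would have to change the page's MEMC mapping or protection, which I have not checked in detail.

```python
# Second-chance ("clock") page replacement when the MMU keeps no referenced
# or dirty bits. Software simulates them: a page is made inaccessible (or
# read-only) and the resulting fault records the use. Hypothetical model only.
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False   # set by the simulated access-fault handler
    dirty: bool = False        # set by the simulated write-fault handler
    accessible: bool = False   # False: the next access traps so it can be recorded
    writable: bool = False     # False: the next write traps so the page can be marked dirty


class ClockReplacer:
    def __init__(self, frames: list[Page]):
        self.frames = frames
        self.hand = 0

    def on_access_fault(self, page: Page, is_write: bool) -> None:
        """Record the use that caused the fault, then re-enable the mapping."""
        page.referenced = True
        page.accessible = True
        if is_write:
            page.dirty = True
            page.writable = True

    def choose_victim(self) -> Page:
        """Sweep the clock hand, giving recently referenced pages a second chance."""
        while True:
            page = self.frames[self.hand]
            self.hand = (self.hand + 1) % len(self.frames)
            if page.referenced:
                # Clear the simulated bit and revoke access so the next use faults again.
                page.referenced = False
                page.accessible = False
                page.writable = False
            else:
                return page  # the caller must write it back first if page.dirty


if __name__ == "__main__":
    pages = [Page(n) for n in range(4)]
    clock = ClockReplacer(pages)
    clock.on_access_fault(pages[0], is_write=False)
    clock.on_access_fault(pages[1], is_write=True)
    victim = clock.choose_victim()
    print(victim.number, victim.dirty)
```

The price of this approach is an extra trap for every page whose simulated bit has been cleared. With 32K pages there are few pages to track, but each misjudged eviction costs a large transfer.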