[comp.sys.acorn] MEMC and video DMA question

as@prg.ox.ac.uk (Andrew Stevens) (02/14/91)

A question to the knowledgeable:

The recent comments on speeding up Archs when using big modes
remind me of something that has puzzled me for some time about
the Arch architecture:

In a basic Arch you get contention between the video-subsystem and the
CPU - the MEMC multiplexes the two onto the same memory sub-system.  It
even generates the video-DMA addresses as I understand it. So, what
happens in a machine with 2 or more MEMCs?

Can you fix things up so that RAM controlled by one MEMC supports video
DMA (and runs slow), whilst a second bank with its own MEMC runs at full
tilt?  In an A540, for example, does the machine run faster when more
than one RAM bank is added?

Also, does MEMC support general-purpose DMA as well as video?
E.g. for winchesters etc, or is all that sort of stuff (as I
heard rumoured) done by the main CPU?

The latter would at least partially explain the reputedly poor
throughput of Rxy0 UNIX systems when paging.  A 32K page size certainly
doesn't help, but it shouldn't impact throughput all that much. If the
system isn't thrashing, other runnable processes should happily soak up
CPU time while a page-fault is fixed.  However, if the CPU were *busy*
during page-faults, sluggish performance during paging would be no
surprise at all.



Andrew
        Andrew Stevens
      Programming Research Group        JANET: Andrew.Stevens@uk.ac.oxford.prg
 Oxford University Computing Laboratory INTERNET: Andrew.Stevens@prg.ox.ac.uk
     11 Keble Road, Oxford, England     UUCP:  ...!uunet!mcvax!ukc!ox-prg!as

mark@acorn.co.uk (Mark Taunton) (02/15/91)

[Apologies to those who have seen this article already: I had some
problems trying to cancel the first attempt which went out with an
incorrect Reply-To field.]

In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system.  It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMCs?
>
>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt.  In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

No. Multiple-MEMC machines (extended A440/R140, A540/R260) still have
only one data bus and one address bus, so there is no extra data
bandwidth from the extra MEMC.  However the A540 and R260 have a
faster memory subsystem (12 MHz instead of 8 MHz) which provides a
useful increase in bandwidth and reduces the impact of high video data
rates (as does the ARM3 cache by reducing the processor's bus usage).
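
To put rough numbers on the bandwidth argument, here is a minimal sketch.
It assumes an idealised bus that moves one 32-bit word per memory clock and
ignores blanking intervals, refresh, and the ARM3 cache; the 640x480,
8 bits per pixel, 60 Hz screen mode is an illustrative assumption, not a
figure from this thread.

```python
# Rough model of how much memory bandwidth video DMA consumes.
# ASSUMPTIONS (not from the thread): the bus moves one 32-bit word per
# memory clock, blanking intervals are ignored, and the example screen
# mode (640x480, 8 bits per pixel, 60 Hz) is purely illustrative.

BYTES_PER_WORD = 4


def video_share(width, height, bits_per_pixel, refresh_hz, mem_clock_hz):
    """Fraction of idealised bus bandwidth taken by screen refresh."""
    video_bytes_per_s = width * height * bits_per_pixel // 8 * refresh_hz
    bus_bytes_per_s = mem_clock_hz * BYTES_PER_WORD
    return video_bytes_per_s / bus_bytes_per_s


for clock_mhz in (8, 12):
    share = video_share(640, 480, 8, 60, clock_mhz * 1_000_000)
    print(f"{clock_mhz} MHz memory: video takes about {share:.0%} of the bus")
```

Under these assumptions the video share drops from roughly 58% at 8 MHz to
roughly 38% at 12 MHz; real figures will differ, but the direction of the
effect is the point.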

As you surmise, the first MEMC always handles the video (and other)
DMA traffic.  It *might* be possible in a different design to have the
other MEMCs in a complex data bus buffering scheme, allowing the
processor to get at its own data while VIDC is loaded from screen
memory.  However you should note that (a) the current multiple-MEMC
system design requires the MEMCs to be tightly synchronised at all
times, so breaking this would probably involve some quite complex
hardware glue, if it is indeed possible, and (b) I am not a hardware
designer, and haven't looked into this in any detail.

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?
>
>The latter would at least partially explain the reputedly poor
>through-put of Rxy0 UNIX systems when paging.  A 32K page size certainly
>doesn't help, but it shouldn't impact through-put all that much. If the
>system isn't thrashing other runnable processes should happily soak up
>CPU time while a page-fault is fixed.  However, if the CPU was *busy*
>during page-faults sluggish performance during paging would be no
>surprise at all.
>

No, in current Acorn systems MEMC handles only special DMA for video
(two channels, one for main screen data, one for cursor data) and
sound.  There is no separate DMA hardware for any other data traffic.
The built-in ST506 controller in the A4x0/R140 has its own buffering,
and the processor is required to transfer the data under interrupt on
each 256-byte sector boundary, or once every 500 microseconds or so
during a multi-sector transfer.  On my last check, the RISC iX code to
handle this on an R140 takes in the region of 140 microseconds (more
in high bandwidth screen modes), so there is a definite reduction in,
but not a complete loss of, available CPU power during disc transfers.
The situation is similar on the R260 in that the data is moving in/out
via buffers on the SCSI expansion card. I have no figures for the
actual transfer rate there, but of course the memory end of the
transfer will go rather quicker than on an R140 because of increased
memory bus and processor speed, and because the transfer loop code
will be in the ARM3 cache.
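
The R140 figures above can be turned into a quick estimate. This is only a
sketch of the arithmetic: it uses the approximate 256-byte, 500 microsecond
and 140 microsecond values quoted above, and the function name is
hypothetical.

```python
# Figures quoted above for the R140's ST506 path; all are approximate.
SECTOR_BYTES = 256          # bytes moved per interrupt
INTERRUPT_PERIOD_US = 500   # one interrupt roughly every 500 microseconds
HANDLER_US = 140            # RISC iX handler time per interrupt


def disc_transfer_cost(sector_bytes, period_us, handler_us):
    """Return (bytes per second moved, fraction of CPU spent in the handler)."""
    bytes_per_s = sector_bytes * 1_000_000 // period_us
    cpu_fraction = handler_us / period_us
    return bytes_per_s, cpu_fraction


rate, busy = disc_transfer_cost(SECTOR_BYTES, INTERRUPT_PERIOD_US, HANDLER_US)
print(f"transfer rate  ~{rate / 1000:.0f} kB/s")
print(f"CPU in handler ~{busy:.0%}, leaving ~{1 - busy:.0%} for other work")
```

So on these numbers a multi-sector transfer runs at about 512 kB/s and costs
a little over a quarter of the processor while it lasts.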

Mark Taunton
Acorn Computers Ltd
mark@acorn.co.uk

nbvs@cl.cam.ac.uk (Nicko van Someren) (02/15/91)

In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system.  It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMCs?

One MEMC is the master MEMC and all others are slave MEMCs.  A MEMC knows
which it is by the state of the byte/word line when the last reset took place.

>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt.  In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

No.  The CPU memory accesses and the video DMA go along the same data bus
so one must hold up the other even if there is more than one bank of RAM.

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?

The rumour you heard was right.  MEMC provides DMA for video and sound but
has no general DMA.

>Andrew

Nicko



+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
|       "Go and buy an Aleph One ARM3 card and stop whining!!!"               |
+-----------------------------------------------------------------------------+

andras@alzabo.ocunix.on.ca (Andras Kovacs) (02/15/91)

In article <1277@culhua.prg.ox.ac.uk> as@prg.ox.ac.uk (Andrew Stevens) writes:
>In a basic Arch you get contention between the video-subsystem and the
>CPU - the MEMC multiplexes the two onto the same memory sub-system.  It
>even generates the video-DMA addresses as I understand it. So, what
>happens in a machine with 2 or more MEMCs?
>
>Can you fix things up so that RAM controlled by one MEMC supports video
>DMA (and runs slow), whilst a second bank with its own MEMC runs at full
>tilt.  In an A540, for example, does the machine run faster when more
>than one RAM bank is added?

    "A single MEMC will control up to 4M bytes of DRAM. A second MEMC can be
built into a system to extend the maximum addressable DRAM to 8M bytes. The two
MEMCs are configured as a Master and a Slave, where the Slave acts purely as a
DRAM driver (all DMA operations, I/O controller interactions, etc. are handled
by the Master)." (VL86C010 book from VLSI, Inc.)

>Also, does MEMC support general-purpose DMA as well as video?
>E.g. for winchesters etc, or is all that sort of stuff (as I
>heard rumoured) done by the main CPU?

    It seems to me that there is no general-purpose DMA in the MEMC; I think
the designers thought that with the extra overlapping regs in FIQ mode, you
can emulate a DMA channel quite efficiently.

>The latter would at least partially explain the reputedly poor
>through-put of Rxy0 UNIX systems when paging.  A 32K page size certainly
>doesn't help, but it shouldn't impact through-put all that much. If the
>system isn't thrashing other runnable processes should happily soak up
>CPU time while a page-fault is fixed.  However, if the CPU was *busy*
>during page-faults sluggish performance during paging would be no
>surprise at all.

    I am afraid that the 32K page size is indeed a poor choice in a demand-paged
virtual memory system; my understanding (from the stuff floating around on
comp.sys.arch) is that in order to provide half-decent performance, you have
to have ~4KB page sizes. Also, because the MEMC does not provide statistical
info about the pages (dirty bit, # of hits), write-back and a good page
replacement strategy are a problem. Could anyone shed more light on this aspect?
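
    A back-of-envelope sketch of why page size matters, assuming pages
arrive at the R140 ST506 rate mentioned earlier in this thread (one
256-byte sector roughly every 500 microseconds) and ignoring seek and
rotational latency. The helper names are hypothetical; the 4 MB figure is
the per-MEMC DRAM limit quoted above.

```python
# ASSUMPTION: pages come in over the R140's ST506 path at the rate
# mentioned earlier in the thread (one 256-byte sector per ~500 us);
# seek and rotational latency are ignored.
SECTOR_BYTES = 256
SECTOR_PERIOD_US = 500


def page_in_ms(page_bytes):
    """Milliseconds of pure transfer time to bring in one page."""
    sectors = page_bytes // SECTOR_BYTES
    return sectors * SECTOR_PERIOD_US / 1000


def pages_in(ram_bytes, page_bytes):
    """Number of page frames available in a given amount of RAM."""
    return ram_bytes // page_bytes


for page_kb in (4, 32):
    size = page_kb * 1024
    print(f"{page_kb:>2}K pages: {page_in_ms(size):.0f} ms per page-in, "
          f"{pages_in(4 * 1024 * 1024, size)} frames in 4 MB")
```

    On these assumptions a 32K page takes eight times as long to fetch as a
4K page and leaves only 128 frames in 4 MB to choose between, so each
replacement decision is both costlier and coarser.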

>Andrew

    Andras
-- 
Andras Kovacs 
andras@alzabo.ocunix.on.ca
Nepean, Ont.