dave@dtg.nsc.com (David Hawley) (06/06/90)
This is a fairly lengthy posting; it contains an abstract and list of
objectives for an IEEE standard on bus-based DMA architectures. The chair
of this P1212 subcommittee is Mike Wenzel of Hewlett-Packard (you can
contact him at mw@hprnd.hp.com). The proposed standard needs wider
discussion, and I thought this would be a good forum for the ideas
presented.

If you want copies of the proposed standard, contact Mike. If the response
is small enough, he can mail them to you directly; otherwise they will be
made available through ftp or by order from a copy center.

Dave Hawley, National Semiconductor (dave@dtg.nsc.com)

--------------------------------------------------------------------------------

The DMA Framework document is part of the IEEE P1212 CSR Architecture
Specification: a register set to be used for control, configuration,
identification, and diagnostics between nodes on a system bus
interconnect. The goal of the P1212 CSR Architecture is to provide a
scalable, extensible bus-independent framework for interoperability
between nodes capable of communicating across a system interconnect.

This DMA Framework specifies recommended architectures for providing
high-performance interfaces between I/O controllers and system memory.
Other DMA architectures are possible in P1212-compliant systems. But to
maximize inter-vendor compatibility, use of the frameworks as specified in
this document is recommended for initial development.

It must be emphasized that the objective is NOT to arrive at a single DMA
model that will be welded forever to P1212. DMA specifications should be
optional to allow for newer and better schemes, or ones more suited to a
given application. Yet a point of departure is needed that will guide the
initial development of common interfaces.

The general idea is to carefully craft a very primitive, yet
high-performance means of passing messages across the bus. This scheme
should make minimal demands on the instruction set and hardware required.
To this would be added several simple conventions for the structure of the
messages passed--to do common operations in a common way. The most
important convention would be the vendor structure used for representing
data buffer segments. These message-passing and structure concepts would
then form the building blocks on which specific interfaces would be built,
for example, for LAN, disc, and printer interfaces.

Objectives of the IEEE P1212 DMA Framework (Part III-A):

A. FUNCTIONAL

 1. Support DMA to disc, tape, and communication devices.

 2. Allow multiple active I/O transactions to be outstanding
    simultaneously on a single DMA channel.

 3. Provide for communication to multiple devices through a single DMA
    channel (e.g., SCSI).

 4. Interrupts can be synchronous or asynchronous. Synchronous interrupts
    are generated at the completion of a pre-specified data transfer.
    Events not associated with active DMA requests generate asynchronous
    interrupts.

 5. Support out-of-order processing of DMA requests.

 6. Allow I/O Units to support both cache-coherent and mixed-coherent
    systems with minimal special handling.

 7. Support both powerful I/O co-processors and low-end micro-processor
    or custom-IC-based I/O units.

 8. For static data structures, contiguous blocks of memory which are
    larger than a RAM page must only be obtained at system power-up time
    or during system re-configuration (e.g. when adding a driver). Such
    blocks are not encouraged.

 9. Dynamic data structures, those which may be allocated and deallocated
    on a per-message or per-transaction basis, must not require
    contiguous blocks of memory which are larger than a RAM page. To
    conserve physical RAM, data structures must be capable of being sized
    for typical transactions and be extensible to cover worst-case
    logical transaction sizes via multiple blocks of contiguous RAM.
    Contiguous blocks of physical RAM must be of a convenient size for
    system memory managers (e.g. Unix "mbufs").

B. PERFORMANCE

 1. Performance equal to or better than that of today's systems.

 2. Minimize the number of Processor reads to Unit CSRs. Because of bus
    converters, arbitration, device and processing delays, reads have
    higher latency than writes through the bus, keeping the Processor
    waiting for the response.

 3. Maximize the amount of data that the Unit reads from System Memory in
    one block-copy setup. For similar reasons, the more work that can be
    done per access to system memory, the better.

 4. Require one or fewer writes to a Unit CSR per I/O transaction. If
    actions can be arranged into blocks in system memory, rather than
    involving pokes of Unit CSRs, performance should be improved.

 5. Result in one or fewer interrupts per I/O.

 6. Be efficient for small transfers (20 bytes) and large transfers.

 7. Avoid the need for additional, sequential copy steps, especially for
    large user data segments.

 8. Be efficient for both cached and non-cached Processors. If only one
    entity writes into a given (cache) line of memory, then line
    ownership need not change, saving bus traffic for coherence
    protocols. For non-coherent Units and agents there is also a
    robustness consideration here. It's easy for the flush of a
    Processor's cache unintentionally to overwrite adjacent bytes of
    memory that were written by the Unit.

 9. Minimize overhead bytes: data moved through the bus that is of no
    interest to the party on the other side, or that is only an artifact
    of the DMA model itself.

10. Be optimal for the normal case of command size and data
    fragmentation, yet expandable to cover the worst case.

11. Efficiently vector interrupts when there are many Units in a system.
    The source of an interrupt (one of hundreds) should be easily
    determined. To simplify Processor designs, efficient dispatch to the
    proper interrupt routine should be possible even when large numbers
    of Units share the same interrupt bit.

C. COST

 1. Minimize the required hardware cost (e.g., for Processors, memory
    controllers and device adaptors). Yet this objective must be kept in
    balance with the others, especially performance, instead of
    overriding them.

 2. Fit the implementation into a reasonable area of silicon using 1990
    technology. Product development projects need backplane DMA chips.

 3. Do not require the DMA controller to interpret virtual addresses.
    Virtual addressing is more expensive and is highly vendor-dependent.

 4. Avoid the need to completely double-buffer data for active
    transactions in both System Memory and the I/O Unit.

D. SYSTEM COMPATIBILITY

 1. Avoid the use of interlocked transactions (instructions). The DMA
    model should not require the implementation of locked bus
    transactions, since these are not well supported by many existing bus
    standards. This is also a performance item because of processing
    time, bus cycles and potential contention for a lock. However...

 2. Do NOT prohibit the use of locks, to support large system
    configurations efficiently.

E. FLEXIBILITY

 1. Make low-level primitives as simple as possible (for both flexibility
    and efficiency).

 2. Efficiently adapt to a wide range of applications.

 3. Be scalable to large system configurations.

 4. Except for processor interrupt CSRs, avoid specifying CSR read or
    write characteristics that would require custom hardware. Treat each
    CSR as though it could be read or written by a microprocessor in the
    Unit. This would include avoiding bits that don't set when written,
    reads that produce side-effects, writes that cause instantaneous
    results, etc. This also can be a robustness issue in cases where bus
    transactions can be retried by low-level agent hardware. (For
    example, if writing a register causes a count to increment, then the
    count could be thrown off if a bridge retries a bus write
    transaction.)

 5. Leverage: DMA message-passing mechanisms also should be usable for
    processor-to-processor communication.
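
As an illustration of objectives A.9 and C.3, here is a minimal sketch of
how a driver might describe a data buffer to a Unit as a list of segments,
none larger than a RAM page and all expressed as physical addresses. It is
not taken from the P1212 draft: the name build_segments, the
(address, length) tuple layout, and the 4096-byte default page size are
assumptions made only for this example.

```python
def build_segments(page_frames, offset, length, page_size=4096):
    """Describe a buffer as (physical_address, byte_count) segments.

    page_frames: physical base addresses of the pages backing the buffer,
                 in buffer order (they need not be adjacent in memory).
    offset:      byte offset of the buffer start within the first page.
    length:      total buffer length in bytes.

    Hypothetical layout for illustration; not the P1212 segment format.
    """
    if not 0 <= offset < page_size:
        raise ValueError("offset must lie within the first page")

    segments = []
    for frame in page_frames:
        if length == 0:
            break
        # Never let a segment run past the end of its page.
        chunk = min(page_size - offset, length)
        segments.append((frame + offset, chunk))
        length -= chunk
        offset = 0  # every later page is used from its first byte
    if length:
        raise ValueError("buffer extends past the supplied page frames")
    return segments


# A 0x300-byte buffer that starts 0x100 bytes before the end of one page
# and continues in a second, physically unrelated page.
example = build_segments([0x8000, 0x3000], 0xF00, 0x300, page_size=0x1000)
```

A 20-byte transfer costs a single segment here, while a large transfer
simply grows the list, which is the "sized for typical transactions,
extensible to the worst case" behaviour the objectives ask for. Because
the host resolves pages to physical addresses before building the list,
the Unit never has to interpret a virtual address.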
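
The retry hazard described in objective E.4 can be modelled in a few
lines. This is a sketch under stated assumptions, not a description of any
P1212 register: IncrementDoorbell, TailIndexDoorbell and bus_write are
hypothetical names, and the "bridge retry" is simulated by delivering the
same write twice.

```python
class IncrementDoorbell:
    """Hypothetical Unit CSR with a side effect: each write adds one
    to the count of pending commands, whatever value is written."""

    def __init__(self):
        self.pending = 0

    def write(self, _value):
        self.pending += 1


class TailIndexDoorbell:
    """Hypothetical Unit CSR that just stores the value written, the way
    a microprocessor-visible register would. The Processor writes the
    absolute index of the last command it queued in system memory."""

    def __init__(self):
        self.tail = 0  # producer index written by the Processor
        self.head = 0  # next command the Unit will consume

    def write(self, value):
        self.tail = value

    @property
    def pending(self):
        return self.tail - self.head


def bus_write(csr, value, retries=0):
    """Deliver one write, plus `retries` duplicate deliveries such as a
    bridge might generate when it retries a bus transaction."""
    for _ in range(1 + retries):
        csr.write(value)


inc = IncrementDoorbell()
bus_write(inc, 1, retries=1)   # one command queued, write retried once

tail = TailIndexDoorbell()
bus_write(tail, 1, retries=1)  # same situation, absolute index written

print(inc.pending, tail.pending)  # prints "2 1"
```

The incrementing register now believes two commands are queued, while the
stored-index register is unaffected by the duplicate. Writing an absolute
index also lets a single CSR write announce several commands queued in
system memory, which fits objective B.4.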