bright@dataio.UUCP (Walter Bright) (11/06/86)
I am interested in copying pixel data from one page to another on the IBM EGA. This involves moving 128k bytes of data. Doing it with a REP MOVSW takes about 1/2 second (on an AT), which is too slow. Does anyone know how the DMA channel could be programmed to do this? It is not clear from the documentation how to program the DMA chip, or even if it is capable of memory-to-memory transfers. Thanks for any help.
spud@oliveb.UUCP (John E. Purser) (11/12/86)
In article <1189@dataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>I am interested in copying pixel data from one page to another on the
>IBM EGA. This involves moving 128k bytes of data. Doing it with a
>REP MOVSW takes about 1/2 second (on an AT), which is too slow.
>Does anyone know
>how the DMA channel could be programmed to do this? It is not clear
>from the documentation how to program the DMA chip, or even if
>it is capable of memory-to-memory transfers.

How did you arrive at the time of 1/2 second? The way I figure it, the move should take only about .05 seconds. According to the 286 programmer's reference guide, a REP MOVSW takes 5+(4*CX) clocks. In your example that would be 64k words times 4 plus 5, for a total of 262,149 clocks. The clock speed of the AT is 6 MHz, so dividing the 262,149 by 6,000,000 leaves us with .045 seconds. It may be that the video RAM is slow and requires a wait state or 2 on each access, but that's a memory limitation and it won't help to use DMA in that case.

Now all this is just numbers and I'm not a hardware type, so let me back this up with some experience. I've done some programming on an AT&T 6300 that uses an 8086 at 8 MHz. I've written routines to move 32k bytes to video RAM and it happens in the blink of an eye. It runs at a faster clock and I'm only moving 1/4 the data, but according to the programmer's guide a MOVSW takes 17 clocks per rep on an 8086, so it should take about the same time as on your system.

In summary, you may want to investigate further using the CPU to do the move.

John Purser
Olivetti ATC
Cupertino CA.
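The arithmetic in this post is easy to check. What follows is a minimal Python sketch, assuming zero wait states and true 16-bit transfers, and using only the clock counts quoted above (5 + 4*CX on the 286, 17 clocks per repetition on the 8086). The function name `rep_movsw_seconds` is made up for illustration.

```python
def rep_movsw_seconds(words, clocks_per_rep, setup_clocks, clock_hz):
    """Estimated time for a REP MOVSW, ignoring wait states."""
    clocks = setup_clocks + clocks_per_rep * words
    return clocks / clock_hz

# 128k bytes = 64k words on a 6 MHz AT (80286): 5 + 4*CX clocks
at_clocks = 5 + 4 * 65536
at_time = rep_movsw_seconds(65536, 4, 5, 6_000_000)

# 32k bytes = 16k words on an 8 MHz 8086: 17 clocks per repetition
# (setup clocks are left out here, as they are in the post)
pc6300_time = rep_movsw_seconds(16384, 17, 0, 8_000_000)

print(at_clocks)               # 262149
print(round(at_time, 4))       # 0.0437
print(round(pc6300_time, 4))   # 0.0348
```

Both estimates land at a few hundredths of a second, which is the sense in which the two machines should take "about the same time".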
bright@dataio.UUCP (Walter Bright) (11/14/86)
In article <197@oliveb.UUCP> spud@oliven.UUCP (John Purser) writes:
>In article <1189@dataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>>I am interested in copying pixel data from one page to another on the
>>IBM EGA. This involves moving 128k bytes of data. Doing it with a
>>REP MOVSW takes about 1/2 second (on an AT), which is too slow.
>>Does anyone know
>>how the DMA channel could be programmed to do this? It is not clear
>>from the documentation how to program the DMA chip, or even if
>>it is capable of memory-to-memory transfers.
>
>How did you arrive at the time of 1/2 second? The way I figure this
>it should only take about .05 seconds. According to the 286 programmers
>reference guide a REP MOVSW takes 5+(4*CX) clocks. In your example that would
>be 64k words times 4 plus 5 or a total of 262,149 clocks. The clock speed
>of the AT is 6 MHz so dividing the 262,149 by 6,000,000 leaves us with .045
>seconds.

I did some checking. First off, the EGA is 8 bit RAM, so the 128kb move should take .09 seconds. Second, the EGA needs 4 out of 5 memory cycles to do refresh, so the copy winds up taking about 1/2 second. DMA obviously wouldn't help much here.

Also, nobody replied with a method of doing memcpy()s with the DMA.
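The two corrections in this reply can be worked through step by step. This is a rough Python sketch that assumes the two effects simply multiply: the 8-bit path doubles the number of memory accesses, and the CPU gets only 1 of every 5 EGA memory cycles. The variable names are illustrative only.

```python
# 16-bit, zero-wait estimate from the 286 formula: 5 + 4*CX clocks at 6 MHz
ideal = (5 + 4 * 65536) / 6_000_000

# The EGA is an 8-bit card, so every word move becomes two byte accesses.
eight_bit = ideal * 2

# The EGA keeps 4 of every 5 memory cycles for itself,
# leaving the CPU a 1/5 share of the memory bandwidth.
cpu_share = 1 / 5
estimate = eight_bit / cpu_share

print(round(eight_bit, 2))   # 0.09
print(round(estimate, 2))    # 0.44
```

Under those assumptions the estimate comes out a little under half a second, in line with the measured time.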
zhahai@gaia.UUCP (Zhahai Stewart) (11/15/86)
In article <197@oliveb.UUCP>, spud@oliveb.UUCP (John E. Purser) writes:
> In article <1189@dataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
> >I am interested in copying pixel data from one page to another on the
> >IBM EGA. This involves moving 128k bytes of data. Doing it with a
> >REP MOVSW takes about 1/2 second (on an AT), which is too slow.
>
> How did you arrive at the time of 1/2 second? The way I figure this
> it should only take about .05 seconds. According to the 286 programmers
> reference guide a REP MOVSW takes 5+(4*CX) clocks. In your example that would
> be 64k words times 4 plus 5 or a total of 262,149 clocks. The clock speed
> of the AT is 6 MHz so dividing the 262,149 by 6,000,000 leaves us with .045
> seconds. It may be that the video RAM is slow and requires a wait state
> or 2 on each access but that's a memory limitation and it won't help to use
> DMA in that case.
>

First off, if you want to move 128K "pages", I presume that you are using the EGA in the highest resolution, 640x350x16 colors, mode 10 (hex). In this mode the video refresh seems to eat up much of the memory bandwidth; thus the EGA inserts wait states as needed until a free "access slot" is available to service the processor - this happens even on a 4.77 MHz 8088 in the PC, not to mention, for example, my 8 MHz 80286. Because of this, and the very well optimized string instructions on the 286, I doubt that DMA could do any faster than CPU based moves, copying EGA->EGA. (Even if you could get memory->memory DMA working, that is).

You have two basic possibilities for EGA->EGA moves: plane by plane or all at once.

For plane by plane, set the EGA to read a given plane (of 4), and to write only to the same plane, do the copy (80x350 = 28 KBytes) using REP MOVSW with CX = 14000; then switch read and write enables to the next plane and repeat. This will be hampered by the fact that each 16 bit read or write will actually be done as 2 back to back 8 bit accesses (transparent to the CPU - the EGA is an 8 bit card), each with several wait states, so this will be considerably slower than the MOVSW calculation above (which assumed real 16 bit transfers with 0 wait states).

The other way is to set up the EGA to write from its internal latches, which hold 32 bits retrieved by the last EGA read (8 bits x 4 planes). Then you do a 1 byte read from the source, and a 1 byte write (contents of write do not matter, only the write strobe and address), in order to xfer 32 bits (1 byte x 4 planes). You cannot do this a word at a time because the internal latches are only 8 bits wide per plane. So in this case you use a REP MOVSB with CX = 28000. Each rep transfers 32 bits with only two memory cycles (and the corresponding wait states), as opposed to the first method, which transfers 16 bits/rep with 4 memory cycles and corresponding waits. This should be much faster; ironically, it should run at approximately the same speed on a PC or AT, since the limitation is the EGA cycle stealing and 8 bit wide internal path. Did that come across - the second technique should work faster on a PC than the first does on an AT?

Also note that a full screen image in this mode only occupies 112,000 bytes (4 planes x 28,000), not 128K - so you can save another 15% or so if you only need to move the visible image.

The exact ways to set up the EGA registers for this can be found in the IBM manuals, or PC Tech Journal had an article, etc. Too much to go into here and now. Good luck.

--
-- Zhahai Stewart
{hao | nbires}!gaia!zhahai
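The difference between the two methods described above can be seen by counting 8-bit memory cycles in a toy model. The Python sketch below is only a simulation under simplifying assumptions: it ignores wait states and does not touch any real EGA register. The class and function names (`PlanarMemory`, `copy_plane_by_plane`, `copy_with_latches`) are hypothetical and come from no EGA documentation.

```python
PLANES = 4
BYTES_PER_PLANE = 80 * 350  # 28,000 bytes per plane in mode 10 (hex)

class PlanarMemory:
    """Toy model: 4 planes sharing one address space, plus 4 latches."""

    def __init__(self, size):
        self.planes = [bytearray(size) for _ in range(PLANES)]
        self.latches = [0] * PLANES
        self.cycles = 0  # 8-bit memory cycles seen by the CPU

    def read(self, addr, plane):
        # Every CPU read loads all four latches; the CPU sees one plane.
        self.latches = [p[addr] for p in self.planes]
        self.cycles += 1
        return self.latches[plane]

    def write_plane(self, addr, plane, value):
        # Write with only one plane enabled.
        self.planes[plane][addr] = value
        self.cycles += 1

    def write_latches(self, addr):
        # Write from the latches; the CPU data byte is ignored.
        for p, v in zip(self.planes, self.latches):
            p[addr] = v
        self.cycles += 1

def copy_plane_by_plane(mem, src, dst, count):
    for plane in range(PLANES):
        for i in range(count):
            mem.write_plane(dst + i, plane, mem.read(src + i, plane))

def copy_with_latches(mem, src, dst, count):
    for i in range(count):
        mem.read(src + i, 0)        # loads 32 bits into the latches
        mem.write_latches(dst + i)  # stores 32 bits in one write

def demo(copy):
    mem = PlanarMemory(2 * BYTES_PER_PLANE)
    for n, plane in enumerate(mem.planes):
        for i in range(BYTES_PER_PLANE):
            plane[i] = (i + n) & 0xFF   # arbitrary test pattern
    copy(mem, 0, BYTES_PER_PLANE, BYTES_PER_PLANE)
    ok = all(p[:BYTES_PER_PLANE] == p[BYTES_PER_PLANE:] for p in mem.planes)
    return ok, mem.cycles

print(demo(copy_plane_by_plane))  # (True, 224000)
print(demo(copy_with_latches))    # (True, 56000)
```

In this model the latch copy needs a quarter of the memory cycles for the same visible page, which matches the 16 bits per 4 cycles versus 32 bits per 2 cycles comparison in the post.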
jallen@netxcom.UUCP (John Allen) (11/19/86)
In article <1196@dataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>In article <197@oliveb.UUCP> spud@oliven.UUCP (John Purser) writes:
>>In article <1189@dataio.UUCP> bright@dataio.UUCP (Walter Bright) writes:
>>>I am interested in copying pixel data from one page to another on the
>>>IBM EGA. This involves moving 128k bytes of data. Doing it with a
>>>REP MOVSW takes about 1/2 second (on an AT), which is too slow.
>>>Does anyone know
>>>how the DMA channel could be programmed to do this? It is not clear
>>>from the documentation how to program the DMA chip, or even if
>>>it is capable of memory-to-memory transfers.
>>
>>How did you arrive at the time of 1/2 second? The way I figure this
>>it should only take about .05 seconds. According to the 286 programmers
>>reference guide a REP MOVSW takes 5+(4*CX) clocks. In your example that would
>>be 64k words times 4 plus 5 or a total of 262,149 clocks. The clock speed
>>of the AT is 6 MHz so dividing the 262,149 by 6,000,000 leaves us with .045
>>seconds.
>
>I did some checking. First off, the EGA is 8 bit ram, so the 128kb move
>should take .09 seconds. Second, the EGA needs 4 out of 5 memory cycles
>to do refresh, so the copy winds up taking about 1/2 seconds. DMA obviously
>wouldn't help much here.
>
>Also, nobody replied with a method of doing memcpy()s with the DMA.

As you just pointed out, DMA wouldn't help much. To address the real question, the Intel documentation for the 8237 outlines a method for performing memory to memory block DMA transfers using DMA channels 0 and 1. I've used block DMA from memory to I/O using one DMA channel (as does the FD controller, I believe) and can assure you that this works just fine. I would be surprised to hear that the memory to memory DMA didn't work.

If you still want to give it a try, and need more help, please send email.

John Allen
=========================================================================
NetExpress Communications, Inc.      seismo!{sundc|hadron}!netxcom!jallen
1953 Gallows Road, Suite 300         (703) 749-2238
Vienna, Va., 22180
=========================================================================
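For anyone who wants to try the channel 0 and 1 method mentioned above, here is a hedged Python sketch of the register writes involved. It only builds a list of (port, value) pairs and does not touch any hardware. The port offsets and bit values are taken from the 8237 register layout as generally documented and should be checked against the Intel data sheet. The base address 0x00, the function name `mem_to_mem_writes`, and the helpers `lo` and `hi` are assumptions for illustration. The sketch ignores the PC's DMA page registers and any other use the system makes of channel 0, and it is untested on real hardware.

```python
DMA_BASE = 0x00  # assumed: first 8237 in a PC/AT-style port map

def lo(value):
    return value & 0xFF

def hi(value):
    return (value >> 8) & 0xFF

def mem_to_mem_writes(src, dst, nbytes, base=DMA_BASE):
    """Return the (port, value) writes for one memory-to-memory block.

    src and dst are 16-bit offsets; the 8237 drives only 16 address bits.
    """
    if not 1 <= nbytes <= 0x10000:
        raise ValueError("the 8237 count register is 16 bits wide")
    count = nbytes - 1  # the 8237 transfers count + 1 bytes
    return [
        (base + 0x0A, 0x04),        # mask channel 0 while programming
        (base + 0x0A, 0x05),        # mask channel 1 while programming
        (base + 0x0C, 0x00),        # clear the byte-pointer flip-flop
        (base + 0x00, lo(src)),     # channel 0 address = source
        (base + 0x00, hi(src)),
        (base + 0x02, lo(dst)),     # channel 1 address = destination
        (base + 0x02, hi(dst)),
        (base + 0x03, lo(count)),   # channel 1 count ends the transfer
        (base + 0x03, hi(count)),
        (base + 0x0B, 0x88),        # mode: channel 0, block, read memory
        (base + 0x0B, 0x85),        # mode: channel 1, block, write memory
        (base + 0x08, 0x01),        # command: memory-to-memory enable
        (base + 0x09, 0x04),        # software request on channel 0
    ]

for port, value in mem_to_mem_writes(0x1000, 0x8000, 0x4000):
    print(f"out {port:#04x}, {value:#04x}")
```

Because the count register is 16 bits, one such transfer moves at most 64k bytes, so a 128k byte page would need at least two passes even if the method works on a given machine.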
brian@umbc3.UMD.EDU (Brian Cuthie) (10/28/88)
[as I slip into my asbestos suit]

I have decided to respond to several postings with this single response rather than tie up net bandwidth with followups to followups etc.

First, I humbly apologize for two things:

1) If I came off sounding omniscient about the AT, I'm sorry. I have, in past years, designed several disk controllers for the PC and written suitable drivers that used DMA.

2) For incorrectly extrapolating PC expertise to cover the design of the AT. Some of my statements about DMA on the AT were clearly wrong, as I made some bad assumptions about that particular part of the AT design.

However, most of my points about DMA are true in the general case. After pawing through the AT technical reference manual, I agree that the AT has some serious deficiencies in its DMA design. This, however, does not change the fundamental reasons for using DMA in most systems.

The Intel 80286/80386 processors have the unique ability to behave much like a DMA controller. That is, they can transfer data in single memory cycles (please note that a memory cycle is not the same as a clock cycle). In this mode, using the string transfer instructions, the 80*86 is capable of generating address and timing signals without placing data on the bus. Thus the peripheral or memory is free to drive the data bus directly to the recipient of the data (memory for INS instructions, and peripheral for OUTS instructions).

There are some instances when DMA controllers will buffer data; however, these are rare. Data usually flows between the peripheral and memory in single memory cycles unless, of course, the peripheral's controller cannot transfer data at memory speeds (unlikely, since most peripheral controllers have some buffer cache).

Normally, however, a processor does not have this ability. Thus, to transfer a block from a peripheral to memory requires that the processor read a byte/word from the peripheral and subsequently write that byte/word to memory. This operation, even under the best of caching scenarios, requires at least two memory accesses. It can be seen, then, that a processor lacking this special ability could never be as fast as a well designed DMA subsystem.

DMA controllers seize the bus by placing the CPU in a HOLD state. In this state, the CPU is not able to perform any external bus accesses. Instead, all address and timing information is generated by the DMA controller. When the DMA controller has placed the CPU into a HOLD state, and has asserted the appropriate address onto the address bus, it asserts either MEMREAD (for a transfer from memory to the peripheral) or MEMWRITE (to transfer from a peripheral to memory). The device which has requested DMA recognizes these signals in conjunction with the DMA ACK signal, and data is transferred over the data bus directly between the peripheral and memory with no intermediate lay-overs.

It can be seen that during this transfer, the CPU will remain idle, once it has completed its current instruction, until it can regain control of the bus. Therefore, most DMA controllers offer the ability to generate limited burst DMA transfers. The Intel 8237 is limited to either single transfers or complete block transfers. Other DMA controllers, such as the Motorola 68445 (I believe that is the correct part number), allow the burst length to be programmed over a wider range. Limiting the burst length allows some interleaving of CPU and DMA memory accesses.

Interleaving CPU and DMA access to memory is usually less desirable than complete block transfers, since there is substantial overhead in placing the CPU into a HOLD state. This problem can be solved by multiported memory designs. However, since processor speeds outstrip memory speeds (that is, as CPUs get faster, they spend more time waiting for memory), there is little advantage to this scheme.

In summary, DMA is used primarily because, in a well designed system, it can almost always be made to be more than twice as fast as the CPU in doing peripheral to memory transfers. However, memory bandwidth is limited and thus you must rob Peter to pay Paul, so the idea that DMA allows concurrent CPU and peripheral access to memory is somewhat misleading.

-brian
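The trade-off argued in this post can be put into a small bus-cycle count. This Python sketch is a simplified model, not a measurement: it assumes one bus cycle per item for DMA, two for a CPU that must read and then write, and a fixed HOLD overhead per burst. The overhead of 4 cycles and the 512-item block are made-up numbers, and the function names are hypothetical.

```python
def cpu_copy_cycles(n):
    # Read from the peripheral, then write to memory: two bus cycles per item.
    return 2 * n

def dma_cycles(n, burst, hold_overhead):
    # One bus cycle per item, plus one HOLD handshake per burst.
    bursts = -(-n // burst)  # ceiling division
    return n + bursts * hold_overhead

N = 512          # hypothetical block size, e.g. one disk sector
OVERHEAD = 4     # hypothetical cost of one HOLD handshake, in bus cycles

print(cpu_copy_cycles(N))           # 1024
print(dma_cycles(N, N, OVERHEAD))   # 516  (one complete block transfer)
print(dma_cycles(N, 1, OVERHEAD))   # 2560 (single transfers)
```

With these assumed numbers a complete block transfer takes about half the bus cycles of the CPU copy, while single transfers lose the advantage to the HOLD overhead, which is the reason given above for preferring block transfers.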