waggoner@dtg.nsc.com (Mark Waggoner) (11/18/89)
Does anyone know what the maximum expected latency for a dma
peripheral on the Amiga bus is? The recent postings about dma vs.
non-dma disk controllers mention that dma can be locked out for a long
time when an overscan screen is being displayed, but how long is "a
long time?"
Thanks.
--
,------------------------------------------------------------------.
| Mark Waggoner (408) 721-6306 waggoner@dtg.nsc.com |
`------------------------------------------------------------------'
cmcmanis%pepper@Sun.COM (Chuck McManis) (11/18/89)
In article <322@blenheim.nsc.com> waggoner@dtg.nsc.com (Mark Waggoner) writes:
> ... The recent postings about dma vs. non-dma disk controllers mention
> that dma can be locked out for a long time when an overscan screen is
> being displayed, but how long is "a long time?"

Given a 704 X 440 screen with some sprites, audio, copper and blitter
activity I'm pretty sure it is possible to lock out DMA for one full frame,
14mS at 60Hz.
   Frame time = 16.6 mS but chip activity is concentrated during
   the (220lines/frame)/(60 frames/second * 262.5 max lines/frame)
   which equals 13.9 mS.

That would be slightly longer for PAL systems.

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one elses, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"
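The estimate in the post above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as posted: the constants come from the post, the function name is made up for illustration, and reading "220 lines/frame" as one field of a 440-line interlaced screen is an assumption.

```python
# Constants as given in the post (NTSC timing).
DISPLAYED_LINES = 220          # lines per frame with display DMA active
FRAMES_PER_SECOND = 60
TOTAL_LINES_PER_FRAME = 262.5  # "max lines/frame" in the post

def active_display_ms(lines, frames_per_second, total_lines):
    """Milliseconds per frame spent on displayed scan lines."""
    line_time_s = 1.0 / (frames_per_second * total_lines)
    return lines * line_time_s * 1000.0

frame_ms = 1000.0 / FRAMES_PER_SECOND
busy_ms = active_display_ms(DISPLAYED_LINES, FRAMES_PER_SECOND,
                            TOTAL_LINES_PER_FRAME)
print(f"frame time {frame_ms:.1f} ms, display-active time {busy_ms:.2f} ms")
```

The display-active figure comes out just under 14 ms, which matches the "14mS at 60Hz" quoted in the post.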
waggoner@dtg.nsc.com (Mark Waggoner) (11/18/89)
Combining some threads:

Chuck McManis writes:
> waggoner@dtg.nsc.com (Mark Waggoner (me)) writes:
> > ... The recent postings about dma vs. non-dma disk controllers mention
> > that dma can be locked out for a long time when an overscan screen is
> > being displayed, but how long is "a long time?"
>
> Given a 704 X 440 screen with some sprites, audio, copper and blitter
> activity I'm pretty sure it is possible to lock out DMA for one full frame,
> 14mS at 60Hz.
>    Frame time = 16.6 mS but chip activity is concentrated during
>    the (220lines/frame)/(60 frames/second * 262.5 max lines/frame)
>    which equals 13.9 mS.
>
> That would be slightly longer for PAL systems.
>

> in article <1989Nov16.185706.29328@sjsumcs.sjsu.edu>, 33014-18@sjsumcs.sjsu.edu (Eduardo Horvath) says:
>
> > Can you DMA directly into FAST RAM,
>
> Yes. In fact, it's greatly preferred and recommended.
>
> > If a controller DMA'd into FAST RAM, wouldn't that solve the problem
> > of contention with the custom chips?
>
> It can solve most of the problem. There are two components to the DMA
> transfer. DMA to Fast memory will solve the second, which is the basic
> transfer rate for whatever block size the controller transfers in one
> chunk. The first problem is what I call "DMA lag", or how long it takes
> from the time your controller asks for the bus to when it actually gets
> the bus. In order to acquire the bus, the CPU must be finished with a bus
> cycle. If the CPU is in wait states, waiting for access to the chip bus,
> the DMA controller will have to wait for the CPU to finish its cycle
> (eg, wait for the chip bus to be free) before it can take over the bus.
> DMA controllers often transfer a whole block (512 bytes) in several DMA
> passes, so it's actually possible to incur this lag several times for
> each block, if your CPU is doing lots of stuff with video memory. Also,
> if you have an autoboot controller of any kind that copies its code to
> RAM before using it, you get slowdowns if your autoboot card is the first
> one in the machine, since that code will get copied into chip memory. So,
> unless you know your code is running from ROM, or you have something like
> an A2620/A2630 that puts autoconfig RAM in before your device is
> configured, it's best to put a memory card in before your device.
> Hopefully all-in-one memory/disk cards autoconfig the memory before
> the disk.
>

I am not sure I understand all of this. Some questions:
(Perhaps I should RTFM)

1. Can the CPU be locked out of chip memory for long periods of
   time? I thought it alternated cycles with the video dma.
   If it DOES alternate cycles, it should be able to complete any
   operation within a few cycles. Then, the amount of video dma
   shouldn't affect the latency in acquiring the bus by more than a
   couple of bus cycles.
   If it DOES NOT alternate cycles and can be locked out of chip memory,
   then "fast" memory is indeed fast only in that you can perform your
   dma bursts faster.

2. When a peripheral is dma'ing to/from chip memory, what
   restrictions are placed on it in terms of burst size? Can it use
   consecutive cycles? How is this handled?

3. 14 mS of latency is a VERY long time. Doesn't this mean that
   almost any peripheral will have to have local buffer ram? Take
   Ethernet, for example (something I can talk about without getting
   too confused). In 14ms, you could receive
       14mS / (800nS/byte) = 17500 bytes of data.
   Of course, this is neglecting the fact that you shouldn't get packets
   that big and packets can't come back to back. But still, there is no
   way you could ever count on getting hold of the system bus in time to
   buffer a packet unless you had either a FIFO or local memory big
   enough to hold at least 20K. This makes peripherals much more
   expensive. Could you even transfer 20K over to the system memory
   in the time between video frames? (OK, so you aren't likely to
   get a burst of packets like that, but it IS possible).

Any clarifications would be appreciated.
--
,------------------------------------------------------------------.
| Mark Waggoner (408) 721-6306 waggoner@dtg.nsc.com |
`------------------------------------------------------------------'
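The buffer estimate in question 3 is a one-line division. The Python sketch below just restates it; the function name is illustrative, and the 800 ns byte time is the figure from the post (8 bits at 10 Mbit/s).

```python
BYTE_TIME_NS = 800  # one byte on 10 Mbit/s Ethernet, as used in the post

def bytes_received(latency_ms, byte_time_ns=BYTE_TIME_NS):
    """Bytes that can arrive while the bus is unavailable for latency_ms."""
    latency_ns = latency_ms * 1_000_000
    return int(latency_ns // byte_time_ns)

worst_case = bytes_received(14)
print(worst_case)  # 17500, which is why the post rounds up to a 20K buffer
```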
dougp@voodoo.ucsb.edu (11/19/89)
In article <128061@sun.Eng.Sun.COM>, cmcmanis%pepper@Sun.COM (Chuck McManis) writes...
>Given a 704 X 440 screen with some sprites, audio, copper and blitter
>activity I'm pretty sure it is possible to lock out DMA for one full frame,
>14mS at 60Hz.
>    Frame time = 16.6 mS but chip activity is concentrated during
>    the (220lines/frame)/(60 frames/second * 262.5 max lines/frame)
>    which equals 13.9 mS.
>
>--Chuck McManis

Given the 704x440 screen and the other conditions, how does the bus
arbitration handle the condition of the 68000 wanting access to chip
ram, and the DMA wanting access to fast ram? Does the bus lock up
until the chip ram is free, preventing the DMA from accessing fast
ram, or is there some more intelligent arbitration going on?

Douglas Peale
cmcmanis%pepper@Sun.COM (Chuck McManis) (11/20/89)
In article <329@berlioz.nsc.com> waggoner@dtg.nsc.com (Mark Waggoner) writes:
>I am not sure I understand all of this. Some questions:
>(Perhaps I should RTFM)

A good suggestion, the Hardware manual for the Amiga is excellent.

> 1. Can the CPU be locked out of chip memory for long periods of
>    time? I thought it alternated cycles with the video dma.

It works this way. There are several things that want bus cycles in
the Amiga: the Custom chips, the CPU, and the peripherals. There is
a great chart that shows what gets what, but basically the video gets
the odd cycles and the CPU/CustomChips/Peripherals get the even cycles
except during VBLANK and HBLANK when video doesn't need them. So yes,
the CPU is on the alternate cycle but it shares those alternate cycles
with other things.

>    If it DOES NOT alternate cycles and can be locked out of chip memory,
>    then "fast" memory is indeed fast only in that you can perform your
>    dma bursts faster.

Also there is a problem where the CPU can grab the bus in an attempt to
get to something in CHIP ram and block access to the expansion RAM so that
in some cases you just lose, period. I haven't looked real closely at the
bus timing for the 500/2000 though so this may not be a problem any more.

> 2. When a peripheral is dma'ing to/from chip memory, what
>    restrictions are placed on it in terms of burst size. Can it use
>    consecutive cycles? How is this handled?

Peripherals get low DMA priority; when someone else wants the bus (like
Agnus so she can stuff some bits into Denise) you lose it and have to
wait.

> 3. 14 mS of latency is a VERY long time. Doesn't this mean that
>    almost any peripheral will have to have local buffer ram? Take
>    Ethernet, for example (something I can talk about without getting
>    too confused). In 14ms, you could receive
>        14mS / (800nS/byte) = 17500 bytes of data.
>    Of course, this is neglecting the fact that you shouldn't get packets
>    that big and packets can't come back to back. But still, there is no
>    way you could ever count on getting hold of the system bus in time to
>    buffer a packet unless you had either a FIFO or local memory big
>    enough to hold at least 20K. This makes peripherals much more
>    expensive. Could you even transfer 20K over to the system memory
>    in the time between video frames? (OK, so you aren't likely to
>    get a burst of packets like that, but it IS possible).

A) You can get packets "back to back", and B) a 32K X 8 static RAM chip is
only $10 and comes in a nice compact 28pin skinnydip. But that aside, you
can do like 3Com did on their early boards and provide just enough ram for
1 packet (2K) and drop any that come in when you can't buffer them. This
works because the protocols allow for lost packets but it does cut down
on your efficiency.

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one elses, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"
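The single-packet buffer policy described above (accept a packet only if the buffer is empty, otherwise drop it) can be sketched as a toy simulation. Everything here is hypothetical: the function name, the arrival times and the drain time are made up for illustration and do not describe any real 3Com board.

```python
def receive_with_single_buffer(arrival_times_us, drain_time_us):
    """Count accepted and dropped packets for a one-packet buffer.

    A packet is accepted only if the previous packet has already been
    copied out to system memory; otherwise it is dropped and the
    protocol is expected to retransmit it.
    """
    accepted = 0
    dropped = 0
    buffer_free_at = 0
    for t in arrival_times_us:
        if t >= buffer_free_at:
            accepted += 1
            buffer_free_at = t + drain_time_us
        else:
            dropped += 1
    return accepted, dropped

# Hypothetical burst: three packets close together, then a late one,
# with 1000 us needed to drain the buffer over the bus.
print(receive_with_single_buffer([0, 100, 200, 5000], 1000))  # (2, 2)
```

The longer the drain time (for example, when the bus is locked out by display DMA), the more of a burst is dropped, which is the efficiency cost mentioned in the post.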
daveh@cbmvax.UUCP (Dave Haynie) (11/21/89)
in article <322@blenheim.nsc.com>, waggoner@dtg.nsc.com (Mark Waggoner) says:
> Keywords: bus dma latency

> Does anyone know what the maximum expected latency for a dma
> peripheral on the Amiga bus is? The recent postings about dma vs.
> non-dma disk controllers mention that dma can be locked out for a long
> time when an overscan screen is being displayed, but how long is "a
> long time?"

Same length of time that the CPU can be locked out -- the length of
continuous custom chip access to the chip bus (eg, the chips always get
the chip bus when they want it). In most situations, the maximum
continuous chip bus access is for display fetches for a 640xNx4 screen,
and is the length of a horizontal scan line. With heavy blitter and
copper activity it's theoretically possible to eat up the otherwise free
time during horizontal blanking, but it's usually not done.

Now, if your DMA is directed toward chip memory, getting the bus will
only be the start of your problems, since you'll only have access during
blanking time. If the DMA activity is to fast memory somewhere, the DMA
latency could very well account for the bulk of the time you spend.

> | Mark Waggoner (408) 721-6306 waggoner@dtg.nsc.com |
--
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough
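For scale, the "length of a horizontal scan line" can be put next to the Ethernet byte time used elsewhere in the thread. This Python sketch assumes the NTSC figures quoted earlier (60 frames/second, 262.5 lines/frame) and the 800 ns byte time; the names are illustrative.

```python
FRAMES_PER_SECOND = 60
LINES_PER_FRAME = 262.5
BYTE_TIME_NS = 800  # 10 Mbit/s Ethernet

# Duration of one horizontal scan line in microseconds.
line_time_us = 1_000_000 / (FRAMES_PER_SECOND * LINES_PER_FRAME)

# Bytes that would arrive from the network during one scan line.
bytes_per_line = int(line_time_us * 1000 // BYTE_TIME_NS)

print(f"scan line: {line_time_us:.1f} us, about {bytes_per_line} bytes")
```

Under these assumptions a lockout of one scan line corresponds to well under a hundred bytes of FIFO, compared with the roughly 17500 bytes implied by a full-frame lockout.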
daveh@cbmvax.UUCP (Dave Haynie) (11/21/89)
in article <329@berlioz.nsc.com>, waggoner@dtg.nsc.com (Mark Waggoner) says:
> Keywords: bus dma latency

> I am not sure I understand all of this. Some questions:
> (Perhaps I should RTFM)

> 1. Can the CPU be locked out of chip memory for long periods of
>    time? I thought it alternated cycles with the video dma.

It depends on what you've told the video chips to do. With some video
resolutions, there's no video fetch contention ever. With others, the
CPU must be locked out for as long as video is being fetched. It's
possible to set up the blitter such that it uses every available cycle
to complete its work (so called "blitter-nasty" mode), but typically
blitter activity can use free time on the video bus without getting in
the CPU's way. Other video bus activity, like floppy DMA, sprite
fetches, Copper programs, audio fetches, etc. can cut down on the
available time granted the CPU. Most of the time they don't cut into
the CPU time at all, but if you push things hard enough, they can.

>    If it DOES NOT alternate cycles and can be locked out of chip memory,
>    then "fast" memory is indeed fast only in that you can perform your
>    dma bursts faster.

Fast memory is called "fast" exactly because it is never locked by the
Amiga chips. It isn't any faster, on a per-cycle basis, than chip
memory, but the CPU always has access to it.

> 3. 14 mS of latency is a VERY long time. Doesn't this mean that
>    almost any peripheral will have to have local buffer ram?

It's a good idea for any peripheral to have local buffer RAM or FIFO in
any case, just because you don't want a device to interrupt the CPU just
to grab a single word, and you don't want a DMA device to take over the
bus just to dump one word into memory. The amount of FIFO depends on
the device. If you're dealing with something that has its own local
buffer, like a SCSI device, the controller's FIFO can be only a few
words long (the 2090 FIFO is 32 words, the 2091 FIFO is 16 words).

>    This makes peripherals much more expensive.

Certainly a little more expensive. But it adds performance in any case.

>    Could you even transfer 20K over to the system memory in the time
>    between video frames? (OK, so you aren't likely to get a burst of
>    packets like that, but it IS possible).

With fast memory and DMA, you only need one cycle to chip memory, at
worst, to get unrestricted access to fast memory at full bus speeds.
With CPU driven I/O, you'll need a few cycles, since interrupt vectors
are currently stored in chip memory, but it's not that bad. Some devices
may not work at acceptable rates with hires 4 plane overscan screens if
you only have chip memory. With a little fast memory, there's not a big
problem.

> Any clarifications would be appreciated.
> | Mark Waggoner (408) 721-6306 waggoner@dtg.nsc.com |
--
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough
daveh@cbmvax.UUCP (Dave Haynie) (11/21/89)
in article <3058@hub.UUCP>, dougp@voodoo.ucsb.edu says:

> Given the 704x440 screen and the other conditions, how does the bus
> arbitration handle the condition of the 68000 wanting access to chip
> ram, and the DMA wanting access to fast ram? Does the bus lock up
> until the chip ram is free, preventing the DMA from accessing fast
> ram, or is there some more intelligent arbitration going on?

Unfortunately, there's nothing sophisticated done here. To prevent any
unsightly arbitration delays, the 68000's access to the chip bus is
arbitrated simply by wait stating the 68000. This is normally a very
good thing, since there's never any arbitration delay when the 68000
wants chip memory and it's OK to get it. However, when an expansion
device requests the bus, the bus arbiter will give it a grant pretty
quickly, but it must wait until the current cycle is finished before
acknowledging that grant and taking the bus. Since the 68000 is wait
stated, the cycle doesn't end until the 68000 gets the chip bus. Thus
the potential for lag, even for DMA devices that aren't interested in
chip memory.

> Douglas Peale
--
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough
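The lag described above can be captured in a very small model: the DMA device cannot take the bus until the CPU's current cycle ends, and a cycle aimed at chip memory cannot end until the chip bus is free. This Python sketch is a simplification, not a description of the real arbiter; the function names, the time units and the 4-clock cycle length used in the example are assumptions for illustration.

```python
def cpu_cycle_end(start, normal_length, chip_bus_free_at, targets_chip_ram):
    """Time at which the CPU's current bus cycle completes.

    A cycle aimed at chip RAM is stretched by wait states until the
    chip bus is free; a cycle aimed at fast RAM runs at normal length.
    """
    if targets_chip_ram:
        return max(start + normal_length, chip_bus_free_at)
    return start + normal_length

def dma_grant_delay(request_time, cycle_end):
    """How long a DMA device waits after requesting the bus."""
    return max(0, cycle_end - request_time)

# CPU starts a chip RAM access at t=0 while the chip bus is busy until
# t=200; the DMA device asks for the bus at t=10.
stalled = cpu_cycle_end(0, 4, 200, targets_chip_ram=True)
print(dma_grant_delay(10, stalled))   # 190: waits for the chip bus

# Same request while the CPU is accessing fast RAM instead.
quick = cpu_cycle_end(0, 4, 200, targets_chip_ram=False)
print(dma_grant_delay(10, quick))     # 0: cycle already finished
```

Note that in the first case the DMA device is delayed even if its own transfer targets fast memory, which is the point of the post.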
waggoner@dtg.nsc.com (Mark Waggoner) (11/21/89)
In article <128120@sun.Eng.Sun.COM> cmcmanis@sun.UUCP (Chuck McManis) writes:
>In article <329@berlioz.nsc.com> waggoner@dtg.nsc.com (Mark Waggoner) writes:
>>I am not sure I understand all of this. Some questions:
>>(Perhaps I should RTFM)
>
>A good suggestion, the Hardware manual for the Amiga is excellent.
>
>> 1. Can the CPU be locked out of chip memory for long periods of
>>    time? I thought it alternated cycles with the video dma.
>
>It works this way. There are several things that want bus cycles in
>the Amiga. The Custom chips, the CPU, and the peripherals. There is
>a great chart that shows what gets what, but basically the video gets
>the odd cycles and the CPU/CustomChips/Peripherals get the even cycles
>except during VBLANK and HBLANK when video doesn't need them. So yes,
>the CPU is on the alternate cycle but it shares those alternate cycles
>with other things.
>

Actually, I got the hardware manual over the weekend and it is, for
the most part, excellent. As I now understand it, the CPU gets
alternating cycles in some video modes, but as the number of bit
planes increases, it starts losing some of those. When you get to
interlaced, hi-res mode with 4 bit planes, the CPU and other custom
chips don't get any cycles at all. (Anyone care to confirm that this
is correct?) When you increase the size of the screen (overscan) you
start losing the HBLANK time and can, potentially, lose all the cycles
for the entire video frame. If the CPU tries to access chip memory, it
will get locked out until the VBLANK time.

I didn't see much information in the hardware manual on how the
arbitration is done when there are DMA devices present other than the
standard custom chips. There is also no description of the bus timings
and definitions that I could find.

>>    If it DOES NOT alternate cycles and can be locked out of chip memory,
>>    then "fast" memory is indeed fast only in that you can perform your
>>    dma bursts faster.
>
>Also there is a problem where the CPU can grab the bus in an attempt to
>get to something in CHIP ram and block access to the expansion RAM so that
>in some cases you just lose, period. I haven't looked real closely at the
>bus timing for the 500/2000 though so this may not be a problem any more.

I think we are describing the same thing here. As I understand it,
this can still happen.

>
>> 2. When a peripheral is dma'ing to/from chip memory, what
>>    restrictions are placed on it in terms of burst size. Can it use
>>    consecutive cycles? How is this handled?
>
>Peripherals get low DMA priority, when someone else wants the bus (like
>Agnus so she can stuff some bits into Denise) you lose it and have to
>wait.

How do you lose the bus? This goes back to a lack of information on
the bus arbitration scheme. Using plain 68000 bus arbitration signals,
once the 68000 grants bus access to someone, it doesn't have any way
of preempting the bus master (except maybe through a bus error).

I guess my actual question was: When dma'ing to/from chip memory, do
you just get wait stated around the cycles where video is accessing
the bus, or is there some other mechanism for this? What happens when
you do consecutive memory reads or writes?

>
>> 3. 14 mS of latency is a VERY long time. Doesn't this mean that
>>    almost any peripheral will have to have local buffer ram? Take
>>    Ethernet, for example (something I can talk about without getting
>>    too confused). In 14ms, you could receive
>>        14mS / (800nS/byte) = 17500 bytes of data.
>>    Of course, this is neglecting the fact that you shouldn't get packets
>>    that big and packets can't come back to back. But still, there is no
>>    way you could ever count on getting hold of the system bus in time to
>>    buffer a packet unless you had either a FIFO or local memory big
>>    enough to hold at least 20K. This makes peripherals much more
>>    expensive. Could you even transfer 20K over to the system memory
>>    in the time between video frames? (OK, so you aren't likely to
>>    get a burst of packets like that, but it IS possible).
>
>A) You can get packets "back to back", and B) a 32K X 8 static RAM chip is
>only $10 and comes in a nice compact 28pin skinnydip. But that aside, you
>can do like 3Com did on their early boards and provide just enough ram for
>1 packet (2K) and drop any that come in when you can't buffer them. This
>works because the protocols allow for lost packets but it does cut down
>on your efficiency.

By back to back I was only referring to the fact that there will be a
gap between every packet of at least 9.6 uS. This doesn't do much to
reduce the amount of data you could receive.

In addition to the requirement that you provide local buffer ram, you
will also have to do one of two things:

1. Use the CPU to copy the data from the buffer ram to system ram.
2. Build or buy a DMA machine to transfer the buffer ram to system ram.

Option 1 leads to a lower performance board and option 2 is higher
cost. If your ethernet controller already supports DMA, it seems a
waste to have another DMA to copy the data between the two RAMs. It
would be nice to be able to DMA directly into system ram, but that
would require a *HUGE* fifo on your ethernet controller, both for
transmitting and receiving.

I guess this is the price we pay for all the fancy video modes. Video
memory separate from system memory would eliminate the long latency
times, but reduce flexibility and video performance.

I am asking all of this because I am trying to get a grasp on what it
would take to stick an ethernet controller I have been working on onto
an Amiga card. Looks like a lot of work to build a high performance
interface.
--
,------------------------------------------------------------------.
| Mark Waggoner (408) 721-6306 waggoner@dtg.nsc.com |
`------------------------------------------------------------------'
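The claim that the 9.6 uS gap "doesn't do much" to reduce the incoming data can be checked with a short Python sketch. The 14 ms window, the 800 ns byte time and the 9.6 us gap come from the thread; the 1518-byte maximum frame, 64-byte minimum frame and 8-byte preamble are standard 10 Mbit/s Ethernet figures rather than anything stated in the posts, and the function name is illustrative.

```python
BYTE_TIME_NS = 800         # one byte at 10 Mbit/s
INTERFRAME_GAP_NS = 9_600  # minimum gap between packets (from the post)
PREAMBLE_BYTES = 8         # preamble plus start-of-frame delimiter

def frames_in_window(window_ns, frame_bytes):
    """Whole back-to-back frames that fit in the window, gaps included."""
    frame_ns = (frame_bytes + PREAMBLE_BYTES) * BYTE_TIME_NS
    return window_ns // (frame_ns + INTERFRAME_GAP_NS)

WINDOW_NS = 14_000_000  # 14 ms of bus lockout

max_frames = frames_in_window(WINDOW_NS, 1518)
print(max_frames, max_frames * 1518)   # 11 frames, 16698 bytes to buffer

min_frames = frames_in_window(WINDOW_NS, 64)
print(min_frames, min_frames * 64)     # 208 frames, 13312 bytes to buffer
```

With maximum-size frames the total is only a few percent below the 17500-byte figure computed without gaps, so the buffer requirement stays in the same range.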
himacdonald@lion.waterloo.edu (Hamish Macdonald) (11/21/89)
In article <8652@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>...
>With fast memory and DMA, you only need one cycle to chip memory, at
>worst, to get unrestricted access to fast memory at full bus speeds.
>With CPU driven I/O, you'll need a few cycles, since interrupt
>vectors are currently stored in chip memory, but it's not that bad.
>Some devices may not work at acceptable rates with hires 4 plane
>overscan screens if you only have chip memory. With a little fast
>memory, there's not a big problem.
>...
>--
>Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"

It seems to me that a program such as Dave's SetCPU could either:

1) copy the exception vector table to fast RAM and remap
   using the MMU (w 68030 or 68851) (a la 32 bit Kickstart RAM).

or

2) Assuming EXEC handles all the registers correctly, copy the vector
   table to fast RAM and change the Vector Base Register (on 68020 or
   68010) to point to the new table.

This would avoid the delay accessing chip memory during an exception.

This could be a problem if anything other than the CPU alters the
exception vector table, however.

Hamish.
----------------------------------------------------------------
Hamish Macdonald.       himacdonald@lion
watmath!lion!himacdonald        himacdonald@lion.uwaterloo.ca
                                himacdonald@lion.waterloo.edu
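Option 2 relies on how the 68010 and later processors locate exception vectors: the CPU fetches the handler address from VBR + 4 * vector number, so moving the table only requires copying it and loading a new VBR. The Python sketch below models just that address calculation. The fast RAM base address is a hypothetical example, and the names are illustrative; actually changing VBR requires a privileged MOVEC instruction, which is not shown.

```python
VECTOR_SIZE = 4            # each exception vector is one 32-bit address
LEVEL1_AUTOVECTOR = 25     # level 1..7 interrupt autovectors are 25..31

def vector_address(vector_number, vbr=0):
    """Address the CPU reads to find the handler for a given vector."""
    return vbr + VECTOR_SIZE * vector_number

# With VBR = 0 (the only option on a plain 68000) the table sits at the
# bottom of the address map, which is chip memory on the Amiga.
print(hex(vector_address(LEVEL1_AUTOVECTOR)))            # 0x64

# Hypothetical relocated table at the start of a fast RAM board.
FAST_RAM_TABLE = 0x200000  # assumed address, for illustration only
print(hex(vector_address(LEVEL1_AUTOVECTOR, FAST_RAM_TABLE)))  # 0x200064
```

The caveat raised in the post still applies: anything that writes vectors directly into low memory would no longer affect the table the CPU actually uses.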
daveh@cbmvax.UUCP (Dave Haynie) (11/23/89)
in article <18387@watdragon.waterloo.edu>, himacdonald@lion.waterloo.edu (Hamish Macdonald) says:
> Keywords: fast ram exception vector table
> Summary: Why not fast RAM exception vector table?

> In article <8652@cbmvax.UUCP> daveh@cbmvax.UUCP (Dave Haynie) writes:
>>...
>>With CPU driven I/O, you'll need a few cycles, since interrupt
>>vectors are currently stored in chip memory, but it's not that bad.

> It seems to me that a program such as Dave's SetCPU could either:

> 1) copy the exception vector table to fast RAM and remap
>    using the MMU (w 68030 or 68851) (a la 32 bit Kickstart RAM).

As long as no one's counting on the low areas of memory actually being
Chip memory, this would certainly work. I've thought of it before, but
never got around to it. Now Commodore-Amiga is in charge of SetCPU.
They could add this, but I think the better solution would be to teach
Exec to use VBR on 68010 on up (SetCPU uses it). I've never tried to
remap things by changing VBR, but long ago someone told me it didn't
work.

> This could be a problem if anything other than the CPU alters the
> exception vector table, however.

I think that would be a very bad idea. DMA devices should assume they
have access to generic RAM within their address space and of course chip
memory, but they shouldn't believe they know anything about CPU vectors
or I/O registers (some of which are inaccessible in hardware with the
current system anyway).

> Hamish.
--
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough
MARKV@kuhub.cc.ukans.edu (MARK GOODERUM - UNIV. OF KANSAS ACS - MARKV@UKANVAX) (12/15/89)
> ...They could add this, but I think the better solution would be to teach
> Exec to use VBR on 68010 on up (SetCPU uses it). I've never tried to
> remap things by changing VBR, but long ago someone told me it didn't
> work.
>
>> This could be a problem if anything other than the CPU alters the
>> exception vector table, however.
>

I tried this a few months ago on my 68010 Amiga 1000. It crashed
consistently. At first it was the UserState() bug (I didn't know about
it at the time), but even using my own stuff to get into and out of
supervisor mode it died hard. Using Gomf to watch low memory showed that
these vectors are getting played with, and the VBR is being ignored (by
the OS, not the CPU, of course).

Along with the exception table, it would be nice if Exec would get ALL of
ExecBase out of CHIP ram.

> --
> Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
> {uunet|pyramid|rutgers}!cbmvax!daveh PLINK: hazy BIX: hazy
> Too much of everything is just enough
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Gooderum                       Academic Computing Services
MARKV@UKANVAX.BITNET                University of Kansas
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~