[comp.sys.atari.st] accelerators

meulenbr@cstw01.prl.philips.nl (Frans Meulenbroeks) (07/07/89)

In article <402@cwjcc.CWRU.Edu> bammi@dsrgsun.ces.cwru.edu (Jwahar R. Bammi) writes:
>please could you run qindex and post the numbers. i have a jri accl. and
>its pretty much useless, except for a very few programs. i suspect that
>most accl. without a cache are about the same. cmi has some value added in
>its fast-rom, 68881/82 and blitter sockets features..

There are only a few places where an accelerator can potentially give a
gain. These are:
- processing instructions which use more than 4N clock cycles, where N
  is the number of bus cycles for that instruction (including
  instruction fetch)
- if it has a cache, by caching data and instructions
- local memory for the accelerator (see below)
- perhaps by using faster ROMs (not sure about this one)
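
As a rough illustration of the first point, here is a minimal sketch in
Python. It assumes (this model is not from any datasheet) that every bus
cycle still costs 4 clocks of the original 8 MHz clock, and that only the
remaining "internal" clocks of an instruction shrink by the accelerator's
clock ratio. The function names and the example figures (a 4-clock
instruction and a 70-clock multiply, each with one bus cycle) are
illustrative assumptions.

```python
def accelerated_clocks(clocks: int, bus_cycles: int, speedup: float = 2.0) -> float:
    """Estimated duration, in original 8 MHz clocks, on an accelerated CPU.

    Assumed model: each bus cycle keeps costing 4 original clocks because
    the ST memory is not any faster; only the clocks beyond 4 * bus_cycles
    are divided by the clock ratio `speedup`.
    """
    bus_part = 4 * bus_cycles                # memory-bound part, unchanged
    internal = max(clocks - bus_part, 0)     # part the faster clock can shorten
    return bus_part + internal / speedup


def gain(clocks: int, bus_cycles: int, speedup: float = 2.0) -> float:
    """Ratio of original duration to accelerated duration (1.0 = no gain)."""
    return clocks / accelerated_clocks(clocks, bus_cycles, speedup)


if __name__ == "__main__":
    # (clocks, bus cycles): illustrative values, not measured ones
    for clocks, n in [(4, 1), (8, 2), (70, 1)]:
        print(clocks, n, round(gain(clocks, n), 2))
```

Under these assumptions an instruction with exactly 4N clocks gains
nothing, while a long instruction with a single bus cycle approaches the
full clock ratio.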

The major bottleneck for all accelerators is the ST memory.
Every fourth clock cycle the video needs to fetch data from the memory.
The ST is built in such a way that ideally there is a perfect
interleaving of memory operations initiated by the video and by the
processor. Since most instructions indeed have one bus cycle every
four clock cycles, this works nicely.

The time needed to do a bus cycle is just short enough to allow this
interleaving of video and processor. However, due to the memory speed,
setup times, etc., it is not possible for the processor to fit two bus
cycles between two successive video accesses.

Note that the video always has priority. This means that when a 6-clock
instruction with one bus cycle (the instruction fetch) has finished, one
of the custom chips (MMU?) inserts two wait cycles before the next bus
cycle can be initiated.
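
The 6-clock example can be worked out with a small Python sketch. It
assumes that the processor may start a bus cycle only on a 4-clock slot
boundary, so an instruction's duration is rounded up to the next multiple
of 4; the function names are hypothetical and the rule is a
simplification of the behaviour described above.

```python
def effective_clocks(nominal_clocks: int, slot: int = 4) -> int:
    """Round an instruction's nominal clock count up to a multiple of `slot`.

    Assumption: the next bus cycle has to wait for the next processor slot,
    which comes around once every `slot` clocks.
    """
    return -(-nominal_clocks // slot) * slot   # ceiling division, then scale


def wait_cycles(nominal_clocks: int, slot: int = 4) -> int:
    """Number of wait cycles inserted after the instruction in this model."""
    return effective_clocks(nominal_clocks, slot) - nominal_clocks


if __name__ == "__main__":
    for nominal in (4, 6, 8, 10):
        print(nominal, effective_clocks(nominal), wait_cycles(nominal))
```

For a 6-clock instruction this gives 8 effective clocks, i.e. the two
wait cycles mentioned above.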

It seems that this delay is always there, even if the video reads from
the other bank of memory. However, I'm not sure whether this also
applies to the ROMs.

Perhaps the largest gain is achieved by using cache memory (or a 68020 or
68030). This alone will on average give a gain of about 25%.

The gain would be a lot larger if you could attach a large chunk of
memory to the accelerator, so that programs can reside entirely in this
local memory. However, this raises the problem that DMA transfers
probably cannot access this memory, and that the screen cannot be mapped
into this area.

Note: Although I'm an EE, I'm not a guru in microprocessors.
      Also, English is not my native language, so apologies if things are
      obscured by my mangling of the language.


Frans Meulenbroeks        (meulenbr@cst.prl.philips.nl)
	Centre for Software Technology
	( or try: ...!mcvax!phigate!prle!cst!meulenbr)