[eunet.micro.acorn] My ARM2's faster than an ARM3. Waaaa.

hughesmp@vax1.tcd.ie (02/13/91)

I have a few questions about the internals of the Arc. I have a 440i, and wrote
a silly demo, !Ba, on it a while ago. I wrote it for a MEMC1a and ARM2, and
presumed that I would only have to cater for the MEMC1 being slower. Not so.
The ARM3 also goes slower. The probable reason is that it uses a lot of in-line
code, with few loops. It also involves shuffling large amounts of data around.
The ARM3 needs to constantly update its cache as a result, so allowing for it
to synchronize with the 8 MHz (?) external system, it effectively runs much
slower. Switching caching off is also useless, because rather than operating
at the logical 8 MHz, it operates at 30/4 MHz = 7.5 MHz, I think. If this is the
case, _why_ did VLSI not make it go a little bit faster, say 32 MHz, which
would give identical performance to the ARM2 with the cache off?
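
A quick sanity check of the clock arithmetic above, as a minimal sketch. The
30 MHz core clock and the divide-by-4 ratio are the guesses made in this post,
not figures checked against the VLSI manual, and the function names are purely
illustrative.

```python
# Figures from the post: 8 MHz ARM2, 30 MHz ARM3 core.
# ASSUMPTION: with the cache off, the ARM3 runs at core clock / 4
# (the poster's guess, not a verified figure).
ARM2_MHZ = 8.0
ASSUMED_DIVIDER = 4


def uncached_mhz(core_mhz, divider=ASSUMED_DIVIDER):
    """Effective clock with the cache off, under the divide-by-N assumption."""
    return core_mhz / divider


def relative_to_arm2(core_mhz):
    """Uncached speed as a fraction of the ARM2's clock."""
    return uncached_mhz(core_mhz) / ARM2_MHZ


for core in (30.0, 32.0):
    print(f"{core:.0f} MHz core -> {uncached_mhz(core):.2f} MHz uncached, "
          f"{relative_to_arm2(core):.1%} of an ARM2")
```

Under those assumptions a 30 MHz part with the cache off lands at 7.5 MHz,
roughly 6% below the ARM2, and a 32 MHz part would come out level.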

Also, how difficult would it be to clock the _entire_ system at 30 MHz, or would
memory chips not be able to handle that? Alternatively, how difficult would it
be to get a decent-sized cache, say 64k or 256k or something? In my case, the
bits of the demo that use in-line code do so because it looked like the obvious
and fastest implementation of the problem, e.g. 200-pixel-diameter
sphere-wrapping (in the new version). Even if I put this into a loop, the
amount of data it operates on is in the order of 250k per frame sync or
something, so the code would be overwritten as the data is loaded in and would
need to be cached again, causing similar problems. Can I fix the
processor so that it doesn't load accessed _data_ into the cache? And if I
can't, why does the processor put the data in a pseudo-random location in the
cache? Surely sequential locations would be more logical, because it would
take longer for code to be overwritten by accessed data. Is it that a) VLSI
have their reasons, or b) it isn't more logical?
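
The random-versus-sequential question can at least be poked at with a toy
model. This is a sketch under loud assumptions, not a model of the real ARM3:
the cache is treated as one fully associative pool of lines, the 256-line size
and the code/data mix are illustrative numbers, and every data line is touched
once and never again. The names code_misses, cache_lines, code_lines and
data_per_pass are hypothetical.

```python
import random


def code_misses(policy, cache_lines=256, code_lines=64,
                data_per_pass=512, passes=50, seed=1):
    """Count code-fetch misses for a loop that also streams fresh data.

    policy is "random" or "sequential" (round-robin victim selection).
    ASSUMPTION: a single fully associative cache shared by code and data.
    """
    rng = random.Random(seed)      # seeded so runs are repeatable
    slots = [None] * cache_lines   # tag currently held by each line
    where = {}                     # tag -> slot index
    next_slot = 0                  # round-robin pointer
    next_data = 0                  # streamed data tags are never reused
    misses = 0

    def access(tag):
        """Return True on a hit; on a miss, evict a victim and load tag."""
        nonlocal next_slot
        if tag in where:
            return True
        if policy == "random":
            victim = rng.randrange(cache_lines)
        else:
            victim = next_slot
            next_slot = (next_slot + 1) % cache_lines
        old = slots[victim]
        if old is not None:
            del where[old]
        slots[victim] = tag
        where[tag] = victim
        return False

    data_per_code = data_per_pass // code_lines
    for _pass in range(passes):
        for c in range(code_lines):
            if not access(("code", c)):
                misses += 1
            # Interleave some streamed data after each code line.
            for _d in range(data_per_code):
                access(("data", next_data))
                next_data += 1
    return misses


for policy in ("random", "sequential"):
    print(policy, code_misses(policy), "code misses out of", 64 * 50)
```

In this toy model, once the data streamed per pass exceeds the cache size, the
sequential policy throws out every code line before the loop comes back round
to it, so every code fetch misses. Random replacement lets some code lines
survive by luck. Whether that was VLSI's actual reasoning I can't say.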

Another question... Some people say my demo works on the ARM3, others say it
doesn't. I assume that the two parties have 30 MHz ARM3s... Could it be that
I/O from podules slows down the machine, even when the podule isn't in use?
Then again, some of the machines it slows down on don't have many podules
plugged in to speak of. Or are there other pseudo-random elements to the
processor's operation?

I don't have an ARM3; I have just glanced at the VLSI chipset manual. Any help
would be greatly appreciated...

Tracy.
SICK. We use ARM2s. They're faster.