hughesmp@vax1.tcd.ie (02/13/91)
I have a few questions about the internals of the Arc. I have a 440i, and a while ago I wrote a silly demo, !Ba, on it. I wrote it for a MEMC1a and ARM2, and presumed that I would only have to cater for the MEMC1 being slower. Not so. The ARM3 also runs it slower.

The probable reason is that the demo uses a lot of in-line code, with few loops, and shuffles large amounts of data around. As a result the ARM3 has to refill its cache constantly, and since each refill means synchronizing with the 8 MHz (?) external system, it effectively runs much slower. Switching caching off is also useless, because rather than operating at the logical 8 MHz, it operates at 30/4 MHz = 7.5 MHz, I think.

If this is the case, _why_ did VLSI not make it go just a little bit faster, say 32 MHz, which would give identical performance to the ARM2 with the cache off? Also, how difficult would it be to clock the _entire_ system at 30 MHz, or would the memory chips not be able to handle that? Alternatively, how difficult would it be to get a decent-sized cache, say 64k or 256k or something?

In my case, the bits of the demo that use in-line code use it as the obvious implementation of the problem, i.e. 200-pixel-diameter sphere-wrapping (on the new version), because it just looks like the fastest implementation. Even if I put this into a loop, the amount of data it operates on is in the order of 250k per frame sync or something, so the code would be overwritten as the data is loaded in and would need to be cached again, causing similar problems.

Can I set up the processor so that it doesn't load accessed _data_ into the cache? And if I can't, why does the processor put the data in a pseudo-random location in the cache? Surely sequential locations would be more logical, because it would take longer for code to be overwritten by accessed data. Is it that a) VLSI have their reasons, or b) it isn't more logical?

Another question... Some people say my demo works on the ARM3, others say it doesn't.
I assume that both parties have 30 MHz ARM3s... Could it be that I/O from podules slows down the machine, even when the podule isn't in use? Then again, some of the machines it slows down on don't have many podules plugged in to speak of. Or are there other pseudo-random elements to the processor's operation? I don't have an ARM3; I have just glanced at the VLSI chipset manual.

Any help would be greatly appreciated...

Tracy.

SICK. We use ARM2s. They're faster.
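
One way to explore the random-versus-sequential replacement question raised above is a toy simulation. The sketch below is only an illustration under stated assumptions, not a description of how the ARM3 cache is actually organised: it models a single fully associative cache of 256 lines (an assumed size), and a loop whose code plus data touches 257 lines in a fixed cyclic order. The function name `simulate` and all parameters are hypothetical. It suggests one possible reason a designer might prefer random replacement: when a loop is only slightly too big for the cache, sequential replacement can throw out each line just before it is needed again.

```python
import random


def simulate(policy, cache_lines=256, working_set=257, passes=50, seed=1):
    """Count cache hits for a loop touching `working_set` lines cyclically.

    policy: "sequential" replaces slots in round-robin order,
            anything else picks the slot to replace at random.
    All sizes are illustrative assumptions, not ARM3 figures.
    """
    rng = random.Random(seed)
    slots = [None] * cache_lines   # memory line currently held by each slot
    where = {}                     # memory line -> slot index holding it
    next_victim = 0                # round-robin pointer for "sequential"
    hits = 0

    for _ in range(passes):
        for line in range(working_set):
            if line in where:
                hits += 1
                continue
            # Miss: choose which slot to overwrite.
            if policy == "sequential":
                victim = next_victim
                next_victim = (next_victim + 1) % cache_lines
            else:
                victim = rng.randrange(cache_lines)
            old = slots[victim]
            if old is not None:
                del where[old]
            slots[victim] = line
            where[line] = victim
    return hits


if __name__ == "__main__":
    for policy in ("sequential", "random"):
        print(policy, simulate(policy))
```

In this model the sequential policy never scores a hit, because the line it evicts is always the one the loop asks for next, while the random policy keeps part of the loop resident. With a working set far larger than the cache, as with 250k of data per frame, neither policy would be expected to keep the code cached for long.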