krogers@esunix.UUCP (Keith Rogers) (07/22/89)
I'm sorry if this topic has already been discussed (i.e., beaten to death) before in this group, but I will soon be in the market for a 386 machine and I'm confused about the two major memory architectures touted by manufacturers: page-mode interleaved vs. cached. I believe the crux of my confusion is that I don't understand how data is loaded into the CPU. When an application runs, assuming few branch instructions and mostly array-related data, instructions and data come in nice contiguous streams. Herein lies the claim (I think) that an interleaved architecture runs at nearly zero wait states, because wait states are only incurred at page boundaries and refreshes.

What I don't understand is how the RAMs are interleaved. An example will help explain my question: let's say I have a 386 with 4 Meg of RAM composed of 32 1Meg x 1 DRAM chips. Since the 386 has 32 data bits going into it, it seems to me that only one read or write cycle is needed to read or write, say, a long integer or short float value, each of which is 4 bytes wide. However, if the RAMs are interleaved, the only likely interleave pattern I can fathom is to split the 32 RAMs into 4 banks of 8 RAMs each, thus yielding only 8 of the 32 bits of our long integer per read cycle and therefore requiring 4 read cycles to get the whole value into the processor. A cache memory, however, can present all 32 bits of the long integer in one read cycle (presuming a hit, of course). Is this correct?

A related issue: if I issue the assembly instruction MOV EAX, LONG_INT, how does the 386 get the value? If the memory is cached, all 32 bits are loaded in the next state. If the memory is interleaved, it seems to me that 4 read cycles (states) are needed to present all 32 bits to the CPU.
Thus while the interleaved memory may be running close to zero wait states, it would still be 4 times slower than the cached-memory machine, because data could only be accessed on byte boundaries rather than longword boundaries. My question here is: are cached memories, in fact, accessed on longword boundaries or byte boundaries?

These questions are important to me because 95% of my computing is crunching floats, longs, and doubles in tight loops, and the faster I can load a 4- (or 8-) byte value into the CPU (or FPU) the better. Any enlightenment on this topic would be appreciated, but I will be most interested in replies related to my number-crunching needs. I only run two canned DOS programs, a word processor and a text editor, both of which already run fast enough for me on my baby 8086 machine. Thus, I don't care which architecture is "best" for existing DOS programs; I want to know how I can get 32-bit operands into the CPU as fast as possible, so that I can be more intelligent about how I write my own programs and about which memory architecture, and hence which machine, I should buy.

An observation: memory architecture, at least amongst manufacturers, seems to be one of those religious issues which could start a prolonged and nasty flame-war. I HATE FLAME-WARS! Please restrict replies to a professional tone backed by facts.

Thanks in advance to all who take some of their time to reply.

Keith Rogers
UUCP: utah-cs!esunix!krogers
johne@hpvcfs1.HP.COM (John Eaton) (07/24/89)
< What I don't understand is how the RAMs are interleaved. An example
< will help explain my question: let's say I have a 386 with 4 Meg of
< RAM composed of 32 1Meg x 1 DRAM chips.
----------
That arrangement will not work for interleaving. Use 32 256K x 4 RAMs instead, to give you four banks of 32-bit-wide memory. The four banks are decoded off the LOWEST bits of the 32-bit word address, so that reading consecutive 32-bit words causes accesses to sequential banks.

John Eaton
!hpvcfs1!johne