[net.arch] I80386 Multiprocessing and External Caches

campbell@sauron.UUCP (Mark Campbell) (10/31/86)

Question #1:
Has anyone out there looked at the I80386 with respect to multiprocessing?
In particular, how would you handle the lack of a CIOUT-like signal (MC68030)
which would allow dynamic disabling of an external cache on accesses to
specified pages?  If two processes on two different processors had shared
memory between them, it would seem impossible to maintain data integrity
if both processors had local data caches.

Question #2:
Looking at the timing diagrams of the I80386, it would appear that if
NA were constantly asserted you'd get rid of the address valid delay
times (32ns at 20MHz) at the beginning of a cycle so that you'd have
a lot more time to handle your cache tag compare.  All it would cost would
be the latching of addresses from the processor.  Any comment and/or
correction?
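
A back-of-the-envelope version of this timing argument, as a sketch only. It assumes a zero-wait-state bus cycle of two processor clocks and takes the 32ns address valid delay quoted above at face value. Data setup time and any buffer or latch delays are ignored, and the function names are made up for illustration.

```python
def clock_period_ns(freq_mhz):
    """Processor clock period in nanoseconds for a clock given in MHz."""
    return 1000.0 / freq_mhz


def tag_compare_budget_ns(freq_mhz, addr_valid_delay_ns,
                          clocks_per_cycle=2, address_early=False):
    """Rough time left in one bus cycle once the address is valid.

    clocks_per_cycle=2 is an assumption (zero-wait-state cycle).
    address_early=True models the case argued above: the address was
    already latched before the cycle began, so the address valid delay
    no longer eats into the cycle.
    """
    cycle_ns = clocks_per_cycle * clock_period_ns(freq_mhz)
    if address_early:
        return cycle_ns
    return cycle_ns - addr_valid_delay_ns


if __name__ == "__main__":
    print("address driven in-cycle:", tag_compare_budget_ns(20, 32), "ns")
    print("address latched early:  ",
          tag_compare_budget_ns(20, 32, address_early=True), "ns")
```

Under these assumptions the budget at 20MHz grows from 68ns to the full 100ns of the cycle; a real design would subtract data setup and latch delays from both figures.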
-- 

Mark Campbell    Phone: (803)-791-6697     E-Mail: !ncsu!ncrcae!sauron!campbell

kds@mipos3.UUCP (Ken Shoemaker ~) (11/01/86)

In article <749@sauron.UUCP> campbell@sauron.UUCP (Mark Campbell) writes:
>Question #1:
>Has anyone out there looked at the I80386 with respect to multiprocessing?
>In particular, how would you handle the lack of a CIOUT-like signal (MC68030)
>which would allow dynamic disabling of an external cache on accesses to
>specified pages?

Easily done: just use one of the 32 address lines as a cacheable/non-cacheable
select. (I hope I'm not being too short-sighted in assuming that no one is
going to try to hook up 4 Gbytes of physical memory to the beastie.) This has
the additional advantage of working in a straight segmented OS as well as one
that uses paging.
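
To make the address-line trick concrete, here is a minimal sketch of the decode an external cache controller might do. The choice of A31 as the select bit and all of the names are assumptions for illustration; nothing in the 386 mandates a particular bit. The idea is that memory ignores the select bit, so every location appears at two addresses, and the OS maps shared data through the alias with the bit set.

```python
ADDR_MASK = 0xFFFFFFFF      # 32 address lines
NOCACHE_BIT = 1 << 31       # assumed: A31 set means "bypass the cache"


def is_cacheable(addr):
    """The cache controller looks only at the select bit."""
    return (addr & NOCACHE_BIT) == 0


def memory_address(addr):
    """Address the memory array decodes: the select bit is ignored,
    so both aliases reach the same storage."""
    return addr & ADDR_MASK & ~NOCACHE_BIT


def uncached_alias(addr):
    """Address the OS would put in a page table entry or segment base
    for data shared between processors."""
    return (addr | NOCACHE_BIT) & ADDR_MASK


if __name__ == "__main__":
    shared = 0x00100000
    alias = uncached_alias(shared)
    print(hex(alias), is_cacheable(alias), hex(memory_address(alias)))
```

The cost of this scheme is the one already noted: giving up an address line halves the physical address space that can actually be populated.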

>
>Question #2:
>Looking at the timing diagrams of the I80386, it would appear that if
>NA were constantly asserted you'd get rid of the address valid delay
>times (32ns at 20MHz) at the beginning of a cycle so that you'd have
>a lot more time to handle your cache tag compare.  All it would cost would
>be the latching of addresses from the processor.  Any comment and/or
>correction?
>-- 

Quite correct.  The only downside is a small performance degradation of the
system, owing to a 3 clock latency from the 386 putting out an address to the
data being returned.  For example, after a jump it takes an additional
clock to get the first instruction back to the processor, and the latency
from when a read address is presented until the read data comes back is also
an additional clock.

Another consideration: if you need the extra time on the bus to get the cycle
through externally, you will require a 3 clock bus cycle when starting from
an idle bus.  The part can't guess the next address while the bus is idle,
so the first bus cycle from an idle bus is necessarily non-pipelined.

The performance difference between a no-wait-state non-pipelined bus and a
no-wait-state pipelined bus is roughly equivalent to adding 1/3 to 1/2 of a
wait state.  So a pipelined bus gives you a faster system than adding a wait
state would (which is what you'd have to do if you couldn't run a pipelined
bus and still wanted to live with the same speed memory system), but it still
isn't as fast as a straight no-wait-state, non-pipelined system.
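
The comparison can be put in rough numbers. This sketch assumes a zero-wait-state bus cycle takes two clocks and that each wait state adds one clock. The 1/3 to 1/2 wait-state penalty for pipelining is the estimate given above, not a measured figure, and the function name is hypothetical.

```python
from fractions import Fraction

BASE_CLOCKS = 2  # assumed length of a zero-wait-state bus cycle


def effective_clocks(wait_states=0, pipeline_penalty=Fraction(0)):
    """Average clocks per bus cycle under the simple model above."""
    return BASE_CLOCKS + wait_states + pipeline_penalty


if __name__ == "__main__":
    cases = [
        ("non-pipelined, 0 wait states", effective_clocks()),
        ("pipelined, best case", effective_clocks(pipeline_penalty=Fraction(1, 3))),
        ("pipelined, worst case", effective_clocks(pipeline_penalty=Fraction(1, 2))),
        ("non-pipelined, 1 wait state", effective_clocks(wait_states=1)),
    ]
    for name, clocks in cases:
        print(f"{name}: {float(clocks):.2f} clocks per bus cycle")
```

In this model the pipelined bus lands between the two non-pipelined cases, which is the ordering described in the text.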
-- 
The above views are personal.

I've seen the future, I can't afford it...

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{ hplabs|amdcad|qantel|pur-ee|scgvaxd|oliveb }!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com