[net.arch] EA orthogonality

brooks@lll-crg.ARPA (Eugene D. Brooks III) (06/09/85)

> And furthermore, the orthogonal sequence is normally atomic;
> in an OS kernel the non-orthogonal sequence might easily have to
> be protected by a "disable/enable interrupt" sequence around it,
> or "test-and-set" or some such in a multi-processor system 
> (e.g., "a" and "b" might be global vars).
> Multi-process user-programs would need "enter/exit monitor" or
> "block-on-semaphore" sequences.  Besides being a pain (sometimes
> a royal pain) this has the potential for eating a lot of CPU time.
> -- 
Considerations for multiprocessing are one of the strongest arguments
in favor of a load/store type of instruction set.  The fundamental problem
to be overcome in a multiprocessor is memory latency.  You increase efficiency
in an environment with high memory latency by combining a load/store type of
instruction set with a processor composed of pipelined functional units, and
by ordering the instructions carefully.  For example:

a += b;

load r0,_a
load r1,_b
add r0,r1
store r0,_a

The performance gain is achieved when there is more work to do. For example:

a += b;
c += d;

load r0,_a
load r1,_b
load r2,_c
load r3,_d
add r0,r1
add r2,r3
store r0,_a
store r2,_c

The loads overlap their latencies, resulting in higher performance than is
possible with the sequence

add _a,_b
add _c,_d
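
To put rough numbers on the comparison, here is a toy timing model in Python.
Everything in it is an assumption made for illustration and none of it comes
from a real machine: in-order issue of one instruction per cycle, a pipelined
memory with a fixed latency, a one-cycle add, and memory-to-memory adds that
run strictly one after another (both operand fetches overlapped, then the add,
then the write-back). The names load_store_cycles and mem_to_mem_cycles are
hypothetical.

```python
def load_store_cycles(program, mem_latency, alu_latency=1):
    """Cycle at which the last instruction of a load/store sequence completes.

    Assumed model: in-order issue, at most one instruction per cycle; an
    instruction stalls at issue until its source registers are ready.
    """
    ready = {}    # register name -> cycle at which its value is available
    cycle = 0     # earliest cycle at which the next instruction may issue
    finish = 0    # latest completion time seen so far
    for op, *args in program:
        if op == "load":                      # load rd,addr
            issue = cycle
            ready[args[0]] = issue + mem_latency
            done = ready[args[0]]
        elif op == "add":                     # add rd,rs  (rd += rs)
            rd, rs = args
            issue = max(cycle, ready[rd], ready[rs])
            ready[rd] = issue + alu_latency
            done = ready[rd]
        elif op == "store":                   # store rs,addr
            issue = max(cycle, ready[args[0]])
            done = issue + mem_latency
        else:
            raise ValueError("unknown op: " + op)
        cycle = issue + 1
        finish = max(finish, done)
    return finish


def mem_to_mem_cycles(n_adds, mem_latency, alu_latency=1):
    """Assumed cost of n serial 'add _x,_y' instructions: fetch, add, write back."""
    return n_adds * (mem_latency + alu_latency + mem_latency)


# The reordered two-statement sequence from the post.
reordered = [
    ("load", "r0", "_a"), ("load", "r1", "_b"),
    ("load", "r2", "_c"), ("load", "r3", "_d"),
    ("add", "r0", "r1"), ("add", "r2", "r3"),
    ("store", "r0", "_a"), ("store", "r2", "_c"),
]

# The same work, one statement at a time, with no reordering.
unordered = [
    ("load", "r0", "_a"), ("load", "r1", "_b"),
    ("add", "r0", "r1"), ("store", "r0", "_a"),
    ("load", "r2", "_c"), ("load", "r3", "_d"),
    ("add", "r2", "r3"), ("store", "r2", "_c"),
]

if __name__ == "__main__":
    latency = 10  # assumed memory latency in cycles
    print("reordered load/store:", load_store_cycles(reordered, latency))
    print("unordered load/store:", load_store_cycles(unordered, latency))
    print("memory-to-memory:    ", mem_to_mem_cycles(2, latency))
```

With the assumed latency of 10 cycles, this model gives 25 cycles for the
reordered sequence, 35 for the same instructions left in statement order, and
42 for the two memory-to-memory adds. The absolute numbers mean nothing outside
the model; the point is only that the gap comes from overlapping the loads.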