[comp.arch] multiplexing jobs on a fast cpu to hide memory latency

freudent@eric.nyu.edu (Eric Freudenthal) (11/16/89)

Tom Shott wrote: (Re: RISC vs CISC (rational discussion, not religious wars))
 A novel architecture from the Computer Systems Group at UIUC published by
 Dave Archer, et al. used multiple tasks running on one CPU to hide delays.
 For example, w/ a 4-stage pipeline, the CPU chip would run four tasks at
 once. I don't remember the details but it worked out that each task
 executed at 1/4 of full speed. (I think dummy pipeline stages were used
 between the stages). But during that delay time memory fetch latency was
 hidden. (Also data dependencies.) Realistically, I might expect this technique
 only to be used for large systems aimed at multiuser applications. You need
 four tasks always ready to run.

This idea is not new.  It was popular when memory was MUCH slower and
cheaper than CPUs.  Other instruction-level task-switching techniques
have been used more recently in machines with long PE-to-memory latency,
such as the HEP multiprocessor.
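
To make the interleaving concrete, here is a toy C fragment (a sketch of
the general barrel-processing idea, not Archer's or the HEP's actual
design; NCTX and MEM_LATENCY are made-up figures).  With 4 contexts
issued strictly round-robin, a given context is revisited only every 4th
cycle, so a 3-cycle load never stalls anyone, while each context sees
1/4 of the issue rate:

    #include <stdio.h>

    #define NCTX        4   /* hardware contexts multiplexed on one pipeline */
    #define MEM_LATENCY 3   /* cycles before a load's result is usable */
    #define CYCLES      20

    int main(void)
    {
        int busy_until[NCTX] = {0}; /* cycle at which each context's load returns */
        int issued = 0, stalls = 0;
        int cycle, ctx;

        for (cycle = 0; cycle < CYCLES; cycle++) {
            ctx = cycle % NCTX;          /* strict round-robin context selection */
            if (cycle < busy_until[ctx]) {
                stalls++;                /* happens only if MEM_LATENCY > NCTX */
                continue;
            }
            issued++;                    /* worst case: every instruction is a load */
            busy_until[ctx] = cycle + MEM_LATENCY;
        }
        printf("issued %d, stalled %d of %d cycles\n", issued, stalls, CYCLES);
        return 0;
    }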

There are a couple of reasons that many people don't like this idea
nowadays:

1) Cost: assume that the 4x-multiplexed CPU runs at state-of-the-art
speed.  It will probably cost more than 4 CPUs running at 1/4 speed on
the same bus (interconnection port), yet deliver the same aggregate
performance.

2) Context state: unless all 4 jobs can safely run together in the same
context, the per-job context state must be duplicated...not to mention
the cache.  Since cache and address-translation circuits are already
difficult and expensive to design without multiplexing (often a
substantial fraction of the entire CPU's cost), total system complexity
must also increase.
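
To see where the duplication goes, consider roughly what has to exist
once per hardware context (field names and sizes below are illustrative
guesses, not any particular machine's layout):

    #define NCTX   4
    #define NREGS 32

    struct hw_context {             /* replicated NCTX times */
        unsigned long regs[NREGS];  /* full register file */
        unsigned long pc, psw;      /* program counter, status word */
        unsigned int  asid;         /* address-space id for translation */
    };

    struct tlb_entry {              /* tagged with an asid, or flushed per switch */
        unsigned int  asid;
        unsigned long vpage, pframe;
        int           valid;
    };

    struct hw_context contexts[NCTX];

Tagging the cache and TLB with an asid avoids flushing them on every
issue slot, but it widens every tag and comparator, which is exactly the
expensive circuitry mentioned above.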

--
 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
				Eric Freudenthal
				NYU Ultracomputer Lab
				715 Broadway, 10th floor
				New York, NY  10012
				
				Phone:(212) 998-3345 work
				      (718) 789-4486 home
				Email:freudent@ultra.nyu.edu