[fa.laser-lovers] Dover speed

laser-lovers@uw-beaver (03/19/85)

From: jbn@ford-wdl1.arpa

     The Dover at Stanford is still front-ended by a Xerox Alto, the world's
first workstation.  The Alto was developed around 1974, at Xerox PARC; it
has a CPU with loadable microcode, a small hard disk, a large screen,
about 128KB of RAM, and a 3 Mbit/s Ethernet interface.  Original cost was about
$20-30K each, but they were never sold, just used internally at Xerox and
some cooperating research institutions.  The Ethernet interface stole 
cycles from the main CPU; you could compute, do net I/O, or do disk I/O,
but not more than one at a time (sounds like a Mac, doesn't it).  So
as a front-end processor for a printer it was not a big winner.
     The Dovers at PARC may have more modern front ends; I'm not up on
present happenings at PARC, but by now they probably have replaced their
Alto engines.  This may explain the speed differential.

				John Nagle

laser-lovers@uw-beaver (03/19/85)

From: Brian Reid <reid@su-glacier.arpa>

There is no speed differential between Stanford's Dover and Xerox's
Dovers. They run essentially identical hardware and virtually identical
software. They run at the same speed. The only real difference is that
ours is in a public place where anybody with a stopwatch can come
measure its speed.

laser-lovers@uw-beaver (03/21/85)

From: ihnp4!fortune!redwood!rpw3@uw-beaver.arpa (Rob Warnock)

+---------------
| From: jbn@ford-wdl1.arpa (John Nagle)
|   [On a Xerox Alto...] The Ethernet interface stole 
| cycles from the main CPU; you could compute, do net I/O, or do disk I/O,
| but not more than one at a time (sounds like a Mac, doesn't it).  So
| as a front-end processor for a printer it was not a big winner.
+---------------

Now wait just a minute! As I understand it, the CPU cycle-stealing WAS
a big winner! The Alto's CPU (a microcoded TTL bit-slice engine) is fast
enough to keep the memory busy. Even with "separate" DMA controllers,
DMA cycles would have caused the CPU to go idle waiting for the memory.
[The same is true for a 68000, except during a multiply or divide.]

	"It might appear that sharing the processor in this way
	would result in a significant degradation in performance,
	particularly for low-priority tasks such as the emulator
	[i.e., the 'instruction set']. This is in fact not the case;
	THE MAJOR BOTTLENECK IN THE SYSTEM IS THE MEMORY. Since most
	computations can be overlapped with memory operation, the
	performance of the Alto compares favorably with other systems
	employing single-ported, non-interleaved memory at comparable
	I/O bandwidths." [Thacker et al., "Alto: A personal computer",
	Xerox PARC report CSL-79-11, 7-Aug-79, page 12. This paper also
	appears in Siewiorek, Bell, and Newell, "Computer Structures",
	2nd ed.]

[Note that most current micro-based computers have "single-ported"
memory, in the sense used here -- a common memory timing and control bus.
Any "multi-porting" is in fact done by time-multiplexing the memory.]

Context-switching of the micro-engine was fast, since there are actually
16 "micro-PC"s, or "tasks". Therefore, using the CPU for DMA control didn't
cost you anything in speed, as the memory was going to be busy for those
cycles anyway, and it gained a LOT in cost and flexibility of the controllers
(which had access to the CPU's microinstruction set).  The Alto's memory
bandwidth is 31.3 megabits/sec, a figure that is still respectable, if not
quite up to current standards.
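
[The argument above can be illustrated with a toy model. This is only a
sketch and is not taken from the Thacker report. All names and parameters
are hypothetical: `io_share` is the fraction of memory slots taken by I/O,
and `mem_fraction` is the fraction of cycles in which the emulator needs
memory. It assumes single-ported memory and a zero-cost task switch.]

```python
def emulator_progress(slots, io_share, mem_fraction, shared_engine):
    """Return how many of `slots` cycles the emulator makes progress in.

    shared_engine=True  : I/O runs as a microcode task on the one processor,
                          so the emulator loses every I/O slot outright.
    shared_engine=False : a separate DMA controller takes the memory slot;
                          the CPU still advances if it did not need memory.
    """
    io_slots = slots * io_share
    free_slots = slots - io_slots
    if shared_engine:
        return free_slots
    # During a DMA slot the CPU only advances when it has non-memory work.
    return free_slots + io_slots * (1.0 - mem_fraction)


for f in (0.5, 0.9, 1.0):
    separate = emulator_progress(1000, 0.25, f, shared_engine=False)
    shared = emulator_progress(1000, 0.25, f, shared_engine=True)
    print(f, separate, shared)
```

[In this model the two designs differ only when the emulator has work that
does not touch memory. As `mem_fraction` approaches 1, which is the
situation described above, the gap goes to zero.]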

It was not the hardware architecture per se that was the bottleneck.
In the section on the printer controller, they note that the CPU had
enough performance to drive the printer directly from a full bitmap in
memory (the display had a faster pixel rate than the printer), but:

	"Unfortunately, this simple approach fails for two reasons:
	the Alto does not have enough memory to buffer a full page
	image (12 million bits), and the processor cannot execute
	BitBlt fast enough to generate a bitmap for a moderately
	complex page in one second. These two problems force changes
	in the image-generation algorithm." [Ibid., p. 37]
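
[A quick back-of-the-envelope check of the first reason, using only
figures that appear in this thread: the 12 million bit page image, the
"about 128KB" of RAM, and the 31.3 megabits/sec memory bandwidth. The
one-page-per-second rate is taken from the quote; the variable names are
made up for illustration.]

```python
PAGE_BITS = 12_000_000        # full page image, from the quoted report
RAM_BYTES = 128 * 1024        # "about 128KB", from earlier in this thread
MEM_BANDWIDTH_BPS = 31.3e6    # Alto memory bandwidth, as stated above

ram_bits = RAM_BYTES * 8
page_to_ram = PAGE_BITS / ram_bits
# Share of memory bandwidth needed just to read out one page per second.
readout_share = PAGE_BITS / MEM_BANDWIDTH_BPS

print(f"RAM holds {ram_bits:,} bits")
print(f"page image is {page_to_ram:.1f}x the whole RAM")
print(f"one page/sec readout uses {readout_share:.0%} of memory bandwidth")
```

[So a full page image is more than ten times the entire memory, before
leaving any room for code or fonts.]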

So they had to go to a band-buffer approach. (Before yelling at them,
remember, the first Altos used 1Kx1 RAMs, and the last ones used only
16Kx1 RAMs. Had 64K RAMs existed, the printer controller would have
been a lot simpler, I'm sure.) The page-generation algorithm that
results makes heavy use of the ability of I/O devices to execute
CPU microcode directly, and the controller would have been much
more expensive without it.
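
[For readers unfamiliar with band buffering, here is a minimal sketch of
the general idea. It is not the Alto's actual algorithm or microcode.
The function name, the rectangle format (x, y, w, h), and the tiny page
size are all assumptions for illustration. The page is rendered one
horizontal band at a time, so only one band's worth of bits is held in
memory at once.]

```python
def render_in_bands(page_height, page_width, band_height, rects):
    """Yield (band_top, rows) one band at a time; rects are (x, y, w, h)."""
    for top in range(0, page_height, band_height):
        bottom = min(top + band_height, page_height)
        # A small buffer covering only this band of the page.
        band = [[0] * page_width for _ in range(bottom - top)]
        for (x, y, w, h) in rects:
            # Clip the rectangle to this band; empty ranges skip it.
            y0, y1 = max(y, top), min(y + h, bottom)
            x0, x1 = max(x, 0), min(x + w, page_width)
            for row in range(y0, y1):
                for col in range(x0, x1):
                    band[row - top][col] = 1
        yield top, band


rects = [(1, 1, 3, 4)]
banded = [row for _, band in render_in_bands(8, 8, 2, rects) for row in band]
whole = [row for _, band in render_in_bands(8, 8, 8, rects) for row in band]
print(banded == whole)
```

[The banded output matches a full-page render, but the peak buffer is
band_height * page_width bits rather than page_height * page_width. The
cost is that every object has to be clipped against each band it touches.]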

We should probably move this to "net.arch", but let me note that the
referenced paper (above) is publicly available, and contains a LOT of
fascinating information (including register-transfer-level block diagrams)
on the Alto processor and its various controllers (display, disk, Ethernet,
and printer).


Rob Warnock
Systems Architecture Consultant

UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404


[[Editor's note:  Yes, indeed, the topic does seem to have wandered
from laser printers so the redirection seems warranted.		--Rick]]