[comp.unix.amiga] Adding Symmetric Multiprocessing to Amiga UNI

scott@texnext.gac.edu (Scott Hess) (01/30/91)

In article <20708@hydra.gatech.EDU> ken@dali.gatech.edu (Ken Seefried iii) writes:
   Keith Packard (the X Performance Man at MIT) has indicated that the 
   best way to speed up an X server is to put it on a fast processor with
   good memory bandwidth and direct access to the frame buffer.  The
   worst way is to try and stick it at arm's length on some form of
   co-processor.  Bottom line seems to be that if your concern is a fast
   X server, leave it on the '040.  If you offload disk, net and serial
   traffic from the main CPU, I think you'll see a big win all the way
   around.  'Course, I'd want to get some empirical data before I bet the
   farm on it...

doug@ctc.contel.com (Doug Whitehead) writes:
   My embellishment: 
   Hey why not use the 68030 exclusively for an X server!  Some folks 
   pay the cost of an AmigaUX just for an X server.  This would be a KILLER
   X window system!!!

crs@convex.cl.msu.edu (Charles Severance (System Manager)) writes:
   Just another idea for use of the spare '030 in the Amiga.  There is a lot
   of CPU wasted to run the bit-mapped graphics display when running
   X-Windows, open-look, etc.  The '030 could run dedicated display
   software and be sent messages about what is to be displayed.  
   (almost like having the '030 acting like an X-terminal in the same 
   box.) 

   Then the '040 could do the actual work.

I suspect that the best solution, that is, if you're considering this,
is to put the X server on the _040_.  Yes, I know, blasphemy.  But,
when it comes right down to it, an 030 running application code only
(no X server, in other words) could probably keep the 040 running X
fairly busy.  Put io processing and everything on the '040, too, and
the '030 can run unconstrained.  Look at the NeXTdimension (and many
other graphics boards for other computers) - an i860 is faster than
an '040 for most stuff (though I'm sure it wouldn't run Unix so
well :-).  Most applications written for graphical environments
(X is, arguably, a graphical environment) spend more time drawing
graphics than doing calculations.  Exceptions?  Mandelbrot sets,
ray tracing, etc - but those would probably be better run in the
background without any window system running (sigh, an '040 and an
'030 would be heaven).  For those types of programs, the 040 wouldn't
be able to keep the 030 running X busy (X is very bitmap-oriented,
so would fly through the _display_ of the bitmaps resulting from
the calculations, while the calculations wouldn't be helped at all
by X).

But gosh, the '040 running X alone, 030 running clients would
be _slick_.  Put the windowmanager on the 040 side (so it's close
to those events), and you've got yourself a great machine to
sell X with ("now, see how the windows fly across the screen
when you move them").  From the way it sounds, though, it would
probably be better to get the Amiga custom chips working with X.

I guess maybe you could do like some of us were thinking of on the
NeXTs - we'd get our upgrade board, and leave the current system
board in (well, with a couple modified traces).  Then, you'd have
two completely separate motherboards, one with an '040, one with
an '030, though you'd not be able to run two monitors off of it.
How to connect them?  Thinwire ethernet . . . While I'll admit
that this is not the best solution, it's simple.  I don't know
if that would work for an Amiga with a coprocessor, unless the
coprocessor ignored system memory - something I would be surprised
at (would the boards be regular backplane boards or special slot
boards?  Makes a difference . . .).  You never know, though . . .
sounds fun either way . . . .
--
scott hess                      scott@gac.edu
Independent NeXT Developer	GAC Undergrad
<I still speak for nobody>
"Tried anarchy, once.  Found it had too many constraints . . ."
"Buy `Sweat 'n wit '2 Live Crew'`, a new weight loss program by
Richard Simmons . . ."

swarren@convex.com (Steve Warren) (01/31/91)

In article <SCOTT.91Jan29193042@texnext.gac.edu> scott@texnext.gac.edu (Scott Hess) writes:
                           [...]
>I suspect that the best solution, that is, if you're considering this,
>is to put the X server on the _040_.  Yes, I know, blasphemy.  But,
>when it comes right down to it, an 030 running application code only
>(no X server, in other words) could probably keep the 040 running X
>fairly busy.  Put io processing and everything on the '040, too, and
>the '030 can run unconstrained.  Look at the NeXTdimension (and many
                           [...]
This setup might lead to reduced throughput on the A3000.

I am assuming that an '040 board would include on-board memory designed
to function optimally with an '040.  In this case you would want the '040
to favor the coprocessor ram over the motherboard ram.  You would also
want the '040 to release the bus on the motherboard when it is operating
out of coprocessor ram.

Either processor could write to memory physically located next to the other
processor, but the other processor would of course be blocked from any
memory accesses until that bus was released.  In other words, as long as
the '040 is operating on coprocessor memory and the '030 is operating out
of motherboard memory, they can both run at full speed simultaneously.  As
soon as one processor crosses over into the other's memory, only one
processor can access memory at a time (which may not hurt the '040 performance
as much, with its larger cache, but it would slow the '030 down
significantly).
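
A toy model may make the contention argument above concrete.  It is only
a sketch under loud assumptions: each CPU issues one memory access per
bus slot, there are no caches, a remote access stalls the other CPU for
that whole slot, and the '040 wins ties.  The function name `run` and its
parameters are invented for illustration; nothing here reflects real
A3000 bus arbitration.

```python
def run(slots, remote_every_040, remote_every_030):
    """Count memory accesses each CPU completes in `slots` bus slots.

    remote_every_N = k means every k-th access by that CPU goes to the
    other CPU's memory; 0 means the CPU never leaves its local memory.
    Returns (accesses_by_040, accesses_by_030).
    """
    done = {"040": 0, "030": 0}
    period = {"040": remote_every_040, "030": remote_every_030}

    for _ in range(slots):
        # Does the next access of each CPU cross into the other's memory?
        wants_remote = {
            cpu: period[cpu] > 0 and (done[cpu] + 1) % period[cpu] == 0
            for cpu in done
        }
        if wants_remote["040"]:
            done["040"] += 1        # '040 holds both buses, '030 stalls
        elif wants_remote["030"]:
            done["030"] += 1        # '030 holds both buses, '040 stalls
        else:
            done["040"] += 1        # both work out of local memory
            done["030"] += 1

    return done["040"], done["030"]


if __name__ == "__main__":
    print("both local:            ", run(100, 0, 0))
    print("'040 remote every 2nd: ", run(100, 2, 0))
    print("'030 remote every 4th: ", run(100, 0, 4))
```

In this model the CPU that stays in its own memory is the one that pays:
the more often its neighbour crosses over, the more slots it loses.  A
cache on either side would soften the effect, which the model ignores.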

Since the "frame buffer" is in the 1-2 Mbytes of chip ram on the motherboard
(unless you are using an alternate display card), it makes sense to put the
X-server application on the processor that is "closer", in terms of
performance, to this memory.  This does not mean that the '040 would not be
able to perform just as fast or faster than the '030 as an X-server.  The
problem is that the '030 would be stalled frequently in that configuration.

On the other hand, by putting the X-server on the '030 you would still have
a 32-bit port into the display memory.  The '030 would not perform as well
as the '040 but it would still do quite well.  And the '040 would not be
stalled very often because it would tend to work out of the coprocessor
memory whenever possible.

I/O processing should also be performed on the '030.  The SCSI controller is
on the motherboard, and it just makes sense to put all the housekeeping out
on the secondary processor.

It would be nice if some utilization of the custom chips were possible, since
they do have line drawing and other capabilities.


--
            _.
--Steve   ._||__      DISCLAIMER: All opinions are my own.
  Warren   v\ *|     ----------------------------------------------
             V       {uunet,sun}!convex!swarren; swarren@convex.com

jesup@cbmvax.commodore.com (Randell Jesup) (01/31/91)

In article <1991Jan30.181208.1256@convex.com> swarren@convex.com (Steve Warren) writes:
>I am assuming that an '040 board would include on-board memory designed
>to function optimally with an '040.  In this case you would want the '040
>to favor the coprocessor ram over the motherboard ram.  You would also
>want the '040 to release the bus on the motherboard when it is operating
>out of coprocessor ram.

	Even a simple '040 implementation would have similar effects due to
the large cache of the '040.  An '040 with external cache would be even
less affected, though probably still would be affected more than the local-
memory version.  An external cache might be faster than having local '040
memory if the '030 isn't being used.

>I/O processing should also be performed on the '030.  The SCSI controller is
>on the motherboard, and it just makes sense to put all the housekeeping out
>on the secondary processor.

	Yes, though this requires a fair bit of coding... (understatement
of the day).

>It would be nice if some utilization of the custom chips were possible, since
>they do have line drawing and other capabilities.

	For "simple" blits (scroll, etc) an '030/'040 is probably faster
than the existing blitter ('040 wouldn't be much faster than '030).  For
complex blits, longish lines, patterned fills, etc, the blitter is faster.
(Short lines and restricted-shape solid-filled polygons are faster on the
processor, even in some cases for 68000's, assuming there's nothing else
the CPU could be doing.)
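
	The rules of thumb above can be written down as a small lookup
table.  This is only a restatement of the categories in the paragraph;
the operation names and the `faster_engine` helper are made up for
illustration, and no thresholds (how long is "longish"?) are implied.

```python
# Which engine the paragraph above says is faster for each kind of work.
# The CPU entries assume there is nothing else the CPU could be doing.
FASTER_ENGINE = {
    "simple_blit": "cpu",              # scrolls and similar
    "complex_blit": "blitter",
    "long_line": "blitter",
    "patterned_fill": "blitter",
    "short_line": "cpu",
    "simple_solid_polygon": "cpu",     # restricted-shape, solid-filled
}


def faster_engine(operation):
    """Return "cpu" or "blitter" for one of the operation names above."""
    return FASTER_ENGINE[operation]


if __name__ == "__main__":
    for op in sorted(FASTER_ENGINE):
        print(f"{op:22s} -> {faster_engine(op)}")
```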

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

nv90-mho@dront.nada.kth.se (Magnus Holmberg) (01/31/91)

In article <1991Jan30.181208.1256@convex.com> swarren@convex.com (Steve Warren) writes:
>In article <SCOTT.91Jan29193042@texnext.gac.edu> scott@texnext.gac.edu (Scott Hess) writes:
>                           [...]
>>I suspect that the best solution, that is, if you're considering this,
>>is to put the X server on the _040_.  Yes, I know, blasphemy.  But,
>>when it comes right down to it, an 030 running application code only
>>(no X server, in other words) could probably keep the 040 running X
>>fairly busy.  Put io processing and everything on the '040, too, and
>>the '030 can run unconstrained.  Look at the NeXTdimension (and many
>
                           [...]
>
>Either processor could write to memory physically located next to the other
>processor, but the other processor would of course be blocked from any
>memory accesses until that bus was released.  In other words, as long as
>the '040 is operating on coprocessor memory and the '030 is operating out
>of motherboard memory, they can both run at full speed simultaneously.  As
>soon as one processor crosses over into the other's memory, only one
>processor can access memory at a time (which may not hurt the '040 performance
>as much, with its larger cache, but it would slow the '030 down
>significantly).
>
			[...]
>
>On the other hand, by putting the X-server on the '030 you would still have
>a 32-bit port into the display memory.  The '030 would not perform as well
>as the '040 but it would still do quite well.  And the '040 would not be
>stalled very often because it would tend to work out of the coprocessor
>memory whenever possible.
>
			[...]
	
	I'm not very good at hardware (esp. w/ the 3000). It seems
	to me, though, that since the '040 wouldn't need to access 
	Chipmem much, in a setup like this, it might be possible to
	run it on the same 'half-cycle' as the PAD. That way, the
	CPUs wouldn't have to lock each other out of the Fastmem-bus.

	(Please, correct me, if the processor isn't running at every 
	other cycle, anymore, in the 3000. I _said_ I didn't know 
	much about hardware...)


		    -MH		

daveh@cbmvax.commodore.com (Dave Haynie) (02/01/91)

In article <1991Jan31.154251.16107@nada.kth.se> nv90-mho@dront.nada.kth.se (Magnus Holmberg) writes:

>	I'm not very good at hardware (esp. w/ the 3000). It seems
>	to me, though, that since the '040 wouldn't need to access 
>	Chipmem much, in a setup like this, it might be possible to
>	run it on the same 'half-cycle' as the PAD. That way, the
>	CPUs wouldn't have to lock each other out of the Fastmem-bus.

The 32 bit bus is fully loaded by any '030 or '040 access; there's no spare
time like there was on the 7.16MHz 68000 systems.  Think of it this way --
the 68030's clock is 3.49 times that of the Amiga 68000 machines, and a
68030's fastest cycle is twice the speed, in clock cycles, of the 68000's
fastest cycle.  So, in order to run double cycles without wait states on an
A3000, you would need a DRAM chip that cycles in roughly 20ns (that's a
simplification, but you get the main idea).  The 80ns parts in the A3000
are cycled at 200ns, 10 times too slow for such a trick.
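
The arithmetic in the paragraph above can be checked with a few lines of
Python.  This is a sketch, not a timing analysis: the clock figures, the
200ns DRAM cycle and the rough 20ns requirement are taken straight from
the text, while the 4-clock and 2-clock minimum bus cycles are an
assumption added here (consistent with "twice the speed, in clock
cycles").  The names are invented for illustration.

```python
# Figures quoted in the post.
CLOCK_68000_MHZ = 7.16      # Amiga 68000 clock
CLOCK_RATIO = 3.49          # 68030 clock relative to the 68000's
DRAM_CYCLE_NS = 200.0       # how the A3000 cycles its 80ns parts
NEEDED_CYCLE_NS = 20.0      # the post's rough requirement, taken as given

# Assumed minimum bus cycle lengths in clocks (not stated in the post).
CLOCKS_PER_CYCLE_68000 = 4
CLOCKS_PER_CYCLE_68030 = 2


def bus_cycle_ns(clock_mhz, clocks_per_cycle):
    """Length of one bus cycle in nanoseconds."""
    return clocks_per_cycle * 1000.0 / clock_mhz


def summary():
    """Collect the derived figures in one dictionary."""
    clock_68030_mhz = CLOCK_68000_MHZ * CLOCK_RATIO
    cycle_68000 = bus_cycle_ns(CLOCK_68000_MHZ, CLOCKS_PER_CYCLE_68000)
    cycle_68030 = bus_cycle_ns(clock_68030_mhz, CLOCKS_PER_CYCLE_68030)
    return {
        "clock_68030_mhz": clock_68030_mhz,
        "cycle_68000_ns": cycle_68000,
        "cycle_68030_ns": cycle_68030,
        "bus_speedup": cycle_68000 / cycle_68030,
        "dram_shortfall": DRAM_CYCLE_NS / NEEDED_CYCLE_NS,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the 68030 clock comes out at about 25MHz and its
fastest bus cycle at about 80ns, against roughly 559ns for the 68000.  The
20ns figure is entered as the post gives it, not derived; dividing the
200ns DRAM cycle by it reproduces the factor of 10.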

The best strategy for A3000 multiprocessing would be to have the 68040 off
the main (shared) bus as often as possible, while the 68030 does whatever
I/O operations it can handle.  The large 68040 cache makes this a somewhat
natural event anyway, but either private 68040 memory or a large external
cache on an '040 coprocessor board would improve this considerably.

>		    -MH		


-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"What works for me might work for you"	-Jimmy Buffett

jmarvin@oracle.oracle.com (John W. Marvin) (02/01/91)

AT&T is already working on a parallel version of V.4, building
on technology from Sequent and Mach, I've read.  C= should, IMHO,
just work on getting V.4 to work right and let AT&T (or is it UI?)
solve the parallel problems.  Is the future of Sys V in micro-kernels?
Stay tuned!


*******************************************************************
* John W. S. Marvin             * There are times when the wolves *
* Oracle Multimedia Development * are silent, and the moon is     *
* jmarvin@oracle.com            * howling...                      *
*******************************************************************

rblewitt@sdcc6.ucsd.edu (Richard Blewitt) (02/02/91)

In article <18407@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>
>The best strategy for A3000 multiprocessing would be to have the 68040 off
>the main (shared) bus as often as possible, while the 68030 does whatever
>I/O operations it can handle.  The large 68040 cache makes this a somewhat
>natural event anyway, but either private 68040 memory or a large external
>cache on an '040 coprocessor board would improve this considerably.

So Dave, just when will you be done with this board.  I know,
non-disclosure :(  So finish it quick so it can be released, and we
can get it.

Rick

david@twg.com (David S. Herron) (02/15/91)

In article <1991Feb1.021357.15863@oracle.com> jmarvin@oracle.com (John W. Marvin) writes:
>AT&T is already working on a parallel version of V.4, building
>on technology from Sequent and Mach, I've read.

Yes, this was stated here a few days ago by a person at Sequent.
This will be good .. in a former job I sysadminn'd a Sequent and
the multiprocessing support they have is really really good.  (Their
(Sequent's) networking software wasn't so hot, but then you can't
have everything, eh?)

The downside is that the multiprocessing System V isn't supposed to be
"out" until 1992.  Does this really mean 1993?  Sigh...

>  C= should, IMHO,
>just work on getting V.4 to work right and let AT&T (or is it UI?)
>solve the parallel problems.  Is the future of Sys V in micro-kernels?
>Stay tuned!

You're right.. converting Unix to multi-processing isn't easy.  There's
all those references to "u", and all those algorithms that don't lock
their data structures, etc etc ...
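
To illustrate the locking point, here is a minimal sketch in Python
rather than kernel C.  The `ProcTable` class and `hammer` helper are
hypothetical stand-ins, not anything from System V: the idea is just
that a read-modify-write on a shared table, which a uniprocessor kernel
could get away with leaving unprotected, needs an explicit lock once
more than one CPU can run that code at the same time.

```python
import threading


class ProcTable:
    """Toy stand-in for a kernel table shared by every CPU (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_pid = 1
        self.slots = {}

    def allocate(self, name):
        # Read the next id, bump it, and record the entry as one unit.
        # Without the lock, two threads could read the same _next_pid
        # and hand out the same id twice.
        with self._lock:
            pid = self._next_pid
            self._next_pid += 1
            self.slots[pid] = name
            return pid


def hammer(table, workers=4, per_worker=1000):
    """Allocate from several threads at once; return the entry count."""
    def work(tag):
        for i in range(per_worker):
            table.allocate(f"{tag}-{i}")

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(table.slots)


if __name__ == "__main__":
    table = ProcTable()
    print("entries allocated:", hammer(table))
```

With the lock in place every allocation gets a distinct id.  The hard
part in a real kernel is finding every such unprotected sequence, which
is the "not easy" above.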

-- 
<- David Herron, an MMDF & WIN/MHS guy, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<-	MS-DOS ... The ultimate computer virus.