[comp.arch] Bus Partitioning?

andrew@alice.UUCP (Andrew Hume) (02/01/90)

 o'dell is not kidding.
some parts of the long distance network use a crossbar switch to
switch fast (T3(=45Mbps) i think) lines. the biggest are approx
1Kx1K but there are only 2 or 3 of these: they are expensive.

panek@hp-and.HP.COM (Jon Panek) (02/02/90)

I think there might be an advantage in taking the inherently simpler
approach proposed in the basenote.  So far, most of the responses have
quickly extrapolated to the NxM cross-bar architecture.  While this is
obviously the most general-purpose and most flexible one, it also incurs
the highest implementation cost.

By having a single linear bus with CPUs and Memory distributed along it
in sections which can either be connected to the segments on either side
of it or not, the implementation aspect becomes much more tractable.  One
obvious result of this is that the scheduler must become much smarter;
assigning tightly-coupled tasks to physically proximate CPUs.
 
Rather than having single on/off switches to connect segments of busses,
perhaps a dedicated limited-function CPU could also straddle the boundaries
and serve as a message-transmitter/receiver across otherwise disconnected
segments of the bus.  It would grab bus cycles during dead time of the
main CPUs.  In this way, any CPU could talk with any other CPU, and the
only penalty would be longer latency for physiclly disparate boxes.

Any Master's candidates looking for a research topic???

Jon P
panek@hp-and.HP.COM

mo@flash.bellcore.com (Michael O'Dell) (02/02/90)

It has been brought to my attention that because of a previous posting,
some folks have concluded that I am now affiliated with Bellcore.
That is NOT the case beyond a courtesy most graciously extended to me
by the kind folks at Bellcore.

As you can see by the line above, Organization is something
I'm seldom accused of....

	-Mike O'Dell

-----------------------------
"I can barely speak for myself, much less anyone else..."

rex@mips.COM (Rex Di Bona) (02/03/90)

In article <6960003@hp-and.HP.COM> panek@hp-and.HP.COM (Jon Panek) writes:
>I think there might be an advantage in taking the inherently simpler
>approach proposed in the basenote.  So far, most of the responses have
>quickly extrapolated to the NxM cross-bar architecture.  While this is
>obviously the most general-purpose and most flexible one, it also incurs
>the highest implementation cost.

Quite true, there are other ways of connecting, such as perfect shuffle,
or its topological equivalents.
>
>By having a single linear bus with CPUs and Memory distributed along it
>in sections which can either be connected to the segments on either side
>of it or not, the implementation aspect becomes much more tractable.  One
>obvious result of this is that the scheduler must become much smarter;
>assigning tightly-coupled tasks to physically proximate CPUs.
> 
I have been working on a similar system (not here, but for my PhD
at The University of Sydney) and it is possible, there are some
problems (of course :-) If you are careful the bus can be
reduced to a single combinatorial circuit which is really nice.

>Rather than having single on/off switches to connect segments of busses,
>perhaps a dedicated limited-function CPU could also straddle the boundaries
>and serve as a message-transmitter/receiver across otherwise disconnected
>segments of the bus.  It would grab bus cycles during dead time of the
>main CPUs.  In this way, any CPU could talk with any other CPU, and the
>only penalty would be longer latency for physiclly disparate boxes.

In this case, why not try to improve the own/release times for the bus, so
that a CPU can talk to others by just grabbing the required segment(s) of
the bus.
If you are talking about having this limited function CPU do store and
forward then you end up with either "async cycles" which raises all the
problems with store and forward networks, acknowlegments, lost signals,
etc, etc, etc (see networking texts for a good list of these problems)
or with long (and I mean long) delays in completing a cycle.
In any case, you will want to eventually just make these interconnect
CPUs as powerful as the real CPU (why waste that board/system space, "we
can just run a small async job" is usually the first argument that will
be raised) and eventually you will end up with either the transputer
array/hypercube (only CPUs talking) or (and this one IS interesting)
a network of nodes (maybe hypercubed), but with each node being similar
in design to a Sequent type multi CPU backplaned machine.
>
>Any Master's candidates looking for a research topic???
>
>Jon P
>panek@hp-and.HP.COM
----
DISCLAIMER: this article concerns work that I have done at The
University of Sydney, Australia. It does NOT refer to any work
that I am doing at MIPS, and should not be taken as an indication
that MIPS is either involved, or not involved, in this area.
(I just wanted to make this clear).  Rex.
-- 
Rex di Bona		Penguin Lust is NOT immoral!
rex@mips.com		apply STD disclaimers here.

borrill@bus.Sun.COM (Paul Borrill) (02/06/90)

In article <1990Jan30.174807.14657@ncsuvx.ncsu.edu> aras@ecerl3.UUCP () writes:
>Has anyone in the group run across articles, research, etc, on
>partitioning of single or multiple buses to create independent bus
>segments? I am planning to work on this, to partition the buses on the
>fly, reflecting changes in the locality of data exchanges among the
>processes.
>T
>The idea is this:
>
>Given N processors all attached to a single bus. If several processes
>on processors have a high percentage of data communication among
>themselves (among these processors) why not:
>
>	- assign these processes to processors that are physically
>adjacent on the bus
>	- Partition the bus so that it turns into several bus "segments"
>each independent of each other
>	- For global communucation, use a second bus, or, if only
>segment to segment communication is necessary, combine two segments on
>the fly.
>
>Responses will be appreciated.
>
>
>Caglan M. Aras    aras@eceris.ncsu.edu| Experts know more and
>N. C State Univ.                      | more about less and
>ECE Dept. Robotics Lab                | less till they know
>Raleigh, NC 27695                     | everything about nothing!

The question you are asking has been a hot topic in the
Futurebus+ working group over the past year, and many of the
tough issues, including deadlock, hierarchical cache protocols
and address management have been included in the P896.1
Futurebus+ Specifications.

You can get an overview of the Futurebus+ family of standards 
(called "What is Futurebus+") from the VME International Trade
Association, phone (602) 951-8866. I tried publishing it on
here, but whoever moderates this category must have blocked it,
probably due to its size.

The P896.1 Spec has just been released from the Working Group:

IEEE P896.1: Futurebus+  Draft 8.2, Published February,
1990, by the IEEE Computer Society, 1730 Massachusetts
Avenue, N.W., Washington, D.C. 20036-1903. Call 1-800-CS-BOOKS.

The Futurebus Working Group currently has over 800 people on its
mailing list, and meets every other month for a week-long Workshop.
Over 100 companies are actively involved in the final stages of
its definition. This is a VERY LARGE IEEE activity. Anyone who is
seriously interested in the Futurebus+, or the related activites,
is strongly urged to consider attending the meetings. (As an IEEE
activity, all meetings are open to the public, however Working Group
rules require that you attend at least two of the past four meetings
in order to vote).

The Futurebus+ Mailings (which are about 1-1/2" thick, and are
issued every other month), are available from the Futurebus+
Executive Secretary, Anatol Kaganovich at (408) 991-2599.
A great deal of material on multiple segment buses has appeared
here in the past. Back copies of mailings can be obtained from
Lisa Granoien at the IEEE Computer Society (202) 371-0101

The next IEEE Futurebus+ Workshop will be held at the DoubleTree 
Hotel, Santa Clara, from March 12 through March 16, 1990. 
Information on meeting agendas, special events etc., can be 
obtained from the Futurebus+ Information center, at VITA, Phone 
602-951-8866.

Kind regards, Paul.

mshute@r4.uucp (Malcolm Shute) (02/08/90)

In article <6960003@hp-and.HP.COM> panek@hp-and.HP.COM (Jon Panek) writes:

>[...]  So far, most of the responses have
>quickly extrapolated to the NxM cross-bar architecture.  While this is
>obviously the most general-purpose and most flexible one, it also incurs
>the highest implementation cost.

But going further to the NxN case *can* allow a sudden halving in the
implementation cost over the NxM case (with M -> N )
(You can dispense with half the switches if
you treat each bus as being owned by a particular node).

>By having a single linear bus with CPUs and Memory distributed along it
>in sections which can either be connected to the segments on either side
>of it or not, the implementation aspect becomes much more tractable.

This sounds like a variation of a nearest-neighbour vector network:

               -O-O-O-O-O-O-O-O-O-O-O-O-O-O-O-

where "-O-" is a module containing all of the CPUs and memories
which share a common local bus.

And in article <AGLEW.90Jan31205148@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:

>One of the busses may be designated the "global" bus, listened to by all 
>processors.
>    Others might be allocated to connect groups of processors (and I/O
>controllers) as needed.  This reconfiguration would be done infrequently.
>    So, for example, if you have an I/O going on, allocate it a bus
>for the (long) duration of the I/O.
>    Or, if a group of programs appear to communicate heavily, allocate
>them a private bus.

This sounds like an implementation of a tree network.

Malcolm Shute.         (The AM Mollusc:   v_@_ )        Disclaimer: all

aglew@oberon.csg.uiuc.edu (Andy Glew) (02/13/90)

>And in article <AGLEW.90Jan31205148@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>
>>One of the busses may be designated the "global" bus, listened to by all 
>>processors.
>>    Others might be allocated to connect groups of processors (and I/O
>>controllers) as needed.  This reconfiguration would be done infrequently.
>>    So, for example, if you have an I/O going on, allocate it a bus
>>for the (long) duration of the I/O.
>>    Or, if a group of programs appear to communicate heavily, allocate
>>them a private bus.
>
>This sounds like an implementation of a tree network.
>
>Malcolm Shute.         (The AM Mollusc:   v_@_ )        Disclaimer: all


Tree is a fixed topology.  I mean a system where any components can
talk directly on a private link - just requiring a bit of setup.


Hmmm... this sounds a lot like a fiber-optic WAN that somebody from
IBM just presented.  Bandwidth limited by the speed of electronics
attached to the net.  Since aggregate bandwidth of optics >> bandwidth
of receiving electronics, you can have almost unlimited simultaneous
conversations.  Except that a receiver can only listen to one
frequency band at a time.  Senders can only send one frequency at a
time.  Senders can only send on a fixed frequency (tuneable lasers
expensive/impractical).  Receivers can only change frequency they are
listening to infrequently (tuneable receivers slow (currently physical
(piezoelectric) eventually electroacoustic).
    Modulo switching protocol difficulties, here's your complete crossbar.
What type of system are we going to put on this interconnect?

--
Andy Glew, aglew@uiuc.edu

ccplumb@lion.waterloo.edu (Colin Plumb) (02/13/90)

In article <AGLEW.90Jan31205148@dwarfs.csg.uiuc.edu> aglew@dwarfs.csg.uiuc.edu (Andy Glew) writes:
>One of the busses may be designated the "global" bus, listened to by all 
>processors.
>    Others might be allocated to connect groups of processors (and I/O
>controllers) as needed.  This reconfiguration would be done infrequently.
>    So, for example, if you have an I/O going on, allocate it a bus
>for the (long) duration of the I/O.
>    Or, if a group of programs appear to communicate heavily, allocate
>them a private bus.

There exists a commercial product that looks like this: The Cogent
Research (Beaverton, Oregon) XTM.  A box of processors has a parallel
broadcast bus, used for exchanging synchronisation information, and
4 32-way crossbar switches connecting the 20-Mbit/sec serial DMA
links of the transputers.  These get used for sending larger messages
(like file I/O) around.  The link controller is a dedicated processor
that also lives on the global broadcast bus.  It's a lot faster than
a LAN (I forget exactly, but channel setup times are a few microseconds),
but a lot slower than a processor bus.

So somebody thinks it's a good idea... :-)
-- 
	-Colin