[comp.sys.next] What about Transputers for NeXT?

avery@rana.usc.edu (Avery Wang) (05/14/91)

I tried posting this a few weeks ago but apparently the news software is 
unreliable:
----------------------------------------------------------------------------

It would be interesting to see a NeXT based on the not-yet existing transputer.
It seems to have a lot of promise, but is still vaporware.  Check this out:

Electronic Engineering Times April 22 1991

Transputer pumped up BY ROGER WOOLNOUGH

Bristol, England--SGS-Thomson took the next step in its global 32-bit
microprocessor strategy last week, unveiling further details and schedules
for the powerful H1 Transputer from its subsidiary Inmos Ltd.
SGS-Thomson--which has remained aloof from other silicon vendors in the
scramble to align with a major RISC architecture--will rely on the power,
flexibility and broad tool support of the new Transputer, dubbed T9000, to
claim a share of the emerging 32-bit embedded computing market.

But the strategy is far from a sure thing. Although the current Transputer
chips have found a following outside the U.S.--Inmos's technical director,
Ian R. Pearson, claims that some 240,000 units of the Transputer family
have been shipped, a figure that would mean the Transputer outsold all
other 32-bit RISC microprocessors last year--the architecture is still not
widely accepted. And the U.S., in particular, has been a disappointing
market for the processors. Further, the sheer scale of the design effort,
and the fact that Inmos is announcing the part with no working silicon in
hand, raise questions about the eventual manufacturability of the T9000.

Sampling of the new Transputer, and of its accompanying C104 packet-routing 
switch and C100 system protocol converter, will start in the third quarter
of 1991, with market availability in Q1 of 1992. First silicon of the C101
link adapter will be seen in Q4 1991, with market availability in
mid-1992.  But, for now, what Inmos has to show is a series of spectacular
benchmarks based on simulations and a 1.3 million-transistor cache test
chip, built with the three-level metal, 1.0-micron CMOS process that will
be used to fabricate the T9000.  If there are questions about the T9000's
volume production, there can be none about SGS-Thomson's determination to
make the Transputer succeed.  As evidence of the parent company's
ambitions, teams of Inmos executives traveled the globe last week to
deliver presentations in London, New York, Tokyo and San Francisco.  And
this week, SGS-Thomson and Inmos are hosting "Transputing 91," at the
Hilton Hotel, in Sunnyvale, Calif., where hundreds of engineers will
receive in-depth technical briefings on the T9000.

This scenario is very different from when Inmos launched its first
Transputer in 1985. Then, Inmos was a financially strapped small company
that made SRAMs, so a worldwide extravaganza was out of the question. The
Transputer's message spread almost by word of mouth, impaired by the
market's limited understanding of multiprocessing techniques and further
hindered by the Transputer's unique Occam language.

The second-generation T9000 was conceived before SGS-Thomson
Microelectronics acquired Inmos two years ago.  Yet, full resources to go
ahead with development only became available after the merger. But in
August of last year, Ian Pearson pledged that Inmos would remedy the
Transputer's shortcomings in its next generation, then code-named H1 (see
Aug.  13, page 1).

Last week, Inmos added information to the promise, when Pearson and his
team detailed the 50-MHz, 2-million-transistor device, boasting 10-times
performance improvement while maintaining binary compatibility with its
predecessors, the T400 and T800. At last week's disclosures Inmos was
showing only subsystem-level silicon.

The sheer performance of the part--which peaks at 200 native Mips,
according to Inmos simulations--could be enough to attract attention from
the 32-bit market. But to achieve the performance, Inmos is resorting to a
three-level-metal, 1-micron CMOS process using tungsten plugs, yielding a
transistor density of 10,000 transistors/mm2 on a 180-mm2 die--a challenge
to even the most seasoned CMOS vendor.

(The only other triple-metal 32-bit microprocessor that is currently known
to be on the drawing board is Intel's 100MHz 486, which was presented in a
technical paper at this year's International Solid-State Circuits
Conference.)

Besides the global marketing push and the aggressive performance targets,
Inmos will be stepping up a program designed to remove earlier objections
to the Transputer, by supporting industry-standard software, including
mainstream programming languages such as C, C++ and Fortran/77, as well as
Inmos's own Occam and operating systems for both distributed computing and
real-time embedded processing. Inmos is cooperating with Chorus Systems
on its microkernel-based version of Unix for distributed systems, and with
Ready Systems on its VRTX real-time kernel for embedded applications.

The software vendors will be bringing their wares to a unique
architecture. Like its predecessors, the T9000 family is based on Inmos's
concept of cellular computing. Each powerful processor has its own small
on-chip memory and is connected to the outside world by four fast serial
links. Such processing elements are commonly used in massively parallel
research computers, but Inmos claims the architecture's main strength will
be in conventional embedded computing.

"With its combination of powerful computing and communications, the T9000
is opening up new fields for the microprocessor," said Philippe Geyres,
general manager of SGS-Thomson's Programmable Products Group.  "It is
going to be valuable for such applications as real-time processing and
switching of high-definition video and of broadband ISDN. Other new
applications will be in such areas as multimedia workstations, where there
is a need for both a high degree of communication capability and huge
computing power."

It is the Transputer's combination of computing and communications that
Inmos believes gives it an edge over competitors in these applications.
The company argues that traditional CISC and RISC microprocessors, by
emphasizing increased computing performance, do not offer the balanced
solution needed for many embedded applications. Inmos believes that by
placing equal emphasis on computing and system communications,
Transputer-based designs can span the whole range from single-processor
applications through various kinds of multiprocessing, to massively
parallel architectures.

Three main elements

To achieve this range, the T9000 family combines three main elements: the
T9000 Transputer itself; a line of communications chips that allows T9000s
to link with each other, with other types of Transputers, and with
conventional bus systems; and software support for development, systems
and applications, available both from Inmos and from third parties
throughout the world.

The T9000 Transputer integrates on a single chip the CPU; a 16-kbyte
local memory; the communications system, which includes its own
communications controller (called the virtual channel processor) and four
serial links; and several other support functions.

The most aggressive element of the T9000 design is the execution unit,
which uses a pipelined superscalar-like architecture to process as many as
40 instructions simultaneously, reaching a peak execution rate of 200 Mips
at 50 MHz. At the head of the pipeline is a 32-instruction fetch-ahead
buffer. From this buffer, a hardware scheduler, called an instruction
grouper, picks instructions and groups them together for dispatch. The
pipeline can accept one group -- up to eight instructions -- on each cycle.

The actual contents of an instruction group are constrained by the
pipeline design. The pipeline can execute two local cache references, two
address computations, two main cache references, an ALU operation, and a
write or jump operation on each cycle. So a group can contain, for
example, no more than two local cache references.
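
As a rough illustration of the grouping constraint described above, the
following Python sketch packs an in-order instruction stream into groups
that respect the per-cycle slot limits listed in the article. The slot
names and the greedy close-the-group policy are assumptions made for the
example; the article does not describe how the real grouper decides.

```python
from collections import Counter

# Per-cycle slot limits as listed in the article. The key names are
# hypothetical labels chosen for this sketch, not Inmos terminology.
SLOT_LIMITS = {
    "local_ref": 2,   # local cache references
    "addr": 2,        # address computations
    "main_ref": 2,    # main cache references
    "alu": 1,         # ALU operation
    "write_jump": 1,  # write or jump operation
}


def group_instructions(kinds):
    """Greedily pack an in-order stream of instruction kinds into groups.

    A group is closed as soon as the next instruction would exceed the
    slot limit for its kind. This policy is an assumption for illustration.
    """
    groups, current, used = [], [], Counter()
    for kind in kinds:
        if kind not in SLOT_LIMITS:
            raise ValueError(f"unknown instruction kind: {kind!r}")
        if used[kind] >= SLOT_LIMITS[kind]:
            # No free slot of this kind: dispatch the current group.
            groups.append(current)
            current, used = [], Counter()
        current.append(kind)
        used[kind] += 1
    if current:
        groups.append(current)
    return groups
```

The slot limits sum to eight, which matches the stated maximum group size.
A stream with three consecutive local cache references is split after the
second one, since a group can hold no more than two.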

In real code, instructions will seldom come in just the right order to
provide for full groups.  What happens in practice, according to Inmos, is
that the fetch logic picks up four instructions on each cycle. The grouper
will combine these as best it can, actually creating groups of from one to
three instructions most of the time. When a multiple-cycle instruction --
say, a multiply -- comes along, that stalls the pipeline and gives the
grouper a chance to work ahead a little bit.

So while the theoretical upper limit for pipeline activity is 40
instructions -- one eight-instruction group in each of the pipeline stages
at once -- a more typical scenario would have about eight instructions in the
pipeline at any one time, according to Inmos.  That would give sustained
performance somewhere between 50 and 80 Mips, assuming not too many
multiple-cycle instructions come by.
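
The figures in the last few paragraphs can be cross-checked with a little
arithmetic. The Python sketch below uses only numbers quoted in the
article; the five-stage pipeline depth and the link between the
four-per-cycle fetch rate and the 200-Mips peak are inferences from those
numbers, not statements by Inmos.

```python
CLOCK_MHZ = 50        # clock rate quoted in the article
MAX_GROUP_SIZE = 8    # upper limit on instructions per group
PEAK_IN_FLIGHT = 40   # theoretical maximum instructions in the pipeline
FETCH_PER_CYCLE = 4   # instructions picked up by the fetch logic per cycle


def implied_stages():
    """Pipeline stages implied by one full group per stage (an inference)."""
    return PEAK_IN_FLIGHT // MAX_GROUP_SIZE


def peak_mips():
    """Peak rate if four fetched instructions complete every cycle."""
    return FETCH_PER_CYCLE * CLOCK_MHZ


def instructions_per_cycle(mips):
    """Average instructions completed per clock for a given Mips figure."""
    return mips / CLOCK_MHZ
```

On these assumptions the pipeline has five stages, the fetch rate accounts
for the 200-Mips peak, and the 50 to 80 Mips sustained estimate works out
to between 1.0 and 1.6 instructions per cycle.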

The processor contains both an integer unit, which does primarily
single-cycle operations, and a multicycle FPU, which operates on 32-bit
and 64-bit floating-point numbers as specified by the IEEE 754 standard.
The T9000 is upward binary compatible with previous Transputers; in
particular, it implements the same instruction set as the T805, but with
many additions.

Inmos estimates that a T9000 running at 50 MHz can typically execute code
compiled for the T805 10 times faster than a 20-MHz T805. And this higher
performance can be achieved without having to replace existing development
tools and software. Inmos claims that only a modest amount of work is
needed to modify compilers to produce code optimized for the T9000.
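
To separate how much of the claimed speedup comes from clock rate and how
much from the new pipeline, the tenfold figure can be divided by the clock
ratio. This small Python sketch does that; the function name is made up
for the example, and the result is only as good as Inmos's estimate.

```python
def per_clock_speedup(total_speedup, new_mhz, old_mhz):
    """Speedup that remains after factoring out the clock-rate increase."""
    return total_speedup / (new_mhz / old_mhz)
```

With the article's numbers (10 times faster, 50 MHz against 20 MHz) the
clock accounts for a factor of 2.5, leaving roughly a fourfold gain per
clock cycle to be explained by the architecture.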

Inmos also said that it has overcome the disadvantages of some recent
implementations of pipelined and superscalar microprocessors, which
require careful programming to obtain the target performance. With the
T9000, details of the pipeline are transparent to the programmer, and the
processor appears to be the normal Transputer architecture.

In order to support the processor's high execution rate, Inmos has devised an
extraordinary memory architecture for the T9000. The part has two separate
caches--a local one attached to the pipeline, which functions almost as an
extended register file, and a main cache.  The latter is actually the
Transputer's main memory, which is organized as 16 kbytes of fully
associative cache. The full associativity, aside from being unique in the
industry, permits the small memory to achieve upward of 98 percent hit
rates, according to the company. There are two buses between the pipeline
and local cache, and four between the pipeline and main memory, providing
enormous bandwidth to the processor.  

In line with the original Transputer concept, the T9000 also has a
complete communications subsystem on chip to support interprocessor
communications.  This subsystem includes four 100-Mbit/s, full-duplex,
serial communication links, each with its own pair of DMA channels.  The
links can be directly connected between Transputers, with no external
buffering or other glue logic. Each serial link has a packet-based link
protocol, supporting a data rate of 10 Mbytes/s. This gives the T9000 a
total bidirectional communications bandwidth of 80 Mbytes/s.

Though the T9000 is primarily dependent on its on-chip memory for
performance, the part also has provision for substantial off-chip
storage. It includes an
on-chip programmable-memory interface, which can be linked to memory
without glue logic. Providing an external memory bandwidth of 200
Mbytes/s, the memory interface can address up to 4 Gbytes of memory in
four independent banks. This allows EPROM and dynamic, static, and video
RAM to be addressed directly via 64-, 32-, 16- or 8-bit data buses.
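
The link figures quoted above fit together as follows. The Python sketch
below uses only the article's numbers. Reading the gap between the
100-Mbit/s line rate and the 10-Mbytes/s data rate as protocol overhead is
an assumption; the article does not say where the difference goes.

```python
LINKS = 4                 # serial links per T9000
RAW_MBITS_PER_S = 100     # quoted line rate per link, each direction
DATA_MBYTES_PER_S = 10    # quoted data rate per link, each direction
DIRECTIONS = 2            # links are full duplex


def total_link_bandwidth():
    """Aggregate data bandwidth over all links in both directions (Mbytes/s)."""
    return LINKS * DATA_MBYTES_PER_S * DIRECTIONS


def payload_fraction():
    """Share of the raw bit rate that carries data, per link and direction."""
    return DATA_MBYTES_PER_S * 8 / RAW_MBITS_PER_S
```

Four links at 10 Mbytes/s in each direction give the 80 Mbytes/s total,
and the data rate corresponds to 80 percent of the raw line rate.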

The T9000 communicates to external components under the control of a 5-MHz
clock, simplifying system design and retaining compatibility with previous
Transputers.

Supporting the CPU

The T9000 processor by itself is a formidable device, requiring 2 million
transistors. But to apply the part to the most demanding communications
and computing applications, Inmos realized the need for supporting chips.
The Transputer alone, with only its four links, would quickly be starved
of connections in many multiprocessing topologies. And the T9000 by itself
provides no friendly way to connect to any of the buses or devices used by
the rest of the world.

So Inmos created a line of communications peripherals called the C1XX
family. These parts provide off-chip support for multiprocessing, and
allow any size of T9000 system to be built. They also permit connections
between first-generation and second-generation Transputers, and provide an
interface to standard bus systems. Prelaunch details of the C1XX family were
reported last year (see Aug 13, 1990, page 1).
 
The C104 is a complete packet routing switch that acts like a single-chip
PBX. This connects 32 links to each other through a nonblocking crossbar
switch with submicrosecond latency. In this way, it emulates a direct
connection between each of the devices in a T9000 network, permitting any
multiprocessing topology the application wants.

If required, multiple C104s can be connected together to make larger
networks. Any number of T9000 Transputers can be linked in this manner.

In devising its technique for routing messages across a network, Inmos has
borrowed from packet-switched data communications, exploiting such
concepts as virtual channels, header addressing and wormhole routing. The
C104 chip was developed as part of a project called Puma, within the
Common Market's Esprit research program, and it is expected to find
applications in non-Transputer system communications as well.
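
The idea behind header addressing and wormhole routing can be shown with a
toy model. In the Python sketch below the switch picks an output link as
soon as the header arrives and then streams the rest of the packet through
without storing it first. The routing table, the packet layout and the
function name are all hypothetical; this is not a description of how the
C104 is actually implemented.

```python
_EMPTY = object()  # sentinel for an empty packet stream


def wormhole_forward(routes, flits):
    """Yield (output_link, flit) pairs for one packet.

    `routes` maps a header value to an output link number. The first flit
    is treated as the header; the output link is chosen from it alone and
    every later flit follows the same path as it arrives.
    """
    stream = iter(flits)
    header = next(stream, _EMPTY)
    if header is _EMPTY:
        return  # nothing to forward
    out_link = routes[header]
    yield out_link, header
    for flit in stream:
        # Body flits are forwarded one at a time, never buffered as a
        # whole packet (the contrast with store-and-forward switching).
        yield out_link, flit
```

For example, with a table that sends header 7 to link 3, a packet
`[7, "a", "b"]` leaves on link 3 one piece at a time, header first.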

The second C100-series chip addresses the fact that it would be handy to
use older T800 Transputer parts -- many of which were specialized for
particular I/O tasks and all of which were relatively economical -- with the
new T9000. The T9000 has a new communications protocol, but mixed
Transputer networks can be built using the C100 system protocol converter.
This allows the optimum combination of Transputers to be used, thereby
meeting the needs of processing power, communication bandwidth and system
cost.

The third communications chip is the C101 link adaptor, which provides a
parallel interface between a T9000 link and external systems, such as
buses, peripheral devices and other microprocessors.

The software side

In the software support area, Inmos made it clear that it does not intend
to repeat the errors of the past. In the original Transputer introduction,
the only programming language supported was Occam, a parallel programming
language unique to the Transputer. But in recent years, Inmos has gathered
a large suite of more conventional tools around its architecture, and this
effort will directly benefit the T9000.  Since the T9000 has instruction
set compatibility with the first-generation Transputer family, use can be
made of existing development and application software. In addition, Inmos
is supporting the T9000 with a selection of specially developed compilers
for industry-standard languages and through several tool-set enhancements.

Third-party software includes compilers for C++ (Glockenspiel) and Ada
(Alsys), and real-time kernels and operating systems for distributed Unix
(Chorus), VRTX32/T (Ready Systems) and C-Executive (JMI Software
Consultants).

This week, JMI (Springhouse, Pa.) will announce its port of the
C-Executive real-time kernel to the T9000. C-Executive is a preemptive,
real-time kernel that is modeled after Unix and is specifically designed
for C programming. The kernel is coded primarily in C, making it portable.
It currently runs on 20 microprocessors, including most of the major RISC
chips, as well as the PC. Many Unix programmers like it because it
provides an easy transition from Unix to embedded systems. The Transputer
version of C-Executive comes with an optional file system and a system
debugger.

Altogether, the combination of enormous processing power and familiar
tools could be sufficiently attractive to overcome old prejudices about
the Transputer. SGS-Thomson and Inmos certainly hope so. The companies see
major opportunities for the T9000 in many areas of the 32-bit
microprocessor market, where worldwide deliveries are expected to grow
from 20 million units this year to 67 million units by 1995.

Estimates suggest that, by then, embedded applications will account for 61
percent of the market, compared with 44 percent today. Office automation
applications will represent 47 percent of the total, with communications,
multimedia and military making up most of the balance in embedded uses.

The T9000 could also give a new dimension to application accelerators for
PCs and workstations. As an example of the possibilities, Inmos marketing
director Paul Strzelecki described an accelerator board with 16
interconnected T9000s, which would deliver 400 Mflops and 3,200 Mips.
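
Dividing the quoted board totals by the number of processors shows what
each chip is assumed to contribute. The Python sketch below is plain
arithmetic on the article's figures; the helper name is invented for the
example.

```python
def per_chip(board_total, chips):
    """Even share of a board-level figure across its processors."""
    return board_total / chips
```

For the 16-processor board this gives 200 Mips and 25 Mflops per T9000, so
the Mips total simply scales the chip's peak rate and assumes no loss to
communication.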

Simon Loe, U.K. correspondent for Electronic World News, Ron Wilson,
Nicolas Mokhoff and Ray Weiss contributed to this story.
 

jim@ljkiraly.lerc.nasa.gov (L J "Jim" Kiraly) (05/15/91)

In article <32842@usc> avery@rana.usc.edu (Avery Wang) writes:
->I tried posting this a few weeks ago but apparently the news software is 
->unreliable:
->----------------------------------------------------------------------------
->
->It would be interesting to see a NeXT based on the not-yet existing transputer.
->It seems to have a lot of promise, but is still vaporware.  Check this out:
->
->Electronic Engineering Times April 22 1991
->
->Transputer pumped up BY ROGER WOOLNOUGH
->
->Bristol, England--SGS-Thomson took the next step in its global 32-bit
->microprocessor strategy last week, unveiling further details and schedules
->for the powerful H1 Transputer from its subsidiary Inmos Ltd.
->SGS-Thomson--which has remained aloof from other silicon vendors in the
->scramble to align with a major RISC architecture--will rely on the power,
->flexibility and broad tool support of the new Transputer, dubbed T9000, to
->claim a share of the emerging 32-bit embedded computing market.
-> etc.

We've used transputers for several of our research projects.  The architecture
and native programming language (OCCAM) of transputers are derived from a 
communicating sequential process paradigm, with any number of logical channels
representing the interconnection between otherwise asynchronous processes.  
There are no global variables, which results in some special programming
concerns, such as allowing specific channels or tagged messages to tell
otherwise continuing processes when to stop.  Transputers are very powerful
in that logical connectivity between communicating processes can be mapped
into somewhat arbitrary (and easily expandable) arrays of processors, each
of which manages its own set of high-speed serial links. The architecture
and OCCAM constructs are similar to the message-object schemes of Objective C
(although I think it would take quite a bit of work to map Objective C into
a transputer environment).
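
For readers who haven't seen the style, here is a minimal sketch of the
channel idea in Python rather than OCCAM. Threads stand in for processes
and one-slot queues stand in for channels, which is only an approximation,
since OCCAM channels are unbuffered. The STOP tag and the function names
are made up for the example. Nothing is shared between the processes
except the channels, and a tagged message tells each stage when to stop.

```python
import queue
import threading

STOP = object()  # tagged message meaning "no more data on this channel"


def producer(out_chan, values):
    """Send each value down the channel, then the stop tag."""
    for value in values:
        out_chan.put(value)
    out_chan.put(STOP)


def squarer(in_chan, out_chan):
    """Square each incoming value until the stop tag arrives."""
    while True:
        item = in_chan.get()
        if item is STOP:
            out_chan.put(STOP)  # pass the stop tag downstream
            return
        out_chan.put(item * item)


def run_pipeline(values):
    """Wire producer -> squarer -> collector and return the results."""
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    workers = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for worker in workers:
        worker.start()
    results = []
    while True:
        item = b.get()
        if item is STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Calling `run_pipeline([1, 2, 3])` collects the squares in order. Without
the stop tag the squarer would block forever on its input channel, which
is the programming concern mentioned above.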

We have not found any really complete/tight operating system to work with our
transputers.  For most applications, we've developed OCCAM programs using the
INMOS development system- which requires users to manage a lot of their own
system-type stuff that is normally invisible to the user.  There have been
some OS implementations which actively seek out unused processors in an
arbitrary network and add them as they come "on-line", or delete them as they
go off-line (Pretty good for fault tolerance)- but there hasn't been a good
overall OS implementation that at least we have found.  Some of the problem
had to do with past implementations of transputers- which are supposedly
addressed with the new T9000's (things like message routing, and better
support for higher level languages).

Anyway, my point with all this is to say, me too!  If NeXT was interested
in porting Objective C and NeXT Step, a workstation based on same could offer
a number of performance advantages such as internal client server operation
with internal, individual processors; objects mapped onto individual
processors, program controlled internal processor re-configuration to suit
different workstation applications (file-server, graphics workstation, client
workstation etc.), and the ability of users to incrementally upgrade their
workstations by adding transputer modules as they could afford them.

A useful product along these lines needs a corporate sponsor to set the 
interface rules, design the user interface, and develop the core utilities.
It would really be a hot workstation, and I for one would love to have one.
I do feel that this is really a big job and too big a NeXT Step for NeXT- 
but I guess I still like to dream about the possibilities.
--
___________________________________________________________________________
  Jim Kiraly  - jim@ljkiraly.lerc.nasa.gov  -  NASA Lewis Research Center
---------------------------------------------------------------------------

shirley@gothamcity.jsc.nasa.gov (Bill Shirley) (05/16/91)

I saw an article in a magazine (last month's Byte?) that had a transputer board w/
one processor and an OCCAM compiler for a <cringe> PC.  It was relatively inexpensive
(< $200 if I remember correctly).  The advertisement was aimed toward the hobbyist/midnight
engineer type.  Something similar for a NeXT would be interesting.  It would exclude
the growing numbers of slabs out there, but that's what you expect when you buy a machine
w/o slots.

It seems to me that that company (I don't remember its name either and, yes, I'm too
lazy to go find that ad again) should just get a NeXTbus development package and go
to it.

It would be interesting to the researcher, hobbyist, developer.

(Is this really lighting a fire under anyone, or is this just an exercise in purging
my thoughts? <rhetorical>)

     ____     ____       ____			Bill Shirley
    / ___|   / ___|     / ___|			bill@gothamcity.jsc.nasa.gov
   |_|      |_|ciences |_|			_______________________________
    _omputer     _      _			Opinions expressed are obtained|
   | |___    ___| |    | |___orporation		by a room full of immortal apes|
    \____|  |____/      \____|			with unbreakable typewriters.  |
  						~~~~~~~~~~~DISCLAIMER~~~~~~~~~~~