avery@rana.usc.edu (Avery Wang) (05/14/91)
I tried posting this a few weeks ago but apparently the news software is
unreliable:
----------------------------------------------------------------------------

It would be interesting to see a NeXT based on the not-yet existing transputer.
It seems to have a lot of promise, but is still vaporware. Check this out:

Electronic Engineering Times April 22 1991

Transputer pumped up BY ROGER WOOLNOUGH

Bristol, England--SGS-Thomson took the next step in its global 32-bit microprocessor strategy last week, unveiling further details and schedules for the powerful H1 Transputer from its subsidiary Inmos Ltd.

SGS-Thomson--which has remained aloof from other silicon vendors in the scramble to align with a major RISC architecture--will rely on the power, flexibility and broad tool support of the new Transputer, dubbed T9000, to claim a share of the emerging 32-bit embedded computing market.

But the strategy is far from a sure thing. Although the current Transputer chips have found a following outside the U.S.--Inmos's technical director, Ian R. Pearson, claims that some 240,000 units of the Transputer family have been shipped, a figure that would mean the Transputer outsold all other 32-bit RISC microprocessors last year--the architecture is still not widely accepted. And the U.S., in particular, has been a disappointing market for the processors.

Further, the sheer scale of the design effort, and the fact that Inmos is announcing the part with no working silicon in hand, raise questions about the eventual manufacturability of the T9000.

Sampling of the new Transputer, and of its accompanying C104 packet-routing switch and C100 system protocol converter, will start in the third quarter of 1991, with market availability in Q1 of 1992. First silicon of the C101 link adapter will be seen in Q4 1991, with market availability in mid-1992.
But, for now, what Inmos has to show is a series of spectacular benchmarks based on simulations and a 1.3 million-transistor cache test chip, built with the three-level metal, 1.0-micron CMOS process that will be used to fabricate the T9000.

If there are questions about the T9000's volume production, there can be none about SGS-Thomson's determination to make the Transputer succeed. As evidence of the parent company's ambitions, teams of Inmos executives traveled the globe last week to deliver presentations in London, New York, Tokyo and San Francisco. And this week, SGS-Thomson and Inmos are hosting "Transputing 91," at the Hilton Hotel, in Sunnyvale, Calif., where hundreds of engineers will receive in-depth technical briefings on the T9000.

This scenario is very different from when Inmos launched its first Transputer in 1985. Then, Inmos was a financially strapped small company that made SRAMs, so a worldwide extravaganza was out of the question. The Transputer's message spread almost by word of mouth, impaired by the market's limited understanding of multiprocessing techniques and further hindered by the Transputer's unique Occam language.

The second-generation T9000 was conceived before SGS-Thomson Microelectronics acquired Inmos two years ago. Yet, full resources to go ahead with development only became available after the merger. But in August of last year, Ian Pearson pledged that Inmos would remedy the Transputer's shortcomings in its next generation, then code-named H1 (see Aug. 13, page 1).

Last week, Inmos added information to the promise, when Pearson and his team detailed the 50-MHz, 2-million-transistor device, boasting 10-times performance improvement while maintaining binary compatibility with its predecessors, the T400 and T800. At last week's disclosures Inmos was showing only subsystem-level silicon.
The sheer performance of the part--which peaks at 200 native Mips, according to Inmos simulations--could be enough to attract attention from the 32-bit market. But to achieve the performance, Inmos is resorting to a three-level-metal, 1-micron CMOS process using tungsten plugs, yielding a transistor density of 10,000 transistors/mm2 on a 180-mm2 die--a challenge to even the most seasoned CMOS vendor. (The only other triple-metal 32-bit microprocessor that is currently known to be on the drawing board is Intel's 100MHz 486, which was presented in a technical paper at this year's International Solid-State Circuits Conference.)

Besides the global marketing push and the aggressive performance targets, Inmos will be stepping up a program designed to remove earlier objections to the Transputer, by supporting industry-standard software, including mainstream programming languages such as C, C++ and Fortran/77, as well as Inmos's own Occam and operating systems for both distributed computing and real-time embedded processing. Inmos is cooperating with Chorus Systems, on its microkernel-based version of Unix for distributed systems, and Ready Systems, on its VRTX real-time kernel for embedded applications.

The software vendors will be bringing their wares to a unique architecture. Like its predecessors, the T9000 family is based on Inmos's concept of cellular computing. Each powerful processor has its own small on-chip memory and is connected to the outside world by four fast serial links. Such processing elements are commonly used in massively parallel research computers, but Inmos claims the architecture's main strength will be in conventional embedded computing.

"With its combination of powerful computing and communications, the T9000 is opening up new fields for the microprocessor," said Philippe Geyres, general manager of SGS-Thomson's Programmable Products Group.
"It is going to be valuable for such applications as real-time processing and switching of high-definition video and of broadband ISDN. Other new applications will be in such areas as multimedia workstations, where there is a need for both a high degree of communication capability and huge computing power."

It is the Transputer's combination of computing and communications that Inmos believes gives it an edge over competitors in these applications. The company argues that traditional CISC and RISC microprocessors, by emphasizing increased computing performance, do not offer the balanced solution needed for many embedded applications. Inmos believes that by placing equal emphasis on computing and system communications, Transputer-based designs can span the whole range from single-processor applications through various kinds of multiprocessing, to massively parallel architectures.

Three main elements

To achieve this range, the T9000 family combines three main elements: the T9000 Transputer itself; a line of communications chips that allows T9000s to link with each other, with other types of Transputers, and with conventional bus systems; and software support for development, systems and applications, available both from Inmos and from third parties throughout the world.

The T9000 Transputer chip integrates on a single chip the CPU; a 16-kbyte local memory; the communications system, which includes its own communications controller (called the virtual channel processor) and four serial links; and several other support functions.

The most aggressive element of the T9000 design is the execution unit, which uses a pipelined superscalar-like architecture to process as many as 40 instructions simultaneously, reaching a peak execution rate of 200 Mips at 50 MHz. At the head of the pipeline is a 32-instruction fetch-ahead buffer. From this buffer, a hardware scheduler, called an instruction grouper, picks instructions and groups them together for dispatch.
The pipeline can accept one group -- up to eight instructions -- on each cycle. The actual contents of an instruction group are constrained by the pipeline design. The pipeline can execute two local cache references, two address computations, two main cache references, an ALU operation, and a write or jump operation on each cycle. So a group can contain, for example, no more than two local cache references.

In real code, instructions will seldom come in just the right order to provide for full groups. What happens in practice, according to Inmos, is that the fetch logic picks up four instructions on each cycle. The grouper will combine these as best it can, actually creating groups of from one to three instructions most of the time. When a multiple-cycle instruction -- say, a multiply -- comes along, that stalls the pipeline and gives the grouper a chance to work ahead a little bit. So while the theoretical upper limit for pipeline activity is 40 instructions -- one eight-instruction group in each of the pipeline stages at once -- a more typical scenario would have about eight instructions in the pipeline at any one time, according to Inmos. That would give sustained performance somewhere between 50 and 80 Mips, assuming not too many multiple-cycle instructions come by.

The processor contains both an integer unit, which does primarily single-cycle operations, and a multicycle FPU, which operates on 32-bit and 64-bit floating-point numbers as specified by the IEEE 754 standard.

The T9000 is upward binary compatible with previous Transputers; in particular, it implements the same instruction set as the T805, with many additions. Inmos estimates that a T9000 running at 50 MHz can typically execute code compiled for the T805 10 times faster than a 20MHz T805. But this higher performance can be achieved without having to replace existing development tools and software.
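
[Illustration, not part of the EE Times article.] The per-cycle limits described above (two local cache references, two address computations, two main cache references, one ALU operation, one write or jump) can be modelled as a simple greedy packer. This is only a sketch: the names `SLOT_LIMITS` and `group_instructions` are made up here, the instruction kinds are simplified labels, and the real grouper presumably also respects ordering between pipeline stages, which this ignores.

```python
from collections import Counter

# Per-cycle slot limits, taken from the article's description of the pipeline.
# The key names are labels invented for this sketch.
SLOT_LIMITS = {
    "local_ref": 2,   # local cache references
    "address": 2,     # address computations
    "main_ref": 2,    # main cache references
    "alu": 1,         # ALU operation
    "write_jump": 1,  # write or jump operation
}


def group_instructions(kinds):
    """Greedily pack an in-order stream of instruction kinds into groups.

    A group is closed as soon as the next instruction would exceed the
    slot limit for its kind. Instructions are never reordered.
    """
    groups = []
    current = []
    used = Counter()
    for kind in kinds:
        if kind not in SLOT_LIMITS:
            raise ValueError(f"unknown instruction kind: {kind!r}")
        if used[kind] >= SLOT_LIMITS[kind]:
            # No free slot of this kind: dispatch the group, start a new one.
            groups.append(current)
            current = []
            used = Counter()
        current.append(kind)
        used[kind] += 1
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    stream = ["local_ref", "local_ref", "address", "alu",
              "alu", "main_ref", "write_jump"]
    for cycle, group in enumerate(group_instructions(stream)):
        print(cycle, group)
```

The slot limits sum to eight, which matches the "up to eight instructions" per group quoted in the article. A stream with two back-to-back ALU operations splits into two groups, which is the kind of effect that keeps typical groups well below eight.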
Inmos claims that only a modest amount of work is needed to modify compilers to produce code optimized for the T9000. Inmos also said that it has overcome the disadvantages of some recent implementations of pipelined and superscalar microprocessors, which require careful programming to obtain the target performance. With the T9000, details of the pipeline are transparent to the programmer, and the processor appears to be the normal Transputer architecture.

In order to support the processor's high execution rate, Inmos has devised an extraordinary memory architecture for the T9000. The part has two separate caches--a local one attached to the pipeline, which functions almost as an extended register file, and a main cache. The latter is actually the Transputer's main memory, which is organized as 16 kbytes of fully associative cache. The full associativity, aside from being unique in the industry, permits the small memory to achieve upward of 98 percent hit rates, according to the company. There are two buses between the pipeline and local cache, and four between the pipeline and main memory, providing enormous bandwidth to the processor.

In line with the original Transputer concept, the T9000 also has a complete communications subsystem on chip to support interprocessor communications. This subsystem includes four 100-Mbit/s, full-duplex, serial communication links, each with its own pair of DMA channels. The links can be directly connected between Transputers, with no external buffering or other glue logic. Each serial link has a packet-based link protocol, supporting a data rate of 10 Mbytes/s. This gives the T9000 a total bidirectional communications bandwidth of 80 Mbytes/s.

Though the T9000 is primarily dependent on its on-chip memory for performance, the part also has provision for substantial off-chip storage. It includes an on-chip programmable-memory interface, which can be linked to memory without glue logic.
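
[Illustration, not part of the EE Times article.] The headline figures quoted so far can be cross-checked with a few lines of arithmetic. All inputs below are numbers from the article; the variable names are made up, and the derived values (instructions per cycle, number of pipeline stages, raw link rate in bytes) are inferences from those numbers rather than anything Inmos states directly.

```python
# Figures quoted in the article.
CLOCK_MHZ = 50                # T9000 clock
PEAK_MIPS = 200               # peak execution rate
GROUP_SIZE = 8                # instructions per full group
MAX_IN_FLIGHT = 40            # theoretical instructions in the pipeline
LINKS = 4                     # serial links per T9000
LINK_MBIT_PER_S = 100         # raw link speed, each direction
LINK_MBYTES_PER_S = 10        # quoted data rate per link, each direction
DENSITY_PER_MM2 = 10_000      # transistors per square millimetre
DIE_MM2 = 180                 # die area

# Peak instructions retired per clock cycle.
peak_instr_per_cycle = PEAK_MIPS / CLOCK_MHZ

# Number of stages implied by "one eight-instruction group in each stage".
pipeline_stages = MAX_IN_FLIGHT // GROUP_SIZE

# Sustained estimate of 50 to 80 Mips, expressed per cycle.
sustained_ipc = (50 / CLOCK_MHZ, 80 / CLOCK_MHZ)

# Four links, two directions each, 10 Mbytes/s per direction.
total_link_mbytes_per_s = LINKS * 2 * LINK_MBYTES_PER_S

# Raw bit rate converted to bytes; the gap to 10 Mbytes/s is presumably
# protocol overhead (an assumption, the article does not say).
raw_link_mbytes_per_s = LINK_MBIT_PER_S / 8

# Transistor count implied by density and die area.
die_transistors = DENSITY_PER_MM2 * DIE_MM2

if __name__ == "__main__":
    print("peak instructions/cycle:", peak_instr_per_cycle)
    print("implied pipeline stages:", pipeline_stages)
    print("sustained instructions/cycle:", sustained_ipc)
    print("total link bandwidth (Mbytes/s):", total_link_mbytes_per_s)
    print("raw link rate (Mbytes/s):", raw_link_mbytes_per_s)
    print("implied transistor count:", die_transistors)
```

The numbers hang together: four instructions per cycle at peak agrees with the fetch logic picking up four instructions per cycle, the 80 Mbytes/s figure is four links counted in both directions, and 1.8 million transistors from density times area is consistent with the "2-million-transistor" description.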
Providing an external memory bandwidth of 200 Mbytes/s, the memory interface can address up to 4 Gbytes of memory in four independent banks. This allows EPROM and dynamic, static, and video RAM to be addressed directly via 64-, 32-, 16- or 8-bit data buses. The T9000 communicates to external components under the control of a 5-MHz clock, simplifying system design and retaining compatibility with previous Transputers.

Supporting the CPU

The T9000 processor by itself is a formidable device, requiring 2 million transistors. But to apply the part to the most demanding communications and computing applications, Inmos realized the need for supporting chips. The transputer alone, with its four links, would quickly starve for many multiprocessing topologies. And the T9000 by itself provides no friendly way to connect to any of the buses or devices used by the rest of the world.

So Inmos created a line of communications peripherals called the C1XX family. These parts provide off-chip support for multiprocessing, and allow any size of T9000 system to be built. They also permit connections between first-generation and second-generation Transputers, and provide an interface to standard bus systems. Prelaunch details of the C1XX family were reported last year (see Aug 13, 1990, page 1).

The C104 is a complete packet routing switch that acts like a single-chip PBX. This connects 32 links to each other through a nonblocking crossbar switch with submicrosecond latency. In this way, it emulates a direct connection between each of the devices in a T9000 network, permitting any multiprocessing topology the application wants. If required, multiple C104s can be connected together to make larger networks. Any number of T9000 Transputers can be linked in this manner.

In devising its technique for routing messages across a network, Inmos has borrowed from packet-switched data communications, exploiting such concepts as virtual channels, header addressing and wormhole routing.
The C104 chip was developed as part of a project called Puma, within the Common Market's Esprit research program, and it is expected to find applications in non-Transputer system communications as well.

The second C100-series chip addresses the fact that it would be handy to use older T800 Transputer parts -- many of which were specialized for particular I/O tasks and all of which were relatively economical -- with the new T9000. The T9000 has a new communications protocol, but mixed Transputer networks can be built using the C100 system protocol converter. This allows the optimum combination of Transputers to be used, thereby meeting the needs of processing power, communication bandwidth and system cost.

The third communications chip is the C101 link adaptor, which provides a parallel interface between a T9000 link and external systems, such as buses, peripheral devices and other microprocessors.

The Software side

In the software support area, Inmos made it clear that it does not intend to repeat the errors of the past. In the original Transputer introduction, the only programming language supported was Occam, a parallel programming language unique to the Transputer. But in recent years, Inmos has gathered a large suite of more conventional tools around its architecture, and this effort will directly benefit the T9000.

Since the T9000 has instruction set compatibility with the first generation Transputer family, use can be made of existing development and application software. In addition, Inmos is supporting the T9000 with a selection of specially developed compilers for industry-standard languages and through several tool-set enhancements. Third-party software includes compilers for C++ (Glockenspiel) and Ada (Alsys), and realtime kernels and operating systems for distributed Unix (Chorus), VRTX32/T (Ready Systems) and C-Executive (JMI Software Consultants). This week, JMI (Springhouse, Pa.) will announce its port of the C-Executive real-time kernel to the T9000.
C-Executive is a preemptive, real-time kernel that is modeled after Unix and is specifically designed for C programming. The kernel is coded primarily in C, making it portable. It currently runs on 20 microprocessors, including most of the major RISC chips, as well as the PC. Many Unix programmers like it because it provides an easy transition from Unix to embedded systems. The Transputer version of C-Executive comes with an optional file system and a system debugger.

Altogether, the combination of enormous processing power and familiar tools could be sufficiently attractive to overcome old prejudices about the Transputer. SGS-Thomson and Inmos certainly hope so.

The companies see major opportunities for the T9000 in many areas of the 32-bit microprocessor market, where worldwide deliveries are expected to grow from 20 million units this year to 67 million units by 1995. Estimates suggest that, by then, embedded applications will account for 61 percent of the market, compared with 44 percent today. Office automation applications will represent 47 percent of the total, with communications, multimedia and military making up most of the balance in embedded uses.

The T9000 could also give a new dimension to application accelerators for PCs and workstations. As an example of the possibilities, Inmos marketing director Paul Strzelecki described an accelerator board with 16 interconnected T9000s, which would deliver 400 Mflops and 3,200 Mips.

Simon Loe, U.K. correspondent for Electronic World News, Ron Wilson, Nicolas Mokhoff and Ray Weiss contributed to this story.
jim@ljkiraly.lerc.nasa.gov (L J "Jim" Kiraly) (05/15/91)
In article <32842@usc> avery@rana.usc.edu (Avery Wang) writes:
->I tried posting this a few weeks ago but apparently the news software is
->unreliable:
->----------------------------------------------------------------------------
->
->It would be interesting to see a NeXT based on the not-yet existing transputer.
->It seems to have a lot of promise, but is still vaporware. Check this out:
->
->Electronic Engineering Times April 22 1991
->
->Transputer pumped up BY ROGER WOOLNOUGH
->
->Bristol, England--SGS-Thomson took the next step in its global 32-bit
->microprocessor strategy last week, unveiling further details and schedules
->for the powerful H1 Transputer from its subsidiary Inmos Ltd.
->SGS-Thomson--which has remained aloof from other silicon vendors in the
->scramble to align with a major RISC architecture--will rely on the power,
->flexibility and broad tool support of the new Transputer, dubbed T9000, to
->claim a share of the emerging 32-bit embedded computing market.
-> etc.
We've used transputers for several of our research projects. The architecture
and native programming language (OCCAM) of transputers are derived from a
communicating sequential process paradigm, with any number of logical channels
representing the interconnection between otherwise asynchronous processes.
There are no global variables, which results in some special programming
concerns- such as allowing specific channels or tagged messages to tell
otherwise continuing processes when to stop. Transputers are very powerful
in that logical connectivity between communicating processes can be mapped
into somewhat arbitrary (and easily expandable) arrays of processors- each
of which manages its own set of high speed serial links. The architecture
and OCCAM constructs are similar to the message-object schemes of Objective C
(although I think it would take quite a bit of work to map Objective C into
a transputer environment).
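
For anyone who hasn't seen the style, here is a rough sketch of the "tagged message tells a process when to stop" idea, written in Python rather than OCCAM. It is only an approximation: OCCAM channels are unbuffered rendezvous points, while `queue.Queue(maxsize=1)` holds one item, and the names `producer`, `squarer` and `run_pipeline` are made up for the illustration.

```python
import queue
import threading

# A tagged message: ("data", value) carries work, ("stop", None) ends the run.
STOP = ("stop", None)


def producer(out_chan, items):
    """Send each item down the channel, then the stop tag."""
    for item in items:
        out_chan.put(("data", item))
    out_chan.put(STOP)


def squarer(in_chan, out_chan):
    """Square incoming values; forward the stop tag and terminate."""
    while True:
        tag, value = in_chan.get()
        if tag == "stop":
            out_chan.put(STOP)  # pass the shutdown downstream
            return
        out_chan.put(("data", value * value))


def run_pipeline(items):
    """Wire producer -> squarer -> collector using two channels only."""
    a = queue.Queue(maxsize=1)  # approximates a synchronous channel
    b = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        tag, value = b.get()
        if tag == "stop":
            break
        results.append(value)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

No process touches shared state; the only way the squarer learns that the work is over is the stop tag arriving on its input channel, which it must forward so the next process can shut down too.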
We have not found any really complete/tight operating system to work with our
transputers. For most applications, we've developed OCCAM programs using the
INMOS development system- which requires users to manage a lot of their own
system-type stuff that is normally invisible to the user. There have been
some OS implementations which actively seek out unused processors in an
arbitrary network and add them as they come "on-line", or delete them as they
go off-line (Pretty good for fault tolerance)- but there hasn't been a good
overall OS implementation that at least we have found. Some of the problem
had to do with past implementations of transputers- which are supposedly
addressed with the new T9000's (things like message routing, and better
support for higher level languages).
Anyway, my point with all this is to say, me too! If NeXT was interested
in porting objective-C and NeXT Step, a workstation based on same could offer
a number of performance advantages such as internal client server operation
with internal, individual processors; objects mapped onto individual
processors, program controlled internal processor re-configuration to suit
different workstation applications (file-server, graphics workstation, client
workstation etc.), and the ability of users to incrementally upgrade their
workstations by adding transputer modules as they could afford them.
A useful product along these lines needs a corporate sponsor to set the
interface rules, design the user interface, and develop the core utilities.
It would really be a hot workstation, and I for one would love to have one.
I do feel that this is really a big job and too big a NeXT Step for NeXT-
but I guess I still like to dream about the possibilities.
--
___________________________________________________________________________
Jim Kiraly - jim@ljkiraly.lerc.nasa.gov - NASA Lewis Research Center
---------------------------------------------------------------------------
shirley@gothamcity.jsc.nasa.gov (Bill Shirley) (05/16/91)
I saw an article in a magazine (last month's Byte?) that had a transputer board w/ one processor and an OCCAM compiler for an <cringe> PC. It was relatively inexpensive (< $200 if I remember correctly). The advertisement was aimed toward the hobbyist/midnight engineer type.

Something similar for a NeXT would be interesting. It would exclude the growing numbers of slabs out there, but that's what you expect when you buy a machine w/o slots. It seems to me that that company (I don't remember its name either and, yes, I'm too lazy to go find that ad again) should just get a NeXTbus development package and go to it. It would be interesting to the researcher, hobbyist, developer.

(Is this really lighting a fire under anyone, or is this just an exercise in purging my thoughts? <rhetorical>)
--
Bill Shirley
bill@gothamcity.jsc.nasa.gov
Computer Sciences Corporation
DISCLAIMER: Opinions expressed are obtained by a room full of immortal apes with unbreakable typewriters.