andyr@inmos.com (Andy Rabagliati) (04/27/91)
Electronic Engineering Times, April 22, 1991

Transputer pumped up

BY ROGER WOOLNOUGH

Bristol, England--SGS-Thomson took the next step in its global 32-bit microprocessor strategy last week, unveiling further details and schedules for the powerful H1 Transputer from its subsidiary Inmos Ltd. SGS-Thomson--which has remained aloof from other silicon vendors in the scramble to align with a major RISC architecture--will rely on the power, flexibility and broad tool support of the new Transputer, dubbed T9000, to claim a share of the emerging 32-bit embedded computing market.

But the strategy is far from a sure thing. Although the current Transputer chips have found a following outside the U.S.--Inmos's technical director, Ian R. Pearson, claims that some 240,000 units of the Transputer family have been shipped, a figure that would mean the Transputer outsold all other 32-bit RISC microprocessors last year--the architecture is still not widely accepted. And the U.S., in particular, has been a disappointing market for the processors. Further, the sheer scale of the design effort, and the fact that Inmos is announcing the part with no working silicon in hand, raise questions about the eventual manufacturability of the T9000.

Sampling of the new Transputer, and of its accompanying C104 packet-routing switch and C100 system protocol converter, will start in the third quarter of 1991, with market availability in Q1 of 1992. First silicon of the C101 link adapter will be seen in Q4 1991, with market availability in mid-1992. But, for now, what Inmos has to show is a series of spectacular benchmarks based on simulations and a 1.3 million-transistor cache test chip, built with the three-level metal, 1.0-micron CMOS process that will be used to fabricate the T9000.

If there are questions about the T9000's volume production, there can be none about SGS-Thomson's determination to make the Transputer succeed.
As evidence of the parent company's ambitions, teams of Inmos executives traveled the globe last week to deliver presentations in London, New York, Tokyo and San Francisco. And this week, SGS-Thomson and Inmos are hosting "Transputing 91," at the Hilton Hotel, in Sunnyvale, Calif., where hundreds of engineers will receive in-depth technical briefings on the T9000.

This scenario is very different from when Inmos launched its first Transputer in 1985. Then, Inmos was a financially strapped small company that made SRAMs, so a worldwide extravaganza was out of the question. The Transputer's message spread almost by word of mouth, impaired by the market's limited understanding of multiprocessing techniques and further hindered by the Transputer's unique Occam language.

The second-generation T9000 was conceived before SGS-Thomson Microelectronics acquired Inmos two years ago. Yet, full resources to go ahead with development only became available after the merger. But in August of last year, Ian Pearson pledged that Inmos would remedy the Transputer's shortcomings in its next generation, then code-named H1 (see Aug. 13, page 1). Last week, Inmos added detail to that promise, when Pearson and his team described the 50-MHz, 2-million-transistor device, boasting a 10-times performance improvement while maintaining binary compatibility with its predecessors, the T400 and T800. At last week's disclosures, Inmos was showing only subsystem-level silicon.

The sheer performance of the part--which peaks at 200 native Mips, according to Inmos simulations--could be enough to attract attention from the 32-bit market. But to achieve the performance, Inmos is resorting to a three-level-metal, 1-micron CMOS process using tungsten plugs, yielding a transistor density of 10,000 transistors/mm2 on a 180-mm2 die--a challenge to even the most seasoned CMOS vendor.
(The only other triple-metal 32-bit microprocessor that is currently known to be on the drawing board is Intel's 100-MHz 486, which was presented in a technical paper at this year's International Solid-State Circuits Conference.)

Besides the global marketing push and the aggressive performance targets, Inmos will be stepping up a program designed to remove earlier objections to the Transputer, by supporting industry-standard software, including mainstream programming languages such as C, C++ and Fortran/77, as well as Inmos's own Occam, and operating systems for both distributed computing and real-time embedded processing. Inmos is cooperating with Chorus Systems, on its microkernel-based version of Unix for distributed systems, and with Ready Systems, on its VRTX real-time kernel for embedded applications.

The software vendors will be bringing their wares to a unique architecture. Like its predecessors, the T9000 family is based on Inmos's concept of cellular computing. Each powerful processor has its own small on-chip memory and is connected to the outside world by four fast serial links. Such processing elements are commonly used in massively parallel research computers, but Inmos claims the architecture's main strength will be in conventional embedded computing.

"With its combination of powerful computing and communications, the T9000 is opening up new fields for the microprocessor," said Philippe Geyres, general manager of SGS-Thomson's Programmable Products Group. "It is going to be valuable for such applications as real-time processing and switching of high-definition video and of broadband ISDN. Other new applications will be in such areas as multimedia workstations, where there is a need for both a high degree of communication capability and huge computing power."

It is the Transputer's combination of computing and communications that Inmos believes gives it an edge over competitors in these applications.
The company argues that traditional CISC and RISC microprocessors, by emphasizing increased computing performance, do not offer the balanced solution needed for many embedded applications. Inmos believes that by placing equal emphasis on computing and system communications, Transputer-based designs can span the whole range from single-processor applications through various kinds of multiprocessing, to massively parallel architectures.

Three main elements

To achieve this range, the T9000 family combines three main elements: the T9000 Transputer itself; a line of communications chips that allows T9000s to link with each other, with other types of Transputers, and with conventional bus systems; and software support for development, systems and applications, available both from Inmos and from third parties throughout the world.

The T9000 Transputer chip integrates on a single chip the CPU; a 16-kbyte local memory; the communications system, which includes its own communications controller (called the virtual channel processor) and four serial links; and several other support functions.

The most aggressive element of the T9000 design is the execution unit, which uses a pipelined superscalar-like architecture to process as many as 40 instructions simultaneously, reaching a peak execution rate of 200 Mips at 50 MHz. At the head of the pipeline is a 32-instruction fetch-ahead buffer. From this buffer, a hardware scheduler, called an instruction grouper, picks instructions and groups them together for dispatch. The pipeline can accept one group--up to eight instructions--on each cycle.

The actual contents of an instruction group are constrained by the pipeline design. The pipeline can execute two local cache references, two address computations, two main cache references, an ALU operation, and a write or jump operation on each cycle. So a group can contain, for example, no more than two local cache references.
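
The grouping constraint described above can be illustrated with a small Python sketch. It is a hypothetical model, not Inmos's hardware algorithm: the slot names, the `SLOT_LIMITS` table, the `form_groups` helper and the simple in-order greedy policy are all assumptions made for illustration. Only the per-cycle limits (two local cache references, two address computations, two main cache references, one ALU operation, one write or jump) come from the article.

```python
from collections import Counter

# Per-cycle limits as quoted in the article; the slot names are illustrative.
SLOT_LIMITS = {
    "local_cache": 2,
    "address": 2,
    "main_cache": 2,
    "alu": 1,
    "write_or_jump": 1,
}

# The limits add up to the eight-instruction maximum group size.
MAX_GROUP_SIZE = sum(SLOT_LIMITS.values())


def form_groups(instruction_kinds):
    """Greedily pack an in-order instruction stream into groups.

    Assumed policy: a group is closed as soon as the next instruction
    would exceed the limit for its slot kind. The real grouper may also
    care about ordering within a group; this sketch does not model that.
    """
    groups = []
    current = []
    used = Counter()
    for kind in instruction_kinds:
        if kind not in SLOT_LIMITS:
            raise ValueError(f"unknown instruction kind: {kind}")
        if used[kind] >= SLOT_LIMITS[kind]:
            # No free slot of this kind: dispatch the group, start a new one.
            groups.append(current)
            current = []
            used = Counter()
        current.append(kind)
        used[kind] += 1
    if current:
        groups.append(current)
    return groups
```

Under these assumptions, a stream of three consecutive local cache references followed by an ALU operation splits into two groups, because the third local cache reference cannot share a cycle with the first two.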
In real code, instructions will seldom come in just the right order to provide for full groups. What happens in practice, according to Inmos, is that the fetch logic picks up four instructions on each cycle. The grouper will combine these as best it can, actually creating groups of from one to three instructions most of the time. When a multiple-cycle instruction--say, a multiply--comes along, that stalls the pipeline and gives the grouper a chance to work ahead a little bit.

So while the theoretical upper limit for pipeline activity is 40 instructions--one eight-instruction group in each of the pipeline stages at once--a more typical scenario would have about eight instructions in the pipeline at any one time, according to Inmos. That would give sustained performance somewhere between 50 and 80 Mips, assuming not too many multiple-cycle instructions come by.

The processor contains both an integer unit, which does primarily single-cycle operations, and a multicycle FPU, which operates on 32-bit and 64-bit floating-point numbers as specified by the IEEE 754 standard.

The T9000 is upward binary compatible with previous Transputers; in particular, it implements the same instruction set as the T805, but with many additions. Inmos estimates that a T9000 running at 50 MHz can typically execute code compiled for the T805 10 times faster than a 20-MHz T805. This higher performance can be achieved without having to replace existing development tools and software. Inmos claims that only a modest amount of work is needed to modify compilers to produce code optimized for the T9000.

Inmos also said that it has overcome the disadvantages of some recent implementations of pipelined and superscalar microprocessors, which require careful programming to obtain the target performance. With the T9000, details of the pipeline are transparent to the programmer, and the processor appears to be the normal Transputer architecture.
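
The performance figures quoted for the pipeline can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers from the article; the derived quantities (instructions per cycle, the implied count of five pipeline stages, and the per-clock gain over the T805) are inferences from those numbers, not figures stated by Inmos, and the variable names are illustrative.

```python
# Figures quoted in the article.
CLOCK_MHZ = 50                 # T9000 clock rate
PEAK_MIPS = 200                # peak execution rate
MAX_IN_FLIGHT = 40             # theoretical instructions in the pipeline
MAX_GROUP_SIZE = 8             # instructions per group
SUSTAINED_MIPS = (50, 80)      # Inmos's typical sustained range
T805_CLOCK_MHZ = 20
SPEEDUP_OVER_T805 = 10         # "10 times faster than a 20-MHz T805"

# Derived values (inferences, not quoted figures).
peak_ipc = PEAK_MIPS / CLOCK_MHZ                      # instructions per cycle at peak
implied_stages = MAX_IN_FLIGHT // MAX_GROUP_SIZE      # one full group per stage
sustained_ipc = tuple(m / CLOCK_MHZ for m in SUSTAINED_MIPS)
clock_ratio = CLOCK_MHZ / T805_CLOCK_MHZ              # gain from clock speed alone
per_clock_gain = SPEEDUP_OVER_T805 / clock_ratio      # gain left to the architecture

print(f"peak instructions per cycle: {peak_ipc}")
print(f"implied pipeline stages:     {implied_stages}")
print(f"sustained instructions/cycle: {sustained_ipc[0]} to {sustained_ipc[1]}")
print(f"per-clock gain over T805:    {per_clock_gain}x")
```

On these numbers, the peak rate matches the four instructions fetched per cycle, and the clock increase alone accounts for a factor of 2.5 of the claimed tenfold speedup, leaving roughly a factor of four to the new pipeline and memory system.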
In order to support the processor's high execution rate, Inmos has devised an extraordinary memory architecture for the T9000. The part has two separate caches--a local one attached to the pipeline, which functions almost as an extended register file, and a main cache. The latter is actually the Transputer's main memory, which is organized as 16 kbytes of fully associative cache. The full associativity, aside from being unique in the industry, permits the small memory to achieve upward of 98 percent hit rates, according to the company. There are two buses between the pipeline and local cache, and four between the pipeline and main memory, providing enormous bandwidth to the processor.

In line with the original Transputer concept, the T9000 also has a complete communications subsystem on chip to support interprocessor communications. This subsystem includes four 100-Mbit/s, full-duplex, serial communication links, each with its own pair of DMA channels. The links can be directly connected between Transputers, with no external buffering or other glue logic. Each serial link has a packet-based link protocol, supporting a data rate of 10 Mbytes/s. This gives the T9000 a total bidirectional communications bandwidth of 80 Mbytes/s.

Though the T9000 is primarily dependent on its on-chip memory for performance, the part also has provision for substantial off-chip storage. It includes an on-chip programmable-memory interface, which can be linked to memory without glue logic. Providing an external memory bandwidth of 200 Mbytes/s, the memory interface can address up to 4 Gbytes of memory in four independent banks. This allows EPROM and dynamic, static, and video RAM to be addressed directly via 64-, 32-, 16- or 8-bit data buses. The T9000 communicates to external components under the control of a 5-MHz clock, simplifying system design and retaining compatibility with previous Transputers.
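
The link bandwidth figures above fit together as follows. This Python sketch only restates the article's numbers; reading the 10-Mbytes/s data rate as a per-direction figure, and attributing the gap between the 100-Mbit/s signalling rate and that data rate to packet-protocol overhead, are assumptions rather than statements from Inmos. The variable names are illustrative.

```python
# Figures quoted in the article.
LINKS = 4                      # serial links per T9000
LINK_SIGNAL_MBIT_S = 100       # signalling rate of each link
LINK_DATA_MBYTE_S = 10         # data rate of each link (assumed per direction)
DIRECTIONS = 2                 # links are full duplex

# Raw capacity of one link in one direction, before any protocol overhead.
raw_mbyte_s = LINK_SIGNAL_MBIT_S / 8

# Fraction of the raw rate that carries data (assumed: the rest is overhead).
payload_fraction = LINK_DATA_MBYTE_S / raw_mbyte_s

# Total across all links and both directions; matches the quoted 80 Mbytes/s.
total_bidirectional_mbyte_s = LINKS * LINK_DATA_MBYTE_S * DIRECTIONS

print(f"raw per-link rate:   {raw_mbyte_s} Mbytes/s")
print(f"payload fraction:    {payload_fraction}")
print(f"total bidirectional: {total_bidirectional_mbyte_s} Mbytes/s")
```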
Supporting the CPU

The T9000 processor by itself is a formidable device, requiring 2 million transistors. But to apply the part to the most demanding communications and computing applications, Inmos realized the need for supporting chips. The Transputer alone, with its four links, would quickly starve for data in many multiprocessing topologies. And the T9000 by itself provides no friendly way to connect to any of the buses or devices used by the rest of the world.

So Inmos created a line of communications peripherals called the C1XX family. These parts provide off-chip support for multiprocessing, and allow any size of T9000 system to be built. They also permit connections between first-generation and second-generation Transputers, and provide an interface to standard bus systems. Prelaunch details of the C1XX family were reported last year (see Aug. 13, 1990, page 1).

The C104 is a complete packet routing switch that acts like a single-chip PBX. This connects 32 links to each other through a nonblocking crossbar switch with submicrosecond latency. In this way, it emulates a direct connection between each of the devices in a T9000 network, permitting any multiprocessing topology the application wants. If required, multiple C104s can be connected together to make larger networks. Any number of T9000 Transputers can be linked in this manner.

In devising its technique for routing messages across a network, Inmos has borrowed from packet-switched data communications, exploiting such concepts as virtual channels, header addressing and wormhole routing. The C104 chip was developed as part of a project called Puma, within the Common Market's Esprit research program, and it is expected to find applications in non-Transputer system communications as well.

The second C100-series chip addresses the fact that it would be handy to use older T800 Transputer parts--many of which were specialized for particular I/O tasks and all of which were relatively economical--with the new T9000.
The T9000 has a new communications protocol, but mixed Transputer networks can be built using the C100 system protocol converter. This allows the optimum combination of Transputers to be used, thereby meeting the needs of processing power, communication bandwidth and system cost.

The third communications chip is the C101 link adaptor, which provides a parallel interface between a T9000 link and external systems, such as buses, peripheral devices and other microprocessors.

The software side

In the software support area, Inmos made it clear that it does not intend to repeat the errors of the past. In the original Transputer introduction, the only programming language supported was Occam, a parallel programming language unique to the Transputer. But in recent years, Inmos has gathered a large suite of more conventional tools around its architecture, and this effort will directly benefit the T9000.

Since the T9000 has instruction set compatibility with the first-generation Transputer family, use can be made of existing development and application software. In addition, Inmos is supporting the T9000 with a selection of specially developed compilers for industry-standard languages and through several tool-set enhancements. Third-party software includes compilers for C++ (Glockenspiel) and Ada (Alsys), and real-time kernels and operating systems for distributed Unix (Chorus), VRTX32/T (Ready Systems) and C-Executive (JMI Software Consultants).

This week, JMI (Springhouse, Pa.) will announce its port of the C-Executive real-time kernel to the T9000. C-Executive is a preemptive, real-time kernel that is modeled after Unix and is specifically designed for C programming. The kernel is coded primarily in C, making it portable. It currently runs on 20 microprocessors, including most of the major RISC chips, as well as the PC. Many Unix programmers like it because it provides an easy transition from Unix to embedded systems.
The Transputer version of C-Executive comes with an optional file system and a system debugger.

Altogether, the combination of enormous processing power and familiar tools could be sufficiently attractive to overcome old prejudices about the Transputer. SGS-Thomson and Inmos certainly hope so. The companies see major opportunities for the T9000 in many areas of the 32-bit microprocessor market, where worldwide deliveries are expected to grow from 20 million units this year to 67 million units by 1995. Estimates suggest that, by then, embedded applications will account for 61 percent of the market, compared with 44 percent today. Office automation applications will represent 47 percent of the total, with communications, multimedia and military making up most of the balance in embedded uses.

The T9000 could also give a new dimension to application accelerators for PCs and workstations. As an example of the possibilities, Inmos marketing director Paul Strzelecki described an accelerator board with 16 interconnected T9000s, which would deliver 400 Mflops and 3,200 Mips.

Simon Loe, U.K. correspondent for Electronic World News, Ron Wilson, Nicolas Mokhoff and Ray Weiss contributed to this story.