rick@cs.arizona.edu (Rick Schlichting) (11/06/90)
[Dr. David Kahaner is a numerical analyst visiting Japan for two years
under the auspices of the Office of Naval Research-Far East (ONRFE).
The following is the professional opinion of David Kahaner and in no
way has the blessing of the US Government or any agency of it. All
information is dated and of limited life time. This disclaimer should
be noted on ANY attribution.]
[Copies of previous reports written by Kahaner can be obtained from
host cs.arizona.edu using anonymous FTP.]
To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
      H.T. Kung CMU [ht.kung@cs.cmu.edu]
Re: Aspects of Parallel Computing Research in Japan---NEC & Fujitsu.
Date: 6 Nov 1990
ABSTRACT. Some aspects of parallel computing research in Japan are
analyzed, based on the authors' visits to a number of Japanese
universities and industrial laboratories in October 1990. This portion
of the report deals with supercomputing and parallel computing at NEC
and Fujitsu.
PART 2.
The following outline describes the topics that are discussed in the
various parts of this report.
PART 1 OUTLINE------------------------------------------------------------
INTRODUCTION
SUMMARY
RECOMMENDATIONS
PART 2 (this part) OUTLINE--------------------------------------------------
FUJITSU OVERVIEW
Company profile and computer R&D activities
VP2000 series supercomputer organization and performance
PARALLEL PROCESSING ACTIVITIES
SP (Logic Simulation Engine)
AP1000 (Cellular Array Processor)
RP (Routing Processor)
ATM (Asynchronous Transfer Mode) Switch
MISCELLANEOUS FUJITSU ACTIVITIES
Neurocomputing
HEMT
NEC
SX-3 series supercomputer organization and performance
Benchmark data for SX-3, VP2000, and Cray.
Comments
MISCELLANEOUS NEC PARALLEL PROCESSING ACTIVITIES
PART 3 OUTLINE------------------------------------------------------------
HITACHI CENTRAL RESEARCH LABORATORY
HDTV
PARALLEL AND VECTOR PROCESSING
Hyper crossbar parallel processor, H2P
Parallel Inference Machine, PIM/C
Josephson-Junctions
Molecular Dynamics
JAPAN ELECTRONICS SHOW, 1990
HDTV
Flat Panel Displays
MATSUSHITA ELECTRIC
Company profile and computer R&D activities
ADENA Parallel Processor
MISCELLANEOUS ACTIVITIES
HDTV
Comments about Japanese industry
PART 4 OUTLINE-----------------------------------------------------------
KYUSHU UNIVERSITY
Profile of Information Science Department
Reconfigurable Parallel Processor
Superscalar Processor
FIFO Vector Processor
Comments
ELECTROTECHNICAL LABORATORY
Sigma-1 Dataflow Computer and EM-4
Dataflow Comments
CODA Multiprocessor
NEW INFORMATION PROCESSING TECHNOLOGY
Summary
Comments
UNIVERSITY OF TSUKUBA
PAX
SANYO ELECTRIC
Company profile and computer R&D activities
HDTV
END OF OUTLINE----------------------------------------------------------
FUJITSU OVERVIEW.
Currently about a $16 billion US corporation (based on 158 Yen/$), with
sales and income growing about 10%/year. As with most Japanese
companies, Fujitsu includes many subsidiaries (Fujitsu Laboratories,
Fujitsu Business Systems, Fujitsu America, etc.) and affiliates, and
has about 115,000 employees, about 50,000 in Fujitsu proper, the
remainder in associated companies. R&D expenses are about 12% of sales
and have been increasing more rapidly than sales growth. Corporate
sales are divided as follows.
Computers 66%
Communications 16
Electronic devices 14
Other 4
The most important factor in sales growth was the rapid growth in
overseas (outside Japan) sales, now accounting for about one fourth of
the total.
The company states that major strategic objectives are to strengthen
activities in information management and to further globalize the
company. Recently they purchased 80% of British-based ICL
(International Computers Ltd). Global research and development,
including software development, is mentioned as a specific goal. The
company develops and markets a wide range of computers and related
peripherals such as disk subsystems, ranging from a 32-bit workstation
with built-in CD-ROM and secretary-friendly video and sound, the
FM-Towns (apparently available only in Japan), to a large scale
supercomputer, the VP2000 series, whose deliveries began in spring
1990. A vast range of semiconductor devices, memories, etc., and other
new technologies are sold outside the company and also used in Fujitsu
specific products. For example, Sun SPARC chips were originally
purchased directly from Fujitsu. The company is also very active in
important areas of switching and telecommunication technologies
related to HDTV, digital switching systems, etc. Fujitsu is also
researching high compression rate encoding for visual telephones and
TV conferencing, as well as encoding methods for HDTV and variable
rate encoding methods for future packet communications.
The main research arm of Fujitsu is Fujitsu Laboratories, a subsidiary
corporation that operates two laboratories, one in Kawasaki and the
other in Atsugi, both in suburban Tokyo. Total employment is about
1500. The Atsugi lab, established in 1983, is responsible for research
in areas of electron devices, electronic systems, and advanced
materials. The Kawasaki lab, established in the mid 1960s, is on the
grounds of some other Fujitsu facilities, so that the total working
population there is over 12,000. The Kawasaki lab concentrates on
information processing, communication, space, and personal systems.
The overall educational background of the laboratories is interesting.
Electronics 48%
Physics 19
Computer Science 10
Chemistry 10
Mechanical Engineering 5
All others 8
This is certainly one reason for the wealth of activities in hardware
relative to software. Half of the staff have Master's degrees; only
10% hold doctorates.
As mentioned above, Fujitsu is working hard to be a global
corporation. That means both R&D and manufacturing outside of Japan.
For example, Fujitsu signed a five year joint research agreement in
October 1989 with the Australian National University in Canberra.
Subjects include advanced computers, both large scale supercomputers
and more exotic parallel computers, and computer vision using the
visual mechanism of insects. Another global research project is with
the German software company Aris, to develop software for automatic
translation of Japanese technical materials and documents into German.
When complete, the system will contain a dictionary, syntax for
generating German, and appropriate development tools for both the
dictionary and the syntax. Various natural language processing and
voice recognition systems are also under study, as is a real-time
fingerprint sensor system using holography, and an on-line handwritten
input system claimed to be able to correctly recognize Kanji, Katakana
and Hiragana Japanese characters. Unfortunately we had no opportunity
to see any of these last projects.
Fujitsu computers are heavily used in the mainframe world. The
company's efforts in large scale supercomputers are interesting. More
than 100 orders have been received for computers in the VP2000 series.
The most powerful model, the VP2600, has a maximum performance of
about 5 gigaflops. According to Fujitsu, at least one VP2000 has been
installed at Kodak headquarters in Rochester, NY. What follows is a
brief summary of Fujitsu VP2000 series supercomputers. Fujitsu offers
four models in this series, as follows.
VP2100 /10, /20 (peak performance 0.5 GFLOPS)
VP2200 /10, /20 (peak performance 1.0 GFLOPS), /40 (peak 2.0 GFLOPS)
VP2400 /10, /20 (peak performance 2.0 GFLOPS), /40 (peak 5.0 GFLOPS)
VP2600 /10, /20 (peak performance 5.0 GFLOPS)
Models designated /10 have one scalar and one vector arithmetic unit.
Models designated /20 have two scalar and one vector arithmetic units.
Models designated /40 have four scalar and two vector arithmetic
units. The /10 and /20 systems are uniprocessors; the /40 is a
multiprocessor. The nomenclature is mildly confusing, as the
designation /x0 corresponds to the number of scalar rather than vector
units, even though the latter determine peak performance.
Fujitsu is deeply interested in multiprocessing; one indication has
been their MITI-sponsored research jointly with NEC and Hitachi,
called informally the HPP project, involving four VP2600s each
operating as a uniprocessor attached to a very large shared buffer
memory. Fujitsu claims that such a large multiprocessor was developed
mainly to demonstrate their success with room temperature HEMT devices
(see below) as the communications drivers between the computers and
memory. Nevertheless, using this, an NEC researcher was able to solve
a very large system of 32K linear equations in less than 11 hours. For
more details see Kahaner's report of 21 June 1990, "japgovt". Fujitsu
is probably experimenting with a /40 multiprocessor for the VP2600,
but has not released any public information about this. Without a /40
for the VP2600, Fujitsu's VP2000 series peak performance (however
unrelated to actual performance) will fall short of current
competition from NEC as well as new machines from Cray, and perhaps
others. In the meantime though, the VP2000 series comes in a variety
of colors, including Elegance Red, Future White, and Florence Green.
Peak performance of the /10 and /20 models in any line is the same, as
it is determined entirely by vector processing. Peak performance can
easily be computed once the machine cycle time and the maximum
possible number of simultaneous floating point operations are known.
For example, the VP2400/40 and VP2600 each have cycle times of 3.2
nanoseconds. To achieve the advertised 5.0 GFLOPS peak implies 16
simultaneous floating point operations. For the VP2400/40 this
requires eight per vector unit, while the VP2600/20 requires all
sixteen simultaneous operations from its single vector unit. Each of
Fujitsu's vector units is described as having two arithmetic pipes,
but in reality they are more complicated. Each pipe is capable of
simultaneously performing both an addition and a multiplication. In
addition, the pipes effectively deliver twice (VP2400/40) or four
times (VP2600/20) as much data. Thus each pipe on the VP2600/20 can
produce the results of four floating point additions and four floating
point multiplications per cycle. This is similar to the "superword"
concept on the ill-fated Cyber 205. Of course, if a calculation is
dyadic, that is, does not involve both a multiplication and an
addition, then the peak performance will be reduced by 50%.
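To make the peak-rate arithmetic above concrete, the following small
Fortran fragment (our own sketch, not Fujitsu code) evaluates peak
rate = simultaneous operations per cycle divided by cycle time, using
the VP2600 figures just quoted, including the 50% dyadic penalty.
C     A small sketch (ours, not Fujitsu's) of the peak-rate arithmetic:
C     peak rate = simultaneous operations per cycle / cycle time.
      PROGRAM PEAK
      REAL CYCLE, OPS, GF
C     VP2600: 3.2 nanosecond cycle, 16 simultaneous operations
      CYCLE = 3.2E-9
      OPS = 16.0
      GF = OPS / CYCLE / 1.0E9
      WRITE (*,*) 'VP2600 peak, GFLOPS:        ', GF
C     A dyadic stream uses only the add (or multiply) half of each
C     pipe, so the usable peak is halved.
      WRITE (*,*) 'VP2600 dyadic peak, GFLOPS: ', GF / 2.0
      END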
By studying the performance of VP2000 machines on typical job streams
it has been observed that when the scalar unit is 100% in use, the
vector unit is about 50% to 75% busy. Thus the addition of a second
scalar unit can significantly increase throughput, and this was
presumably Fujitsu's reason for adding it. However, for any single
user problem it might not be possible to keep the vector unit
constantly busy. Thus the most practical environment for such a setup
would be a computing center or other multi user job shop, where
several user jobs can be run simultaneously. Kyoto University, a
typical busy university computing center, will be getting a VP2600/10
soon. We asked why only one scalar processor. Although the university
made a very strong case for two scalar processors, the Ministry of
Education decided (on budgetary, or other, grounds) to support only
the one scalar processor system. However it is an easy field upgrade
to add the second scalar unit. The choice of a VP2600/10 rather than a
VP2400/40 was a matter of policy; Kyoto has always tried to purchase
the fastest machine available. It is also possible that they would
like to upgrade eventually to a multiprocessor 2600 when this is
available.
As is the case with most of today's vector supercomputers, data to and
from the vector arithmetic units need to pass through vector
registers. In the VP2600 these registers have a capacity of 128KB (64
elements times 256 registers times eight byte data) but can be
concatenated in various ways, for example as 2048 elements times 8
registers of eight byte data instead. Thus the organization of the
registers is very flexible. To get data between memory and the vector
registers Fujitsu provides only two load/store pipelines. This could
be a bottleneck, although the register flexibility may alleviate it to
a certain extent. Memory to register bandwidth has been criticized in
the VP2000 series, but at least one new benchmark, given below,
suggests that Fujitsu has been making efforts to deal with this. The
computation of interest is that of multiplying large matrices A=B*C,
each of which is 4096 by 4096, with real 64 bit floating point
components. The source program is written in 100% standard Fortran but
is organized to take advantage of the two pipe structure of the VP2000
architecture in a very clear way. The essential segment of the source
program consists of first zeroing the target array.
      DO 4000 J=1,4096
      DO 4000 I=1,2048
      A(I,J)=0.0
      A(I+2048,J)=0.0
 4000 CONTINUE
Then the actual multiplication is as follows.
      DO 5000 L=0,1
      DO 5000 J=1,4096
      DO 5000 K=1,4096,4
      DO 5000 II=1,2048
      I=II+2048*L
      A(I,J)=A(I,J)+B(I,K)*C(K,J)+B(I,K+1)*C(K+1,J)
     *       +B(I,K+2)*C(K+2,J)+B(I,K+3)*C(K+3,J)
 5000 CONTINUE
In this case the matrices are large enough that there is significant
memory to register to memory traffic. Nevertheless, Fujitsu's
FORT77/VP compiler is able to vectorize this effectively and generate
4.8 GFLOPS, 96% of peak performance.
One comment is worth making here. At the InfoJapan 90 meeting a
lecture was presented by Nobuo Uchida, from the Mainframe division of
Fujitsu, on the architecture of the VP2000 series computers. We found
it particularly interesting that his paper made no mention of the /40
series in the VP2000 lineup. The English product announcement about
the /40 had been distributed shortly before the meeting, and the
Japanese announcement was available weeks before that. Because the /40
is a multiprocessor, it represents a most important addition to their
product line.
The characteristics and properties of new advanced computers are of
real interest to the research community, especially those who travel
long distances to hear about them. Perhaps there was a manuscript
revision that we did not notice. Nevertheless, it was disappointing
that this new system was not included in his discussion. Perhaps it is
related to Fujitsu's silence about a VP2600 multiprocessor.
FUJITSU'S ACTIVITIES IN PARALLEL PROCESSING.
In our recent visit to Fujitsu Laboratories, we visited the following
three parallel processing projects.
(1) SP (Logic Simulation Engine). This is a special purpose 64
processor event driven parallel computer designed to test the logic
design of VLSI chips before they are built. It is claimed that it has
larger capacity than any other simulator and that simulation times are
about 30 times faster than using Fujitsu's 780 mainframe. Testing a
1M-gate chip takes about 4 hours on the SP, and this is 1000 times
faster than the 780. The SP is implemented in TTL, with gate arrays
for the ECC implementation. (Fujitsu can currently build 200K gate,
331-pin arrays.) Ten SP machines have been built, and two are in use
by Amdahl in the U.S. The others are for internal use. Fujitsu claims
that, partly due to its use of event driven simulation, SP is 100
times faster than the IBM Yorktown Simulation Engine, and feels that
the SP is a successful effort. (NEC Corp also has logic simulators,
Hal II and TDHal.) It seems that most computer companies in Japan have
developed their own special purpose parallel engines for logic
simulation for their internal use.
(2) AP1000, renamed from the older CAP (Cellular Array Processor).
This is composed of up to 1024 cells or processors. Each cell is
composed of a SPARC chip (for ease of software development), a Weitek
floating point unit and gate array router running at 25MHz, and 16MB
of memory. Cells can communicate using wormhole routing in a two
dimensional mesh with 25MB/sec channels. The standard structured
buffer pool technique is used to avoid deadlocks. The network also
supports row and column broadcasting. The router and SPARC connection
is 40 MBytes/sec. Since the connection is also shared by the CPU
cache, the actual available bandwidth is still under evaluation. In
addition, a special frame buffer can read out from each cell so that
image data can be partitioned among cells efficiently. Maximum
performance is 12.5MFLOPS/cell, and 12.8GFLOPS for a fully configured
1024 cell system. The AP1000 has good (but not spectacular)
communication and good numerical performance potential. Fujitsu
expects that it will typically be connected to a Sun-4 as a host via a
VME bus. This project has been going on for a number of years under
the old name CAP. (CAP is also the name of the Cellular Array
Processor developed by Mitsubishi Electric for satellite image
processing. As far as we know there is no relation between these
projects.) A team of about 10 people has been at work on the AP1000
for two years. The new AP1000 system is much more powerful, primarily
because of the use of SPARC chips and Weitek floating-point chips. In
contrast, the old system used Intel 80186 chips. Present plans are to
begin production this fall with installation of 7 or 8 machines in
spring of 1991. Of these, most are to be 64-cell systems and one is to
be a 512-cell or 1024-cell system. (A 1,024-cell system is scheduled
to be built in April 1991.) Currently a 16-cell system is running.
The 64-cell system, with about 800 MFLOPS peak performance, should
cost the company about $300K U.S. We were shown a straightforward
ray-tracing example, which is a perfect candidate for data
parallelism. The system currently has a home-made run-time system, and
no parallelizing compiler for either C or Fortran. We were told that
in addition to scientific computing, visualization, and CAD, one
potential application was design rule checking, but in that case it
isn't clear why floating point is necessary. The Australian National
University will get a 128-node AP1000 system and will help with
software development and evaluation. (Contact: Prof. M. McRobbie
[mam@arp.anu.edu.au]). As with the earlier CAP project, Fujitsu has a
nice color sales brochure about the AP1000, but this is still
considered an experimental machine. Probably its most important uses
will be internal to Fujitsu, similar to the SP model. We feel that the
project is probably a few years behind similar work at leading
research places in the U.S., primarily because of the differences in
software and interprocessor communications capabilities. Two contacts
for this project are given below.
Mitsui Ishii [mishi@flab.fujitsu.co.jp]
Hiroyuki Sato [hsat@flab.fujitsu.co.jp]
Fujitsu Laboratories
1015 Kamikodanaka
Nakahara-ku
Kawasaki 211, Japan
Tel: (044) 777-1111, -2327
(3) RP (Routing Processor). This is a special-purpose SIMD machine to
implement maze routing. A performance goal is to route large (e.g.,
100K-gate) gate arrays in approximately one hour. To implement the
machine, bit-serial PEs (Processing Elements) are used. A 4K-PE system
is operational. We saw a successful demonstration of the system doing
a difficult switch box routing. Since in maze routing only PEs on the
wave front are active at a given time, the system will typically
multiplex four "logical" PEs onto each "physical" PE to ensure
efficient utilization of physical PEs (the sketch below illustrates
the wavefront idea). Approximately 5 people have been working on the
RP project for two years. They are currently building a 16K-PE RP. A
challenge of using special purpose CAD engines such as the RP is their
graceful integration with the rest of the CAD system. Also, it is not
clear how the RP can take advantage of hierarchical information
available in a design. Fujitsu researchers are looking at these
issues.
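For readers unfamiliar with maze routing, the following Fortran
fragment is a minimal, sequential sketch of the classic Lee-style
wavefront expansion that machines like the RP parallelize. It is our
own illustration, not Fujitsu code, and the grid encoding is an
assumption; an engine such as the RP would assign one logical PE per
grid cell and expand every frontier cell in a single step, and the
backtrace that actually extracts the wire path is omitted.
C     Our own sketch of Lee-style maze routing on an N x N grid, not
C     Fujitsu code.  Encoding (assumed): -1 = blocked, 0 = free,
C     K > 0 = cell first reached on wavefront K.
      SUBROUTINE WAVE(GRID, N, IS, JS, IT, JT, FOUND)
      INTEGER N, GRID(N,N), IS, JS, IT, JT, I, J, K
      LOGICAL FOUND, GREW
      GRID(IS,JS) = 1
      FOUND = .FALSE.
      DO 30 K = 1, N*N
         GREW = .FALSE.
C        Only cells on the current front K do any work; on a SIMD
C        engine like the RP these are the only busy PEs, which is
C        why logical PEs are multiplexed onto physical ones.
         DO 20 J = 1, N
            DO 10 I = 1, N
               IF (GRID(I,J) .NE. K) GOTO 10
               IF (I.GT.1) THEN
                  IF (GRID(I-1,J).EQ.0) GRID(I-1,J) = K+1
               ENDIF
               IF (I.LT.N) THEN
                  IF (GRID(I+1,J).EQ.0) GRID(I+1,J) = K+1
               ENDIF
               IF (J.GT.1) THEN
                  IF (GRID(I,J-1).EQ.0) GRID(I,J-1) = K+1
               ENDIF
               IF (J.LT.N) THEN
                  IF (GRID(I,J+1).EQ.0) GRID(I,J+1) = K+1
               ENDIF
               GREW = .TRUE.
 10         CONTINUE
 20      CONTINUE
C        Stop when the target is labeled, or the front dies out.
         IF (GRID(IT,JT) .GT. 0) THEN
            FOUND = .TRUE.
            RETURN
         ENDIF
         IF (.NOT. GREW) RETURN
 30   CONTINUE
      RETURN
      END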
ATM Switch. In addition to the three parallel processing projects
described above, we also visited a major project on the development of
an ATM (Asynchronous Transfer Mode) switch. The basic idea is that
data is divided into cells, which are 53-byte packets, and then
transmitted along the transmission path without synchronization. The
application area here is ISDN and HDTV. Such a switching system will
be able to handle multi-media communication of voice, data, video,
etc. Fujitsu has been working on this project for several years, and
claims to have prototyped the world's first ATM switch. Built out of a
special IC using a BI-CMOS RAM and logic gate array, the current
system is a 16 by 16 switch, of three stages with two 8 by 8 crossbar
switches per stage. Each port is 78 MHz and 16 bits wide, allowing for
1.2 Gbits/second per port. The 16 by 16 switch, housed in one cabinet,
can therefore handle 128 channels of 150 Mbits/second each. There is a
128-cell buffer at each output port of every crossbar. Switch routing
is based on the destination tag, corresponding to the virtual circuit
identifier (VCI) number. Cell sequencing is maintained, but cells may
be lost if there is congestion. Presently, two 16 by 16 prototypes
have been built and are being used to evaluate cell loss
characteristics. Eventually a SONET interface will be installed, but
this is not supported yet. Instead a proprietary interface is being
used during the testing phase of the project.
In parallel processing the company's research effort emphasizes
special-purpose machines such as SP and RP more than we would expect
from a U.S. company. The best research projects, such as the ATM
switch, SP, and RP, are completely driven by development needs. The
strongest efforts seem to be related to switching and CAD related
issues. Projects with more of a basic research flavor, such as the
AP1000, do not seem to be as advanced compared to work in the U.S.
MISCELLANEOUS FUJITSU COMPUTING ACTIVITIES.
Neurocomputing. The usual metric here is the number of changes to the
weight matrix that are possible each second. The earliest research in
neurocomputing used traditional computers to simulate the architecture
of a neural network. The next step is to implement some aspects of the
network in hardware. By using special purpose digital signal processor
chips Fujitsu has demonstrated more than 500 million connection
changes per second. A longer range goal is to use biological elements
as part of the architecture, but we have seen no substantial results
yet. Associated with neuro computers are various forms of inference
engines that are often implemented with robot applications in mind.
Fujitsu has also been working in these areas with particular emphasis
on robot vision. This again relies on special purpose hardware. They
have also used fuzzy logic to study driverless vehicles and obstacle
avoidance. They have developed the Idaten color image processing
system, which can be used to distinguish objects moving at different
speeds, and so, for example, to do real time scanning of a runner,
determine speed and stride, and then estimate the time to the finish
line. This particular research has applications in many other areas
and should be followed. Another neural net research project has been
joint with Nikko Securities, to investigate how well neural nets can
predict the buy/sell times for stock transactions and to rate
convertible bonds by looking at various financial indices.
Takashi Kimoto
Computer Based Systems Lab
Fujitsu Laboratories, Kawasaki
1015 Kamikodanaka, Nakahara-ku,
Kawasaki 211, Japan
HEMT and other electronic devices. Fujitsu's work here includes an
8-bit Josephson digital signal processor and room temperature HEMTs
(High Electron Mobility Transistors). Fujitsu developed the HEMT in
1980. At liquid nitrogen temperatures, -196C, electrons move about 200
times as fast as they do in silicon. As part of the government
sponsored "high speed computing" project Fujitsu has now developed a
4K-bit static RAM that operates at room temperature with a 500
picosecond clock (the fastest memory operations yet reported), and a
4.1K-gate gate array. Further developments have resulted in a chip
with 3335 HEMTs with 490ps data propagation time. Fujitsu claims that
they will use this in a new version of a supercomputer they will soon
build. Presently, several prototype system components at the LSI level
have been built. These are a 1.1K-gate bus driver, a 3.3K-gate random
number generator (1.6GHz), and an 8-bit digital-to-analog converter
(1.2GHz). This technology, which is almost completely proprietary to
Fujitsu, may be significantly useful in future computing systems.
However, since the HPP project is over, it will not be easy for
Fujitsu to build these kinds of experimental supercomputers unless
they can be supported by some new government programs.
Our overall host for this visit was
Mr. Shigeru Sato
Board Director
Fujitsu Laboratories
1015 Kamikodanaka
Nakahara-ku
Kawasaki 211, Japan
Tel: (044) 777-1111
Mr. Sato spent many years in one of Fujitsu's development "works"
before moving to the laboratory. We were impressed with his basic
grasp of technical issues and understanding of the role that research
plays in the development cycle. We asked him if the efforts of other
Japanese companies (such as NEC) to establish research laboratories
outside of Japan had any parallel at Fujitsu. He explained that
Fujitsu had several active research collaborations, including at the
Australian National University, mentioned above, and it was also
looking into the possibility of having closer contacts with some U.S.
universities such as Carnegie Mellon, in Pittsburgh. Although he was
remarkably frank with us, we didn't have time to discuss strategic
issues with Sato. We did ask about the success of technology transfer,
and he suggested that one reason for its success is that researchers
define the research project with development groups before the project
actually begins.
Two days after this initial visit, Kung with T. W. Kang (General
Manager, Systems Group of Intel Japan) went back to visit the Fujitsu
Laboratories again for a meeting with their researchers. The purpose
of the meeting was to discuss application areas for iWarp-like
distributed memory parallel machines. We identified several potential
areas and had some lively discussions. It was generally felt that some
CAD areas and neural net learning can make the best use of parallel
machines. In the CAD area, we predicted that the expected speed up
ratio due to parallel processing will be 100,000 for logic simulation,
1,000 for test-pattern generation and for placement and routing, and
100 for design rule check and circuit simulation. The fruitful
discussion meeting was organized by:
Fumiyasu Hirose, Senior Researcher
Artificial Intelligence Laboratory
Fujitsu Laboratories LTD.
1015, Kamikodanaka
Nakahara-Ku
Kawasaki 211
Tel: (044) 754-2663 FAX: (044) 754-2580
Email: hirose@yugao.stars.flab.fujitsu.co.jp
NEC.
Kahaner visited this factory in March 90 and reported on the SX-3 at
that time. At that time the only running system had one processor.
Now, several one processor machines are being tested prior to shipment
and a two processor system has been set up and is being debugged.
Chief designer Watanabe stated that a one processor system, depending
upon peripheral options, would cost in the neighborhood of $10 million
U.S. He claimed that the 4 processor system will be up in a few
months, and we have heard estimates that it will cost roughly $25
million. Peak performance of a uniprocessor system is 5.5 GFLOPS,
based on a cycle time of 2.9 nanoseconds and 16 simultaneous
operations (16/2.9 = 5.5). The vector unit in such a system consists
of one, two, or four sets of vector pipelines. Each vector pipeline
set consists of two add/shift and two multiply/logical functional
pipelines. Each of the functional pipelines can be operated
simultaneously; thus the arithmetic processor in a uniprocessor system
with four vector pipeline sets can execute up to 16 floating point
operations per machine cycle. To get near peak performance all 16
pipes must be kept busy.
Data are fed to and from the arithmetic pipes through vector
registers, with a maximum capacity of 144KB. It is unlikely that an
SX-3 system would be purchased without all four pipe sets in each
processor. The four processor system is thus capable of 22 GFLOPS
peak, although this assumes that all the data can be kept in the
vector registers. To the extent that data must be brought from main
memory to the registers, performance may degrade. The bandwidth
between memory and the registers depends on the memory hardware
technology, and on how the data is arranged in the memory banks, but
serious applications must keep data in registers to get good
performance. Further, 22 GFLOPS requires 64 simultaneous operations,
and this will mean that different operations have to occur
simultaneously. Also, unless the user program can be divided up into
simultaneous, independent tasks that use the same data in the vector
registers, arrays will have to be quite long to absorb the startup
penalty of being parceled out to several processors. The most
effective environment for such multiprocessors is a busy multiuser
computer center, similar to that for other large multiprocessors. Most
computer centers will charge a penalty for single users who want to
grab all four processors. Yoshihara also discussed some aspects of
this in benchmark calculations earlier this year; see Kahaner's
distribution of 1 May 1990, "yosh".
At least three or four uniprocessor systems have been sold, in Europe.
We were not told about sales of two or four processor systems. Users
can write Fortran without any special directives. NEC provides an
automatic parallelizing and vectorizing compiler option. We had no
opportunity to test this. Watanabe showed us results of running 100 by
100 LINPACK (all Fortran) giving performance on the SX-3 Model 13
(uniprocessor) and several other supercomputers as follows. He also
showed some corresponding figures for a 1000 by 1000 linear system and
for 1024 by 1024 matrix multiplication, given below. The last two
columns correspond to what Dongarra calls "best effort": there are no
restrictions on the method used or its implementation. Matrix
multiplication runs almost at theoretical peak speed. The large linear
system runs at slightly less than 70% of peak, while on the Cray the
same calculation runs at just above 80%. The differences are probably
associated with bandwidth from memory to the vector registers.
Nevertheless, at 3.8 GFLOPS the SX-3 is 80% faster than the Cray.
                           Ax=b        Ax=b         A=B*C
                           LINPACK     Best Effort  Matrix Mult
                           100 x 100   1000 x 1000  1024 x 1024
                           Fortran
SX-3/14                    216 MFLOPS  3.8 GFLOPS   5.1 GFLOPS
Fujitsu VP2600             147         2.9          4.8 (4096 x 4096)
Hitachi S-820/80           107
Cray Y-MP8 (8 processors)  275         2.1
Cray Y-MP1 (1 processor)    90
Cray X-MP4                             0.8
(Note: the VP2600 model was not specified for the Ax=b figures, and
was the /10 for A=B*C, but both the 2600/10 and /20 have the same peak
performance, 5 GFLOPS.) To the best of our knowledge, the figures for
the NEC and Fujitsu machines are new. We asked Watanabe if the SX-3
four processor performance would scale up, and he only exclaimed "God
knows".
NEC's chip technology is very good. Using ECL, they have crammed
20,000 gates with 70 picosecond switching time onto one chip. We think
that this is better than in the U.S. A 1,200-pin multi-chip package
can hold 100 such chips and dissipate 3 kilowatts. Packaging, carrier,
and cooling technology is about as good as in the U.S.
NEC claims that they have taken extra care to design in error testing
capability and that about 30% of their chip area is associated with
diagnostic functions. (This is certainly different from some U.S.
manufacturers.) The memory system uses 20ns 256Kbit SRAMs. A memory
card can hold 32 MBytes; thus a memory cabinet with 32 memory cards
has 1 GByte.
Two peripherals are worth noting. NEC makes a cartridge tape unit (IBM
compatible tapes), fully automated, with 1.2 terabyte capacity. NEC
also makes a disk array made of eight byte-interleaved disks. Used as
a single disk drive, the disk array has a 5.5 gigabyte capacity. The
burst transfer rate is 19.6 MBytes/sec, whereas the sustained transfer
rate is 15.7 MBytes/sec.
NEC has begun publication of a newsletter about the SX-3, SX World.
Interested readers can obtain a copy by writing NEC, 1st Product
Planning Department, EDP Product Planning Division, 7-1 Shiba 5-chome,
Minato-ku, Tokyo 108-01, Japan. In it their view of supercomputing is
stated explicitly: "the actual performance of a supercomputer is
determined by its scalar performance...NEC's approach to supercomputer
architecture is clear. Our first priority is to provide high-speed
single processor systems which have vector processing functions and
are driven by the fastest technologies, while giving due consideration
to ease of programming and ease of use; we also seek to provide shared
memory multiprocessor systems to further improve performance."
The SX-3 looks like an exciting machine that is on a par with the best
currently available U.S. products. There is a new U.S. supercomputer
from Cray Research nearly ready to be released, as well as perhaps
models from Cray Computer Corporation and others, but we have no
concrete information about their performance. In its four processor
version, the SX-3 might be the fastest large scale supercomputer, but
this will be entirely dependent on the application and the skill of
the compiler writers. Fujii and Tamura ("Capability of Current
Supercomputers for Computational Fluid Dynamics", Inst of Space and
Astronautical Sci, Yoshinodai 3-1-3, Sagamihara, Kanagawa, 229 Japan)
note that "Basically the speed of the computations simply depend on
when the machines were introduced into the market. Newer machines show
better performance, and companies selling older machines are going to
introduce new machines."
NEC develops software expertise by in-house training; they have a
"college" for their employees. For example, Watanabe is in charge of
courses related to machine design. They also have a long history of
vector computing experience, as NEC mainframes have had vector pipes
for many years. They do not have experience in large scale
multiprocessors as far as we know, except through the HPP project,
which was never commercialized. To develop software, NEC relies on 30
or so of its subsidiaries in various parts of Japan, so software is
often developed in a distributed manner. Watanabe told us that NEC did
not have any plans to develop a smaller general purpose
multiprocessor, as they felt that the market would not support the
volume that would be required for profitability.
Watanabe has moved from the SX-3 factory to the corporate headquarters
as a strategic product planner. The latter is one of the largest
buildings in central Tokyo; it is shaped exactly like the U.S. space
shuttle except for a huge gaping hole in its center to reduce wind
loading. It is said to be the "world's smartest building."
Watanabe is an illustration of the remark made earlier about senior
research people moving into other corporate functions.
Dr. Tadashi Watanabe
Assistant General Manager
EDP Product Planning Division
NEC Corporation
7-1, Shiba 5-chome
Minato-ku, Tokyo 108-01
Tel: (03) 798-6830 (Direct), (03) 454-1111
Fax: (03) 798-6838
As far as innovative architectures are concerned, the SX-3 does not
seem to represent a substantial leap from state-of-the-art
supercomputers. Researchers in parallel computing are not excited by
shared memory machines, which they feel cannot scale up to make the
kind of quantum increases in computing speed that they are seeking.
But as an engine for solving complicated scientific and engineering
problems, a factor of two, or even a few percent improvement,
translates into real money and new science. What is significant is how
far NEC has come in relatively few years. Now NEC has state-of-the-art
capabilities in all aspects of supercomputers, except perhaps in some
software applications. They do not concede any area and make every
effort to build everything themselves. Customers seem quite loyal in
Japan. Software compatibility with existing systems, and personal
relationships between vendor and customer, are important here, perhaps
taking the edge off price differences or delivery dates.
It is difficult to accurately judge how the SX-3 will compare with the
new U.S. supercomputers that should be delivered within the next year,
but it is clear that it should be at least competitive with them. It
would be very useful for western researchers to have an opportunity to
study, test, and use this computer. We have not had any chance to run
on the SX-3, although potential customers have had a few of their
important programs benchmarked. One system, probably a two or four
processor version, will be installed in the NEC HNSX facility in
Houston. We do not know about access to that; however, in the past the
most impressive learning has occurred when supercomputers were "on
site". Real benchmarking can only occur when a computer is used day
in, day out, and all aspects of its capabilities, problems and
reliability are uncovered. If it isn't practical to get an SX-3 into a
major U.S. laboratory, we should consider the possibility of sending
computational scientists to Japan for several months or even a year,
in order to thoroughly evaluate the machine. NEC should be interested
in these efforts too.
MISCELLANEOUS NEC PARALLEL PROCESSING ACTIVITIES.
Other than the SX-3 series supercomputer, NEC has been involved in at
least four other parallel processing activities. These are: (1) FMP
(Fingerprint Machine Computer) with 28 processors, (2) VSP-4 (Video
Signal Processor) with 128 processors, (3) HAL (Parallel Processing
Logic Simulator) with 64 processors, and (4) CENJU (Parallel
Processing Circuit Simulator) with 64 processors. FMP is a commercial
product and about 180 FMP systems have been shipped worldwide. The VSP
effort has influenced the NEC Visualink-1000, which is commercially
available. HAL has been used in designing the NEC SX series and other
general purpose computers since 1985. CENJU is an experimental machine
being used for the design of DRAMs; Kahaner reported on CENJU in a 2
July 1990 report, "spice".
-----------END OF PART 2-------------------------------------------------
To: Distribution
From: David Kahaner ONRFE [kahaner@xroads.cc.u-tokyo.ac.jp]
H.T. Kung CMU [ht.kung@cs.cmu.edu]
Re: Aspects of Parallel Computing Research in Japan---Hitachi,
Matsushita, and Japan Electronics Show 1990.
Date: 6 Nov 1990
ABSTRACT. Some aspects of parallel computing research in Japan are
analyzed, based on the authors' visits to a number of Japanese universities
and industrial laboratories in October 1990. This portion of the report
deals with parallel computing at Hitachi and Matsushita, and some
observations about the Japan Electronics Show 1990.
PART 3.
HITACHI CENTRAL RESEARCH LABORATORY.
Kahaner has already written about Hitachi generally, and about some
aspects of the CRL activities, see 21 Sept 1990 article "hitachi", so
this report focuses only on those aspects of the visit that provided new
insights.
One important reason for our visit was for Kung to inquire about the
possibility of using the CMU-Intel iWarp parallel processing system in
HDTV applications. We had a meeting with Senior Chief Researcher
Fukinuki (Telephone: (423) 23-1111, x2009), who leads the technical
part of Hitachi's HDTV program. Fukinuki is well known in the West and
he showed us a paper from David Sarnoff Research Center in which his
ideas were made the basis of a main part of their program. From the
perspective of computation the needs are enormous. Requirements are for
10**5 to 10**10 integer operations per signal sample. At 28.6 MHz this
requires at least (28.6*10**6)*(10**5)=2860 GigaOPS, and at 100 MHz at
least 10 TeraOPS. Three dimensional processing will be even more
demanding. Because of this high processing rate ASICs will be needed,
in Fukinuki's opinion. Programmable processors such as iWarp or DSP
(digital signal processor) will be too slow and too expensive. Also,
division is not needed, and all multiplications are by fixed constants
(associated with filter coefficients) so special ROM for table lookup
can be used. Hitachi is planning to build a special pipeline of ASIC
blocks with 28.6 MBytes/sec or 100 MBytes/sec bandwidth between
connecting blocks. On the other hand, Fukinuki told us that he felt a
fast general purpose parallel processor such as iWarp might be useful in
those areas where real-time processing was not needed or processing rate
needn't be that high, for example document preparation (e.g., image
processing), robotics, or video phone (only requiring 64Kbit/second).
Overall, this was a sobering meeting with a Japanese scientist who had
clearly mastered his subject.
HITACHI PARALLEL PROCESSING ACTIVITIES.
We had very brief opportunities to visit three parallel processing
projects. Below, we sketch the main ideas we were able to cull from our
short visits.
(1) Hyper crossbar parallel processor, H2P.
This is a MIMD architecture with an unconventional interconnection
network. The processors are first of all thought of as lying on a
hypercube. The new ingredient is that all the processors on any plane
parallel to a coordinate axis, i.e., on a cube face, are connected by a
crossbar network. Thus only 2 "hops" are needed at most to connect any
two processors. Hitachi is clearly trying to exploit their outstanding
hardware capability in the design of the required crossbars and network
routers. At the moment this is pure research, a paper computer. They
have studied hyper crossbar structures that have the minimum number of
switches for given numbers of processors. Hitachi is planning to build
the crossbar and router chips within a year. H2P parallel systems of
more than 1024 processors are envisioned.
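To make the two-hop claim concrete, here is a small Fortran function
of our own (not Hitachi's design), modeling the network as a 3-D grid
of processors labeled (X,Y,Z) with one crossbar spanning every
axis-parallel plane: two processors that share a coordinate lie on a
common crossbar, and any other pair can route through an intermediate
node that shares a plane with each endpoint.
C     Our own model of the 2-hop property, not Hitachi's design: a
C     3-D grid with one crossbar per axis-parallel plane.  Nodes
C     sharing any coordinate sit on a common crossbar (1 hop); any
C     other pair can route via the node (X1,Y2,Z2) (2 hops).
      INTEGER FUNCTION HOPS(X1, Y1, Z1, X2, Y2, Z2)
      INTEGER X1, Y1, Z1, X2, Y2, Z2
      IF (X1.EQ.X2 .AND. Y1.EQ.Y2 .AND. Z1.EQ.Z2) THEN
         HOPS = 0
      ELSE IF (X1.EQ.X2 .OR. Y1.EQ.Y2 .OR. Z1.EQ.Z2) THEN
         HOPS = 1
      ELSE
         HOPS = 2
      ENDIF
      RETURN
      END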
(2) Parallel Inference Machine, PIM/C.
The "C" denotes their working language. This is a fairly conventional
architecture, except for hardware support for typing. It is designed in
the form of eight processors in a cluster, with one cluster fitting in a
standard rack. The processors are on a bus using conventional snoopy
caching. Currently there are two clusters built, and plans are that 32
clusters will be complete within a year. An interesting aspect of
the PIM/C architecture is that it has hardware support for load
balancing. Another interesting point is that it uses a very impressive
Hitachi mainframe pin array package, with 50 mil spacing. The contact is
Dr. Mamoru Sugie
Central Research Lab.
Hitachi, Ltd.
Higashi-Koigakubo, Kokubunji,
Tokyo 185, Japan
Tel: +81-423-23-1111 x3810
Email: sugie%crl.hitachi.co.jp
(3) Josephson-Junctions.
This work has been going on for about ten years, as part of the MITI HPP
project, which ended this year. Activities involved not only Hitachi,
but also Fujitsu and NEC. Currently about 10 people are working on J-J.
With the MITI project over Hitachi will certainly scale back their
efforts, but will not terminate them. They have prototyped a 1 billion
instructions per second superconducting microprocessor (with about 2K
gates, in a 7mm by 7mm chip), and a 1KB RAM chip (also 7mm by 7mm).
Currently switching time is about 10ps. Also they have discovered a new
transistor design, which they believe previous efforts, such as
IBM's abandoned J-J effort, did not have. The researchers told us that
practical application of this technology may be 10 to 20 years away.
(4) Molecular Dynamics
This is really an application of high speed computing needs. Dr. Shigeo
Ihara in Hitachi's 7th Dept (ULSI Research Center) showed us his work on
modeling the surface of Si(100). His research is rather different from
conventional molecular dynamics models as it emphasizes the quantum
mechanical model for computing the forces. Thus computing the forces is
very expensive. His integration scheme is conventional, even somewhat
old-fashioned, but the force calculation is the key time sink here. He
claims that his model requires about 100 hours on a Hitachi S-810,
which for this problem is about three times as fast as on a Cray 1. Even
then he is only able to move around about 100 particles. His results
indicate the existence of an interstitial dimer, not predicted before,
recessed from the surface, rather than vacancies as has traditionally
been believed. However, he also acknowledged that the integration step size
may still be too large and the results might be contaminated with
numerical error. When we asked if it was possible to run this on a
faster computer he explained that Hitachi would soon announce a faster
supercomputer.
JAPAN ELECTRONICS SHOW, 1990.
We spent one free afternoon here and so were only able to get a general
impression. This is surely one of the world's largest such shows, with
tens of thousands of square meters of exhibits in nine large buildings.
Two hangar sized buildings were associated with consumer electronics,
and all the rest were displays of very specialized parts and component
technology. Not surprisingly, the consumer electronics buildings were
mildly disappointing after seeing vast sections of Tokyo loaded with
electronics stores. Also the exhibitors were not interested in
displaying all their wares. This particular show was very clearly
focused on HDTV, High Definition TeleVision, or HVTV (High Vision TV) as
it is known here, with several hundred systems set up for display. So
many different companies were exhibiting that we wondered why there was
so much emphasis when there are almost no commercial systems,
videotapes, or television broadcasts available, and no likelihood of
any for at least a few years. Serious broadcasting doesn't seem to be
any closer than
1995, and some of the people we spoke to suggested that year 2000 was
more likely for widespread household use. Further the price of current
HDTV systems (to the extent that it can be estimated) is very high, and
unless it can be knocked down by a factor of ten there will be little
consumer interest. But, as all who have seen demonstrations will attest,
the systems are visually impressive and might even now be of interest in
some specialized commercial situations where exceptional graphics will be
important. We can also imagine that bars will buy HDTV for sports fans.
MITI created a Hi-Vision Promotion Center (HVC) in 1988. This is a
corporation whose members include all major HDTV manufacturers in Japan.
According to their literature, "The Center promotes wider use of
Hi-Vision technology in industry and by government organizations through
identification, research, and analysis of problems existing in such
public services as museums, medicine and education, and industrial areas
(including theaters and amusements)." Currently, the government
television network NHK is providing some HVTV time each week on an
experimental basis, and some advertisements for commercial systems are
beginning to appear in the papers.
For HDTV to be of practical interest for transmission of live
programs, or for storing HDTV pictures on either laser disks or
CD-ROMs, substantial research needs to be done. The main problem is
that there is simply too much data. A typical HDTV picture contains
about 1000 x 600 pixels in each of three colors, and frames come 30
times each second. Even with the best data compression algorithms now
available a huge amount of data needs to be processed.
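To put the quoted figures in perspective, a small Fortran fragment
(our own arithmetic, assuming one byte per color sample) shows the
uncompressed data rate:
C     Raw-rate arithmetic for the figures quoted above; one byte per
C     color sample is our assumption.
      PROGRAM RATE
      REAL PIXELS, BPS
      PIXELS = 1000.0 * 600.0
C     three colors, 30 frames per second
      BPS = PIXELS * 3.0 * 30.0
      WRITE (*,*) 'raw rate, MBytes/sec: ', BPS / 1.0E6
      END
At over 50 MBytes/second before any compression, the need for both
aggressive compression and fast hardware is clear.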
This looks like a natural application for specialized parallel
processing hardware and software, but to be
practical the hardware must be inexpensive enough to be placed in every
set. As we learned during our visits to Sanyo, Fujitsu, and Hitachi,
there is active research going on in both real time data compression,
and development of such specialized hardware. There is obviously a
connection between success in this technology and success in other
information processing activities. The Japanese companies are all more
or less at the same point because they have been meeting in committees
to establish standards, and this naturally leads to some sharing of
information. Each company seems to have some unique
characteristics, although in a global sense they all seem to be pretty
much alike. HDTV is another example of persistence in research; the
U.S. gave up years ago, although there is still active research in
Europe. The Japanese see the underlying technology here as a key one.
As a specific example of the application of these ideas, the importance
of ASIC, and the "last ten percent", we note that NEC has developed a
hardware data compression system for color photographs using cosine
transformations. The original color transmission system was developed
earlier in Israel and the U.K., but used software for the compression and
decompression cycle. NEC has now built this using special purpose
hardware. It is still much too slow for HDTV though.
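For reference, the kernel that such compression hardware accelerates
is the discrete cosine transform. The following Fortran routine is our
own direct 8-point DCT sketch, not NEC's implementation; real coders
apply the transform to 8 by 8 blocks in two dimensions and use fast
factorizations rather than this direct form.
C     Our own direct 8-point DCT sketch; production coders transform
C     8 x 8 blocks in two dimensions with fast factorizations.
      SUBROUTINE DCT8(X, Y)
      REAL X(8), Y(8), PI, S, C
      INTEGER K, N
      PI = 3.1415927
      DO 20 K = 0, 7
         S = 0.0
         DO 10 N = 0, 7
            S = S + X(N+1) * COS(PI * REAL((2*N+1)*K) / 16.0)
 10      CONTINUE
C        Orthonormalizing factor for the DC term
         C = 1.0
         IF (K .EQ. 0) C = 1.0 / SQRT(2.0)
         Y(K+1) = 0.5 * C * S
 20   CONTINUE
      RETURN
      END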
The Electronics show also gave us an opportunity to see not only many
examples of HDTV, but also packaging, keyboards, liquid crystal
projectors, and new flat panel displays. The Japanese have been
performing research in flat panel technology for decades, initially for
TV display, and well before there was even a glimmer of interest in light, laptop
computers. The flat panel screens have been adapted to TV use too, as
can be attested to by Japan Air Lines first class passengers who get a
three inch color set on a stalk attached to their armrest. As another
example, Japan Broadcasting Corp. (NHK) has developed a 33-inch plasma
color display panel with a thickness of 6 mm weighing about 6kg. The
flat displays also have obvious applications in vehicles, which merges
very nicely with the growth of automobile navigation systems.
MATSUSHITA ELECTRIC.
This company is not well known in the west, even though its product
names Panasonic, National, JVC, and Technics are. It is best known for its
outstanding manufacturing capabilities (they even do manufacturing for
IBM). Matsushita, founded in 1918, had sales last year of almost
$40Billion U.S., and employs about 200,000. Sales, income, and net
income have been growing at nearly 10 percent annually. The company's
main growth areas are in communication and industrial equipment. Audio
equipment, electronic components, semiconductors, batteries, and kitchen
equipment have also grown but not quite as fast. They have identified
six target areas for the future: information/communication, factory
automation, semiconductors, audiovisual, automotive electronics, and
housing/building products. This includes, specifically, HDTV, where they
admit a huge investment will be needed to keep pace with the rapidly
changing technology.
As with many other large Japanese companies Matsushita hopes to become
more global, and targets 1994 as the year when the ratio of
internationally produced goods to total overseas business will be 50%.
This year the first American President was appointed at Matsushita
Electric Corp of America. In a similar way they hope to localize their
R&D activities. One example is the Panasonic Advanced TV-Video Labs, in
New Jersey. Also, as with other companies Matshushita really means many
subsidiaries; in this case 117 companies in 38 countries.
Corporate sales breakdown is as follows.
Video equipment 27%
Communication and
industrial equipment 23
Audio equipment 9
Home appliances 13
Electronic components 13
Batteries & kitchen 5
Other 10
Kahaner reported on a visit to a National (Matsushita) factory, see
30 July 1990 file "flexible".
The company was one of the first to incorporate fuzzy logic into their
consumer products. Whatever one may think about the content of this
technology, the public is enthusiastic about buying products described
in this way. In addition to video cameras, Matsushita also markets fuzzy
washing machines, vacuum cleaners, refrigerators, and air conditioners.
The company owns a majority share in the Boulder, Colorado workstation
maker, Solbourne Computer, and has begun to market the workstation. On
our visit to Matsushita we asked why yet another Unix workstation, and
were told that the company feels its performance is better than
comparably priced Suns, and that it can be successful with this product
if it is priced very competitively.
Corporate R&D is divided into seven organizations and their
suborganizations. We have annotated those labs that have major computer
related research activities.
Kansai
Tokyo
Information Equipment Research Laboratory
Computer related activities include computer systems architecture,
operating systems, compilers, natural language processing, machine
translation, multimedia database systems, distributed parallel
processing knowledge based and expert systems, development tools,
image processing, communications systems such as optical, B-ISDN,
satellite, networks, data storage equipment and printing equipment.
Tokyo Information and Communications Development Center
Audio Video Research Center
Image Technology Research Laboratory
Acoustic Research Laboratory
Display Technology Research Laboratory
Magnetic Recording Research Laboratory
Materials and Devices Research Laboratory
Computer related activities include HDTV research, and basic
technology research in areas of video signal generation,
processing, recording, display, transmission, compression, as well
as display devices.
High Definition Television Development Center
Semiconductor Research Center
VLSI Technology Research Laboratory
VLSI Devices Research Laboratory
Opto-Electronics Research Laboratory
Living Systems Research Center
Living Environmental Systems Research Laboratory
Electrochemical Materials Research Laboratory
Lighting Research Laboratory
Central Research Laboratories
Computer related activities in the area of intelligent mechanisms,
human brain, natural systems, user friendly interfaces, multistage
reasoning, fuzzy logic, neural networks, multimedia and hypermedia.
Matsushita does not break out the number of employees engaged in
research, but R&D expenditures (currently about $2.5Billion U.S.) are
about 6% of sales and have been increasing at a higher rate.
Confusingly, subsidiary companies have laboratories of their own. For
example, Matsushita Electronics Corp has seven laboratories.
Our visit was to the Central Research Labs in Osaka and focused on
parallel computing and graphics applications. Frankly, we are not sure
how these research projects fit into the list of topics above, as they
seem more naturally associated with some other laboratories.
Unlike many other Japanese companies, which have prominent statues of
their founders, Matsushita Central Research Laboratory has statues of
great scientists from Japan and other countries, including Marconi,
Ohm, and Edison, in its courtyard. On the other hand, the dress code
requiring everyone to wear overalls was abolished only recently, and we
were also treated to the company marching song, played like Muzak,
during our visit. Some of the Central Research Lab buildings (such as
the Kadoma Building) are old and have an informal, cozy feeling, with
an atmosphere like many American labs. This was the site of the oldest
company lab, and several of the buildings date back to before WWII.
Our overall host for this visit was:
Mr. Teiji Nishizawa, Manager Computer Architecture
Kansai Information and Communications Research Laboratory
Matsushita Electric Industrial Company Ltd.
1006 Kadoma, Kadoma-shi
Osaka 571 Japan
Tel: (06) 908-1291, Fax: (06) 903-0370
Email: NISHIZ@SY2.ISI.MEI.CO.JP
ADENA (Alternating Direction Editing Nexus Array).
ADENA was developed by
Prof. Tatsuo Nogi
Department of Applied Mathematics and Physics
Kyoto University
Yoshida Honmatchi, Sakyo-ku
Kyoto 606 Japan
Tel: (075) 753-7531 x5871, Fax: (075) 761-2437
Email: NOGI@KUAMP.KYOTO-U.AC.JP
starting with work about ten years ago. The Matsushita group, while
extremely knowledgeable about ADENA's hardware and system software, was
less familiar with how it was to be used, and in fact we did not see
ADENA operating while we were visiting Matsushita.
Our hosts for this part of the Matsushita visit were
Dr. Hiroshi Kadota, Senior Staff Researcher
Matsushita Electric Industrial Company Ltd.
3-15 Yagumo-Nakamachi, Moriguchi
Osaka 570 Japan
Tel: (06) 909-1121, Fax: (06) 906-3851.
From our visit it was not clear exactly what Matsushita's basic
interest in the machine was: was it only to get their feet wet in the
parallel processing area, or to really develop and market a parallel
computer for solving problems? However, Kahaner has subsequently had an
opportunity to see Nogi's laboratory in Kyoto and discuss ADENA with
him in detail. Nogi claims that some Matsushita staff understand ADENA
very well, as they are involved in not only the hardware but also the
software development. Also, at least two of his former students are now
working on the project at Matsushita. From those visits and examination
of the technical papers the following summary is provided.
At least three versions of a parallel processing computer called ADENA
have been described by Nogi. The first was in 1980. Matsushita's version
appears to be similar to what Nogi calls ADENA II. Basically it is a 256
node processor array that is attached to a host workstation. The
current ADENAs are hosted by a Solbourne workstation via a VME bus.
Sixteen processors fit on one board. The interconnection network is
called a multi-layer crossbar, with maximum data transfer of
5.1Gbytes/second (each processor has about 20Mbytes/second input and
output capability). This network shares one feature of the hyper-
crossbar network described above (Hitachi) in that communication between
any two processors takes at most two hops, but in other ways is quite
different. Nogi calls it a "skew" network and we describe it in some
detail below.
Each ADENA processor is a custom RISC. ADENA is organized to support
numerical solution of partial differential equations using ADI
(Alternating Direction Implicit) iteration schemes. Peak performance is
2.5 GFLOPS (per-processor peak is 10 MFLOPS), but Nogi feels that about
1 GFLOPS is a more reasonable estimate. In fact, he has benchmarked
"real" computational fluid dynamics applications at a few hundred
MFLOPS. A special language, ADETRAN, resembling the Fortran extensions
used on other multiprocessors, has also been developed.
Solving partial differential equations in three space dimensions and
time has been one of the most important practical problems facing
computational scientists, and is a ferociously active research area.
Typically, integration is done at a discrete set of time points, with
the computation at each time requiring the solution of a three
dimensional potential equation, for which a prototype is
Uxx+Uyy+Uzz = f(x,y,z)
plus associated boundary conditions. The most common approach is to
replace the differential equations with differences resulting in a large
system of linear equations, whose solution u(i,j,k) on a mesh
approximates U at the points (ih,jh,kh), where h is the mesh spacing.
The matrix of the linear system is large and the equations are usually
solved by iteration. In 1955 Peaceman and Rachford described one method
to efficiently perform this iteration, which they called ADI, an
approach that is now known as "operator splitting". In simple ADI each
iteration
is composed of three sub-parts. First, one treats the Uyy+Uzz terms as
known and solves the discretized equations associated with Uxx="known",
then solves the Uyy="known", etc. (We are ignoring issues of
acceleration to simplify the description.) This approach is potentially
very efficient because at a fixed j and k solving for the numbers
u(1,j,k), u(2,j,k),..., u(n,j,k) is easy; the system is tridiagonal.
Furthermore, for different j and k the tridiagonal systems are
independent and can be solved in parallel as long as all the data are
available to each parallel solver. Solving Uyy="known" also requires the
solution of a set of independent tridiagonal systems, etc. Thus in a
parallel implementation each processor solves one tridiagonal system.
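To make the structure concrete, here is a minimal serial sketch in
Python of one x-direction ADI sweep (our own illustration, not ADENA
code; ADENA is programmed in ADETRAN, all names below are ours, and
boundary conditions and acceleration parameters are ignored, as in the
description above). On ADENA each (j,k) line would be assigned to its
own processor.

  import numpy as np

  def thomas(a, b, c, d):
      # Solve one tridiagonal system; a, b, c are the sub-, main, and
      # super-diagonals, d is the right-hand side.
      n = len(d)
      cp, dp = np.empty(n), np.empty(n)
      cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
      for i in range(1, n):
          m = b[i] - a[i] * cp[i - 1]
          cp[i] = c[i] / m
          dp[i] = (d[i] - a[i] * dp[i - 1]) / m
      x = np.empty(n)
      x[-1] = dp[-1]
      for i in range(n - 2, -1, -1):
          x[i] = dp[i] - cp[i] * x[i + 1]
      return x

  def adi_x_sweep(u, f, h):
      # Treat the Uyy+Uzz terms as known and solve, for every fixed
      # (j,k), the tridiagonal system in x.  The systems are mutually
      # independent, which is the source of the parallelism.
      n = u.shape[0]
      a = np.ones(n); b = np.full(n, -2.0); c = np.ones(n)
      new = u.copy()
      for j in range(1, n - 1):
          for k in range(1, n - 1):
              rhs = (h * h * f[:, j, k]
                     - (u[:, j - 1, k] - 2 * u[:, j, k] + u[:, j + 1, k])
                     - (u[:, j, k - 1] - 2 * u[:, j, k] + u[:, j, k + 1]))
              new[:, j, k] = thomas(a, b, c, rhs)
      return new

  # Tiny driver on an 8^3 grid with f = 1 and a zero initial guess.
  n, h = 8, 1.0 / 7
  u1 = adi_x_sweep(np.zeros((n, n, n)), np.ones((n, n, n)), h)

The y and z sweeps are the same computation with the roles of the
indices rotated.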
The key point in any parallel implementation is that for efficient
computation it is necessary for data computed in one processor to be
quickly available to one or more of the others; thus between-processor
data communication is a crucial aspect of parallel processing. The
crossbar network is one solution to this problem; every processor is
connected directly to every other, allowing data to be transferred
between any two processors in one unit of time, or "hop". But large
crossbars are expensive and difficult to build; the number of
connections grows as the square of the number of processors. A thrust
in much of today's parallel processing research is to design a
compromise network, one that is not too costly but still efficient. For
example, a two dimensional (torus) mesh network of k**2 processors has
only about 2*k**2 connections, but communication between two processors
can take as many as k hops. Of course, a good algorithm will not require
data from far-away processors and thus can be efficient on compromise
networks. QCDPAX and the Fujitsu AP1000 use torus networks.
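The arithmetic behind this trade-off fits in a few lines (our own
illustration; the numbers are generic, not those of any particular
machine).

  # Wiring cost vs. worst-case distance for p = k*k processors.
  def crossbar_links(p):
      return p * (p - 1) // 2   # every pair of processors wired directly

  def torus_links(k):
      return 2 * k * k          # each node owns one +x and one +y link

  def torus_max_hops(k):
      return 2 * (k // 2)       # wraparound halves the worst-case path

  for k in (4, 16, 32):
      p = k * k
      print(p, crossbar_links(p), torus_links(k), torus_max_hops(k))

For 256 processors the crossbar needs 32,640 pairwise links against the
torus's 512; that is the quadratic cost referred to above, and the
price of avoiding it is a worst-case path of about k hops instead of
one.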
In the ADI example, processor (j,k) which solves the tridiagonal system
for fixed j and k, only needs data from adjacent processors, those
associated with j-1, j+1, k-1, and k+1. But when solving the next set of
equations Uyy="known" the same processor appears to need data from a
processor on the same row, but not adjacent. In the ADENA organization,
a set of data from processor (i,j) can be sent to processor (j,k). What
this means is that when Uyy="known" is to be solved, the user can
visualize that the network of processors has "flipped" so that only
adjacent processors need to be accessed.
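One way to picture this flipping in ordinary code (our own illustration
of the idea only; the skew network itself does this in hardware):

  import numpy as np

  # Simulated solution array: u[i, j, k] ~ U(ih, jh, kh).
  n = 4
  u = np.arange(n ** 3, dtype=float).reshape(n, n, n)

  x_view = u                      # x-sweep: solve along axis 0
  y_view = u.transpose(1, 2, 0)   # "flip": now axis 0 is the y index
  z_view = u.transpose(2, 0, 1)   # "flip" again: axis 0 is the z index

  # Locally these are just index permutations; on ADENA the skew
  # network performs the corresponding re-distribution of data between
  # processors, so each sweep again sees only "adjacent" neighbors.
  assert y_view[2, 3, 1] == u[1, 2, 3]
  assert z_view[3, 1, 2] == u[1, 2, 3]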
The actual network consists of 16 planes. On each plane there are 16
buses in the row direction and 16 buses in the column direction. A
32-word FIFO queue is provided at each cross point of these buses. At
the ends of the buses, Send/Receive Controller elements are provided
which can send/receive group data to/from the addressed FIFO and
automatically synchronize the operations.
The most exciting thing about ADENA is that it is not a hypothetical
machine; it is actually up and running. At Kyoto University, Kahaner
watched the system in action. While he and Nogi were working, several
other "real" users were also accessing the machine from elsewhere on the
campus. Nogi claims that some physicists and engineers in different
departments are doing useful work, primarily CFD. In fact, when Prof.
C.T. Kelley, (North Carolina State University) visited the laboratory a
month earlier he also noted that ADENA seemed to be in use and that "the
computer appeared to be closer to a production model than a prototype."
We also noted, as did Kelley, that the current bottleneck seems to be
communication with the host via the VME bus. Nogi's users are writing
programs in ADETRAN. We looked at some of these programs and they
appeared perfectly straightforward, much more so than the description
above would suggest. Nogi claims that the language is solid and that
there is even a user's manual, unfortunately only in Japanese. He has
already written several fundamental routines, not only ADI but FFT and
some others. He also claimed that it was easy to break up problems that
need more mesh cells than a 16 by 16 grid would provide, but we haven't
looked at that issue in detail.
An interesting question about ADENA is its possible commercial
availability in the near future. So far three copies of the machine
have been made. Matsushita recently made a product announcement, but
while we were visiting the lab, we were told that it was a mistake and
had been retracted. ADENA is the result of more than 10 years of
research, and the originator has solid intuition for numerical
techniques. We were told that the 256 processor (2.5 GFLOPS) ADENA will
be sold for about $1Million U.S. It is not really possible to evaluate
such a system without spending considerable time working with it on a
day to day basis, but given its current state we feel that it would be
very appropriate for an outside researcher to spend some time at Kyoto
trying ADENA. Nogi explained that such researchers are welcome (in small
numbers) but that he is very busy.
A number of English language reports are available about ADENA. Two of
the most recent and accessible are as follows.
"Processing Element Design for a Parallel Computer", K Kaneko, M
Nakajima, Y Nakakura, J Nishikawa, I Okabayashi, H Kadota, IEEE Micro,
August 1990, pp26-38.
"ADENA Computer III", T Nogi, Mem. Fac. Eng., Kyoto U, Vol 51, No. 2, 1989,
pp135-152.
MISCELLANEOUS MATSUSHITA ACTIVITIES.
Matsushita is also hard at work on HDTV. They showed us one lab filled
with HDTV related equipment. One experiment involves storing images on
an optical disk (12" diameter) and studying how fast these can be
brought up on the display. Currently they are able to store 600 images
per disk, about 20 seconds worth of imaging. Recording and replay rates
are 18 Mbits/second, much too slow for real time applications unless
sophisticated image compression techniques are used. Video and audio
are stored on the same disk, but at this point the key problems are
still quantity of data, and transmission rates.
We also looked at some interesting parallel computers devoted to
graphics. We saw photo-realistic image generation for office and home
furniture, and hardware and software systems to support real-time,
interactive use. The Matsushita graphics group has been doing
everything, from hardware to software to applications. This is typical
of the Japanese "don't give up any part of the technology" approach.
At dinner we had an opportunity for some frank discussions about
Japanese industrial practices, such as the status of women scientists,
and the willingness of Japanese companies to hire Western researchers.
Kung and Kahaner have both noticed the lack of women in research
environments, and their almost total exclusion from more senior
positions. This is related to Japanese custom, as many men still repeat
the adage that "most women like to get married and stay home".
Nevertheless, with a population predicted to peak in absolute terms
early next century, women represent a critical resource in Japanese
society. Both government and industry recognize this and have policies
encouraging women, but we will have to wait to see if any real changes
occur. Concerning Western researchers, it is also quite clear that
Japanese industry is very happy to employ and sponsor these people, at
least on a short term basis. When we asked, though, what chances a
Westerner had, even one who was willing to make a long term commitment
to a Japanese company, of working into a manager position, we were told
"that would be very difficult". Perhaps things are better at Japanese
subsidiaries in the west.
---------------END OF PART 3--------------------------------------------
PART 4.
KYUSHU UNIVERSITY.
Kyushu University is in the city of Fukuoka, the largest city on
Kyushu, Japan's southernmost main island. Kyushu is the closest part of
Japan to mainland Asia (Korea) and was the route for Kublai Khan's
unsuccessful invasion attempt in the 13th century; his fleet was
destroyed by a storm, dubbed the heavenly wind, or kamikaze. Fukuoka is
about an hour and a quarter by air from Tokyo. Our host for this visit
was
Prof. Shinji Tomita
Department of Information Systems
Interdisciplinary Graduate School of Engineering Sciences
Kyushu University
6-1 Kasuga-Koen, Kasuga-shi, Fukuoka 816 Japan
Tel: (92) 573-9611 Ext. 411
Email: tomita@is.kyushu-u.ac.jp
Professor Tomita was previously with Kyoto University, where Kung first
met him during a 1982 visit to Japan sponsored by IBM. Tomita explained
to us that the Information Science Department is composed of seven
labs: Information Recognition, Information Transmission, Information
Organization, Computational Linguistics, Information Models,
Information Retrieval, and Device Physics. These labs are also
associated with the engineering, math, and physics departments. (By
lab, we mean a professor and his associated research assistants and
students.) Tomita's lab is Information Organization. We spent most of
our time hearing about its activities, which are described briefly
below.
(1) Reconfigurable parallel processor. The effort here is to develop a
testbed for research on parallel computer architecture, operating
systems, and parallel programming languages. The hardware system
consists of processing elements (PEs) and a crossbar network that can
be reconfigured to fit the communication patterns of different
applications. Each PE, consisting of a SPARC processor, a home-made
MMU, and Weitek floating-point chips, is a complete processor
supporting virtual memory and a cache. Each PE has a peak performance
of 10 MIPS and 1.6 MFLOPS, and has 8 MBytes of local memory. The system
is intended to support all sorts of usage models, including tightly
coupled (shared memory) computation models and loosely coupled
(distributed memory) computation models. A thrust of this effort is
therefore in the operating systems area. They are planning to build a
128 by 128 crossbar network supporting both static and dynamic routing.
The system clock is a modest 16.6 MHz. The 128 by 128 crossbar will
need 32 15"x20" boards. Currently they have built a subset of the
crossbar. Hardware construction is limited by available funds, and the
128-processor system will take three years to complete. The following
reference gives more details.
"The Kyushu University Reconfigurable Parallel Processor - Design
Philosophy and Architecture", Info. Proc. 89, Proc. of IFIP 11th World
Computer Congress, San Francisco USA (Aug 1989), G.X. Ritter (ed),
Elsevier Science Publishers B.V. (North Holland), pp. 995-1000.
(2) Superscalar processor. In this kind of machine the instruction word
is often quite long and can contain several instructions that can be
decoded and executed in parallel by multiple instruction pipelines.
Performance gains in such a system depend crucially on the run-time
method of resolving data and control dependencies and on the
capabilities of the compiler; thus there is a symbiosis between
hardware and software support, and this research project is studying
both the architecture and compiler development. The hardware supports
four simultaneous instruction issues, eager execution of predicted
program branches, and shadow registers to recover when a branch
prediction is incorrect.
(3) A vector processor based on a streaming/FIFO architecture. The goal
of this project is to do something different from conventional vector
supercomputers, which use vector registers to feed the arithmetic
pipes. The researchers here propose to use a set of FIFOs instead of
vector registers.
Since the FIFOs can be made much larger than registers, the proposed
approach has some potential for sustaining much higher throughput in
the arithmetic pipes by using chaining. However, to make chaining easy,
virtual ALU and load/store pipelines are needed, so this is a project
involving very challenging issues, with real-world implications. The
researchers promise a "blueprint" of the architecture by April 1991.
(4) Special purpose machine for high-speed ray tracing. This project
studies the parallelism available at different processing levels of a
ray tracing computation.
Kyushu is one of a few Japanese universities where research is
addressing mainstream computer systems issues. In the U.S., there are
probably no more than ten universities able to do similar kinds of
research. Professor Tomita and his two junior project members all have
systems-building experience. One, Dr. Akira Fukuda, a graduate of Kyoto
University, worked at NTT; the other worked three years on mainframes
at Fujitsu. We believe that this kind of industrial expertise is
unusual at Japanese universities. The faculty members and Ph.D.
students we talked to seemed capable. However, these projects have
ambitious goals, and their resources are limited. The entire group,
including undergraduates, is about 20 people, and funds are also very
tight. It is hard to predict whether the four systems, or even any one
of them, will be sufficiently finished in time to support the planned
research. But even if their research goals are not completely
accomplished, they will have gained valuable experience for real
systems of the future.
We also had the opportunity to meet Professor Masaaki Shimasaki, who
has recently moved to Kyushu U. from the Computer Center of Kyoto
University.
Prof. Masaaki Shimasaki
Computer Center, Kyushu University
Fukuoka 812 Japan
Tel: (092) 641-1101, ext 2507, Fax: (092) 631-3196
Email: simasaki@sun4.cc.kyushu-u.ac.jp
In the past Professor Shimasaki worked on finite element methods for
various kinds of mixed boundary value problems. More recently he has
been studying the performance analysis of vector supercomputers and the
techniques used in vectorizing and parallelizing compilers. In
particular he has applied Hockney's model to the NEC SX-2 and Fujitsu
Facom VP-400 supercomputers. (Hockney proposes that the total time t
for a vector operation of length n can be estimated by
t = (n + nhalf)/rinf, where rinf is the peak speed and nhalf is the
vector length at which half the maximum speed is obtained.) Shimasaki's
results match observed data extremely well. He is going to apply this
technique to newer systems, and we will be anxious to see the results.
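A few lines of code make the model concrete (the parameter values below
are illustrative, not Shimasaki's measurements):

  def hockney_time(n, r_inf, n_half):
      # Estimated time for a length-n vector operation.
      return (n + n_half) / r_inf

  def effective_rate(n, r_inf, n_half):
      return n / hockney_time(n, r_inf, n_half)

  r_inf, n_half = 1000.0, 200.0   # MFLOPS and elements, made-up values
  for n in (50, 200, 1000, 10000):
      print(n, round(effective_rate(n, r_inf, n_half)))

The printout shows the familiar behavior: the rate is exactly half of
peak at n = nhalf and approaches rinf only for vectors much longer than
nhalf.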
ELECTROTECHNICAL LABORATORY.
Kahaner wrote about ETL earlier (see 2 July 1990 file "etl"), so here
we summarize only our latest impressions, based on Kung's recent visit.
The main interest in this visit was the Sigma-1 dataflow computer and
its follow-on, the EM-4. To review, Sigma-1 now has an operational
128-PE system, in 32 clusters each composed of 4 processors. A single
processor can compute at 3.3 MFLOPS (32 bit arithmetic) and 5 MIPS.
Each processor requires two boards, one for the processor and one for
memory. Connections between processors and between clusters are each
100 MBytes/second. Applications developed on this machine have not been
very significant yet. They demonstrated a trapezoidal integration of
sin(x) with 30K mesh points, for which the calculation rate is 170
MFLOPS. It might be interesting to try an adaptive integration, which
could exhibit the run-time capability of a dataflow architecture; they
said that they would try this.
ETL researchers claim that Sigma-1 is the first and likely the last
pure dataflow machine. The follow-up project, EM-4, suggests that
traditional optimization techniques are being used to improve the
performance of dataflow architectures. (We saw a similar effort at
Kyushu University.) The new aspects of these dataflow machines are not
much different from those of any advanced high-performance machines. It
is very clear that distinguishing dataflow architectures is no longer
an interesting issue. However, Japanese researchers working in the area
are making every effort to emphasize that they are still working on
dataflow architectures.
It is worthwhile to repeat some of the essential issues here. Every
calculation can be thought of as being described by a set of tasks.
Some tasks can be done in parallel, others sequentially. Most tasks
need data that will be computed in another task. Tasks may be large,
such as a subroutine, or as small as an arithmetic assignment
statement. It is relatively easy to generate large tasks, but then the
amount of parallelism is limited. A task graph (or dataflow graph)
indicates which tasks need to be done first, how much time each takes,
where data goes, etc. In principle, using this graph one can determine
the absolute lower bound on the execution time for the problem. The
important problem for any parallel processor is to allocate a set of
tasks having different execution times and precedence constraints onto
a number of processors. In practice, tasks cannot be matched perfectly
to processors, and there are overheads and other delays. Further, the
execution time for large tasks depends on how their subtasks are broken
up. Thus the actual execution time will always be greater than the
lower bound. In "real" dataflow, the tasks are low level. If a dataflow
computer can organize processors to execute tasks exactly as they are
presented in the task graph, the possibility exists for a computation
to be done in almost the minimum possible time. The difficulty with
pure dataflow computers has been that the various overheads are
tremendous; these include the difficulty of controlling the sequence of
execution, memory overhead because of contention for data, and
communication overhead. There is a great deal of dataflow work going on
both in Japan and in the West, but as we have pointed out above,
current research seems to involve compromising the pure dataflow
concept to bring it back to practical realization. The EM-4 project is
one example; another is the Harray project at Waseda University, in
which large tasks are done using more conventional control flow and,
within these tasks, computations are done using dataflow. The problem
of allocating tasks to processors has been studied for many years and
is known to be a very intractable (strongly NP-hard) scheduling
problem, so various approximate algorithms are used. One of these has
been shown to be near optimal by H. Kasahara, also of Waseda
University.
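To show the flavor of such approximate algorithms, here is the classic
greedy "list scheduling" heuristic in a few lines (a sketch of the
general technique on a toy example of our own; this is not Kasahara's
algorithm, and all names are ours):

  import heapq

  def list_schedule(times, preds, n_proc):
      # Repeatedly give the longest ready task to the processor that
      # becomes free first, honoring precedence constraints.
      done_at = {}                               # task -> completion time
      free = [(0.0, p) for p in range(n_proc)]   # (time free, processor)
      heapq.heapify(free)
      remaining = dict(times)
      while remaining:
          ready = [t for t in remaining
                   if all(p in done_at for p in preds.get(t, ()))]
          task = max(ready, key=lambda t: remaining[t])
          start, proc = heapq.heappop(free)
          start = max([start] + [done_at[p] for p in preds.get(task, ())])
          done_at[task] = start + remaining.pop(task)
          heapq.heappush(free, (done_at[task], proc))
      return max(done_at.values())               # schedule length

  # Four tasks, diamond-shaped precedence, two processors.
  times = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
  preds = {"c": ("a",), "d": ("b", "c")}
  print(list_schedule(times, preds, 2))          # prints 5.0

Heuristics like this carry no optimality guarantee in general, which is
why near-optimality results such as Kasahara's are interesting.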
Kung was given a briefing on ETL's CODA multiprocessor project. The
goal of the project is to study scalable prioritized multi-stage
networks which have a predictable delay for communication. These kinds
of networks are important for sensor fusion in real-time applications
such as process control. A novel idea of "priority forwarding" is
proposed, so that the part of a packet that contains its priority
information is never blocked. This guarantees a predictable
communication delay for packets with the highest priority.
Our overall host for this visit to ETL was:
Toshio Shimada
Chief Scientist, Computer Architecture Section
Computer Science Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba, Ibaraki 305
Tel: 0298-54-5443, Fax: 0298-58-5882
Email: shimada@etl.go.jp
NEW INFORMATION PROCESSING TECHNOLOGY.
This is the follow-on to MITI's Future Information Technology Project,
which began in 1986; some parts ended this year, others end in 1992.
The New Information Processing Technology is MITI's new initiative for
the 1990s. Kahaner reported on aspects of this earlier, see 3 July 1990
file "highspd" and 26 June 1990 file "nipt". Recent additional
information was provided by Mr. T. Yuba of ETL. The best information we
have is that this new follow-on MITI project is still not officially
decided. For the past two years, specialists from Japanese government,
academic, and industrial organizations in fields such as mathematics,
physiology, psychology, and computer science have organized three
subcommittees and six working groups in order to make a comprehensive
study to define and set project goals. The working groups meet about
once a month and have produced many preliminary reports. A final report
is due soon. The new project deals with the following fundamental
issues.
(1) The capabilities of traditional (Turing) computers have increased
dramatically, but there are still many kinds of information processing
that are easy for living organisms but on which conventional computers
perform poorly.
(2) In the latter areas, the work of the "fifth generation project" has
focused on inference, language understanding, and other logical
processing.
(3) Other areas, such as pattern recognition, intuitive information
processing, and the autonomous and cooperative control of systems
having many degrees of freedom, seem less suitable for sequential
processing.
(4) Physiology, cognitive psychology, and other brain research have
produced a great deal of insight into how the brain learns and
processes information.
(5) Technologies such as optical and molecular devices are being
developed that may make very large scale parallel processing possible.
While not yet officially set, the project will probably focus on the
following two kinds of research.
(1) Basic principles of very highly parallel and highly distributed
information processing, learning, optical technology, and other new
devices.
(2) Three dimensional information, visual and auditory recognition and
understanding, and autonomous and cooperative functions as seen in
living organisms.
Thus there will be research on something related to "soft logic"
supported by massively parallel processors. The goal is to handle
ambiguous or incomplete information using a new set of information
processing methods. These include, but are not limited to, neural nets,
and also include the idea of intelligent databases. The project will
probably be of the same scale as the 5th Generation Computer Project,
and follow the same organization and setting as ICOT. The project
planners have expressed a strong interest in international cooperation.
One exciting possibility discussed by Kung is to establish a research
facility containing massively parallel hardware of at least 1 million
programmable processors.
This could be an international testbed for applications in massively
parallel processing. The contact on this subject is:
Mr. Toshitsugu Yuba
Director, Intelligent Systems Division
Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba, Ibaraki 305
Tel: (0298) 54-5412
A project to build a reliable computer with a million or more
processors is the kind of basic research thrust that a great nation
could feel very proud about embarking on. There would be difficult
problems in designing and building it. But the challenges and the
opportunities would draw the best research minds like a powerful
magnet. It is impossible to say what will really come out of this, but
every scientist should be excited about the possibilities.
UNIVERSITY OF TSUKUBA.
Kung made a short visit to the University of Tsukuba after his visit to
ETL. The purpose of this visit was to see the 14 GFLOPS, 488-processor
MIMD QCDPAX machine, designed by the University of Tsukuba and
manufactured by Anritsu Corporation. Kahaner reported on this machine
earlier, see 12 April 1990 file "pax". The machine has started to
produce interesting results in physics; one paper reporting these
results was just presented at a recent physics conference in the U.S.
According to Professor Hoshino, the next generation machine will be 100
GFLOPS and will probably be built by physicists. It is quite an
achievement to have built a machine of this scale by any standard. This
project is an interesting and successful example of collaboration
between physicists and computer scientists. Contacts are:
Professor Tsutomu Hoshino
Institute of Engineering Mechanics
University of Tsukuba
Tsukuba-shi, Ibaraki-ken
Tel: (0298) 53-5255, Fax: (0298) 53-5207
Email: hoshino@kz.tsukuba.ac.jp
Professor Yoshio Oyanagi
Institute of Information Sciences
University of Tsukuba
Tennodai 1-1-1, Tsukuba 305
Tel: +81 298-53-5518, Fax: +81 298-53-5206
Email: oranagi@is.tsukuba.ac.jp
SANYO ELECTRIC CO.
We had a brief visit to Sanyo's Osaka R&D facility to discuss the
possibility of using the CMU-Intel iWarp in HDTV applications, and we
were given a briefing on Sanyo's research activities. Our host for this
visit was
Mr. Yasuhiro Ishii, Senior Manager
Sanyo Electric Co. Ltd.
Information & Communication Systems Research Center
Optoelectronics Dept.
180 Ohmori, Anpachi-Cho, Anpachi-Gun, Gifu, Japan
Tel: (0584) 64-3996, Fax: (0584) 64-4754
Sanyo is primarily a consumer products corporation, but it has also
made significant advances in amorphous silicon and is very proud of its
research in amorphous silicon solar cells. The R&D organization works
with a budget of about $500 Million U.S., divided roughly as follows.
R&D Administrative Hq.
Tsukuba Research Center 100 people (Basic research)
Functional Materials Res. Center 200 (Fundamental res.)
Semiconductor Res. Center 200 "
ULSI Research Center 200 "
Control and Systems Res. Center 200 "
Product Engineering Laboratory 200 (Applied research)
Audio-Video Research Center 200 "
Information and Communication System Res. Center 200 "
The research staff we met were associated with the last three groups.
Most of the work is centered in Osaka, except for the basic research in
Tsukuba, where the most interesting computer applications have to do
with intelligent systems such as robots, neurocomputers, and
biocomputers, and the Information and Communication System Research
Center, which is in Nagoya.
The latter works on parallel processing for display and image
processing, AI, expert systems, natural language processing, optical
disks, digital communications, and research in reliability for
functional and electromechanical components. Our comments here are not
about the research in general but only about the specific interactions
we had. The HDTV research group we met was quite different from the
roughly comparable groups we visited elsewhere in that the scientists
(and managers) did not speak much English; we were accompanied by Mr.
T. W. Kang of Intel Japan, who provided translation into Japanese, and
this was absolutely necessary. The major interest here was how to
compress HDTV images in order to write them on a CD-ROM. This is the
same problem that was raised at Hitachi and Matsushita; much better
compression algorithms are needed. Sanyo is hoping for compression
ratios of 150 times. This is an ideal application for parallel
processing. It currently takes about eight hours to compress an image,
and of course Sanyo would like to do it in real time to prepare for
future writeable CD technology. About 1.7 TeraFLOPS of computation is
involved, and only parallel machines can deal with this in any
practical way. Special-purpose parallel hardware cannot really do the
job, because it lacks the flexibility needed to implement high-quality
compression algorithms. New programmable parallel systems such as iWarp
can potentially provide the required power and flexibility.
---------------END OF PART 4-----------------------------------------------
---------------END OF REPORT-----------------------------------------------