K538911@CZHRZU1A.BITNET (Patrik Eschle) (06/17/87)
(This letter was also sent to info-alliant, so you may get it twice.)
This April we visited Apollo Frankfurt to run benchmarks on an
Alliant FX/8, which will (hopefully!) be the new computer of the
physics institute of the University of Zuerich (you may remember
our postings some months ago about the DEC lobby fighting the
project). The Alliant FX/8 is known in the Apollo world as DSP9000.
Please send all comments to Patrik Eschle (address at the end of
this letter).
----------------------------------------------------------------
Performance Test of an Alliant FX/8
===================================
Contents: 1. Introduction
2. Diagonalisation of large matrices
3. GEANT Simulation
4. Data Analysis directly from magnetic tape
5. Computation of Fractal Dimensions
6. Operating System and Documentation
7. Conclusions
Acknowledgements
G. Broggi, M. Doser, S. Eichenberger, P. Eschle, S. Poole,
W. Reichart, U. Straumann, D. Vermeulen, S. Vogel, Physics
Institute, University of Zurich
April 24th, 1987
1. Introduction
During the evaluation process for our institute's new main
computer we received an invitation from Apollo Computer
(the representative of Alliant in Europe) to test such a
machine at the company's headquarters in Frankfurt.
Our goals were to simulate as closely as possible the
expected typical working situation and to see how far we
could get in porting 4 programs from various environments to
the Alliant, in two days' time. The selected applications
cover the whole range of computational problems of our
institute.
The following 4 sections describe briefly each of them and
the progress we made. Eventually, all the applications ran
successfully.
The last section reports some impressions obtained by
studying the operating system and the documentation in more
detail.
We were working simultaneously in 5 groups on 5 Apollo
workstations running VT100 emulators, on a system configured
as described in section 6.1 of this report. Dr. Butscher
and Dr. Fruehwein of Apollo Computer were assisting us. We
were nonetheless consulting the manuals extensively.
About half of us had already had some experience with UNIX,
while the others were used to various other operating
systems like VMS, VM/CMS, TSO etc. Comments on the UNIX
implementation are found in section 6.
2. Diagonalisation of large matrices
The program TETRA is used to calculate the polarization of a
muon sitting at the tetrahedral site in a copper lattice
immersed in a magnetic field. It was developed to
investigate the "level crossing resonance". Essentially, it
diagonalises a large matrix (512x512 Hermitian), i.e. it
computes all the eigenvalues and the corresponding
eigenvectors, in order to build the so called "W-matrix"
(512x512 real symmetric), that contains all the relevant
information on the time dependence of the muon polarization.
It achieves this by using a package of routines called
"EISPACK". The typical problems with this program are its
demands on computational power and memory. On the CRAY 1S
in Lausanne it needed about 1 cpu minute and about 900
Kwords (8 bytes each) main memory. Further, to keep the W-
matrix for subsequent investigation, another 1 MByte of
secondary storage was necessary.
To port this program to the Alliant FX/8 in Frankfurt was
easy, as far as getting it to run was concerned. After about 3
hours of work the first, non-optimized version was running
and produced correct results. Included in this time is
becoming acquainted with the operating system, the editor
and the compiler. In getting the program running the
following problems occurred:
- Since the program was developed on a CRAY with a natural
word size of 8 bytes, corrections to the declarations of
arrays and variables were necessary. The implicit declaration
mechanism in FORTRAN gave some difficulties!
- Another feature of the CRAY FORTRAN compiler is the
treatment of subroutine/function and common names. In CRAY
FORTRAN a common block and a subroutine or function may have
the same name. This is not allowed in standard FORTRAN 77,
nor in many expanded dialects, as for instance VAX FORTRAN.
- The most serious problem was the fact that the name SYSTEM
is reserved in CONCENTRIX. A common block with this name in
TETRA led to abortion
of the program with a "Bus Error " message. This would
have meant some hours of search, if the Apollo people had
not recognized the problem immediatlely.
The execution of the first version of TETRA took about 1
hour of cpu time. An analysis of the running times showed
that most of the time was used in the EISPACK routines. In a
second phase we allowed the compiler to produce vector-,
concurrent and vector-concurrent-code. The benchmarks
are presented at the end of this section.
It is important to note that these speed improvements were
achieved without changing one line of FORTRAN source but
only by switching compiler options. The results of runs
with enabled and disabled options were compared to detect
possible optimization errors.
In a next step we introduced compiler directives into the
FORTRAN source to allow the compiler to ignore apparent
obstacles to optimization that it could not recognize as
irrelevant. This of course needs an understanding of the
FORTRAN source in order to perceive the problems of the
compiler and to ascertain that ignoring the obstacle will
not change the results. As a further step we even tried to
clear away actual obstacles by changing the FORTRAN source.
This is obviously only meaningful for the most time
intensive routines. The compiler aids in this process by
explaining why it is not able to optimize a loop, and what
directives would cause it to ignore apparent problems.
This was successful mainly for the routines HTRID3 and
HTRIB3, as far as speed is concerned. A sore point is,
however, that the output files of the latter runs were not
the same as those of the former on a byte-by-byte comparison.
The results nevertheless agreed within the expected round-off
errors.
For a comparison we executed TETRA on a VAX 8650 running
VMS. After removing some obstacles of bureaucratic nature
(such as getting enough page file quota to link the
program!) it ran successfully. The size of the working set
was left at the default on this machine (this was done to
avoid creating a special benchmarking environment). TETRA
was also executed on a NAS AS/XL V60. There it ran in
about 6 min of cpu time, independent of vector or scalar
mode, presumably due to large strides that cause heavy
paging. The same phenomenon may cause the speed decrease in
vector modes compared to non-vector modes (g -> gv and
gc -> gvc) on the Alliant.
Benchmark Results (times in seconds):

Routine      g     gv     gc    gvc    opt   VAX 8650
DEFSYS       0      0      0      0      0      0
HAM          1      0      1      0      1      1
HTRID3    1008   1217    769    804    225   1343   (EISPACK)
TQL2       743    237    342     88     95    847   (EISPACK)
HTRIB3    1431   1521    526    542    334   2064   (EISPACK)
COMPW      637    415    185    184    183    985
SAVEW       10      5      6      7      6      1
Total     3830   3395   1829   1625    844   5241

[g]   : global optimisation
[v]   : vector code
[c]   : concurrent code
[opt] : hand optimisation
[VAX] : working set size: 2000 pages, program size: 18000 pages
3. GEANT Simulation
GEANT is a program package which can be used to simulate
electromagnetic showers. With it, we have implemented a
simulation of the SINDRUM II setup (SINDRUM II is an
experiment searching for rare decays at the medium-energy
particle accelerator of the Swiss Institute for Nuclear
Research, SIN). The CERNLIB and ZEBRA routines called
by GEANT were locally available. The source (roughly 50000
FORTRAN lines) was copied from tape in standard VAX format
without any problems. The compilation was successful without
modifying the code, except for tab characters (Ctrl-I) which
are not accepted by the compiler. The only problem that
occurred at run time was an incompatibility between the VAX
and the FX/8 compiler involving incomplete argument lists in
function or subroutine calls. The FX/8 compiler is not able
to handle non-matching argument lists, whereas the VAX
compiler can. The results of the test runs with various
compiler switches (g = global optimization, c = concurrency)
are shown in the table. The CPU time
consumption on the Venus (VAX 8650) at SIN was 85 ms per
event. In parallel processing the CPU time consumption
decreases linearly with the number of CE's involved.
GEANT runs with several compiler options:

run-number  remarks                             time/event
1           no optimization                       240 ms
2           optimized compilation on one CE       140 ms
3           optimized and concurrent on one CE    140 ms
4           4 jobs parallel on 4 CE's              35 ms
Conclusions and remarks:
- Fortran programs written under VMS are easily portable to
the FX/8.
- Simulation of single particle histories can be processed
economically simply by using the appropriate compiler
switches.
- The source code debugger dbx is easy to use and is
equivalent in its characteristics to the VMS debugger.
4. Data Analysis directly from magnetic tape
One of the reasons for the purchase of an Alliant for the
physics institute was to be able to analyse High Energy
Physics (HEP) data which are written on magnetic tape in a
non-standard format. The possibility of reading/writing an
arbitrary number of records from/to an arbitrary file on a
tape was tested. The tape used for this test was a 1600
bpi, non-ANSI-labeled tape consisting of 3 files of about
100 records, each record having a length of 12960 words
(25920 bytes).
Since it was not possible to use the Alliant Fortran tape
handling routines TOPEN, TREAD, TWRITE... (the input buffer
- being of type character - is limited to a record
size of 2000 characters (8000 bytes)), the tape unit was
treated like an ordinary file. An example of the code used
is:

      integer*2 area(12960)
      open(unit=17, file='/dev/xmt00m', form='unformatted',
     &     recl=25920, recordtype='fixed')
      read(17) area

or

      write(17) area
With this code, it was possible to successfully read several
consecutive records from any of the three files on tape and
copy these records to a file on disk. Similarly, it was
possible to read the records from the disk file and to write
them to a file on tape. All files successfully passed a data
integrity check. Tape positioning (file and record skipping)
was done with C-shell commands.
Conclusion/comments:
- the feasibility of handling non-standard tapes on the
Alliant was shown.
- timing tests were performed for an average computer load
(10 interactive sessions, 5 batch jobs). The (real) time
needed to read in one 12960-word record was of the order of
1 s. This is satisfactory for applications where the
computation time is large in comparison to the I/O time.
- We were not able to find any Fortran tape positioning
routines. The available C-shell tape handling commands
(rl,mt) however worked very well, so that the corresponding
system calls must be available.
5. Computation of Fractal Dimensions
The algorithm calculates the fractal dimension of an
experimental set, reconstructed in an "Embedding" space,
by using the "Fixed Mass" method. (See: R. Badii and G.
Broggi, Physik-Institut der Universitaet Zuerich,
CAP Software Report Nr. 6, May 1987, and references
therein.) The input data points are signed integers of 5
digits, obtained through an ADC from a fluid-dynamics
experiment. Most of the data processing consists of integer
arithmetic.
The code was written and optimised (safeguarding readability
and ease of use on different experimental systems) on a DEC
uVAX II.
The program was tested on a Cray 1-S, yielding a poor
performance, which can be explained by the absence on that
machine of an integer processor and by the presence of very
short inner DO-LOOPS. These, when vectorisation is enabled,
cause so much overhead that a better performance is obtained
in scalar mode (see table).
At this point it was already clear that the program did not
lend itself to a simple adaptation to a vector machine, but
that, on the contrary, it needed a complete re-design.
Nevertheless, it was decided to run it on the Alliant in
order to test the efficiency of integer operations, the
amount of overhead created by vectorisation and the
advantages and disadvantages of concurrent execution on
several CEs.
With reference to the table, the results can be summarized
in the following way:
- Vectorisation is in this case clearly disadvantageous,
introducing an overhead which is amplified by concurrent
operation of more CEs.
- The concurrent execution of parts of the code succeeds in
speeding up some of the loops, but, again due to the
overhead introduced, the global result is still slightly
worse than in scalar mode. (A limited modification of the
program actually allowed the concurrent execution of longer
parts of code, showing clearly that, when a more complete
redesign is not possible, this is an approach worth
following.)
- A purely scalar execution on a single CE yields an
execution time which is a factor three longer than that on a
VAX 8650, and the same factor shorter than on a uVAX II. It
should nevertheless be noticed that, in the configuration
tested, three quarters of the machine's computational power
were left free for the other users, while the program was
being executed in this mode.
In conclusion, one should expect programs which do not
perform well on a Cray to exhibit the same problems whilst
in vector mode on an Alliant, and vice versa. The results
obtained by means of concurrent execution are more difficult
to predict, and can be improved by reasonably small
modifications of the code. The execution time of integer
operations is not impressive, probably due to intrinsic
aspects of the architecture, but is nevertheless comparable
to that of a VAX of the 8000 series.
Performance of the program: 100 reference points,
28000 data points.
Company            Computer      Mode               Ex. Time
Digital Equipment  uVAX II       scalar                303 s
Digital Equipment  VAX 782       scalar                202 s
Digital Equipment  VAX 8650      scalar                 36 s
Cray Research      CRAY 1-S      vector                 36 s
Cray Research      CRAY 1-S      scalar                 19 s
Alliant            Domain 4 CEs  vector-concurrent     205 s
Alliant            Domain 4 CEs  vector                139 s
Alliant            Domain 4 CEs  concurrent            106 s
Alliant            Domain 4 CEs  scalar 1 CE           103 s
6. Operating System and Documentation
6.1 Configuration and behavior of the machine
- The configuration was: 4 CEs, 3 IPs, 24 MB core memory,
                         2 x 378 MB disks (ca. 30 MB
                         available free space),
                         magnetic tape, line printer,
                         5 Apollo workstations connected
                         through Ethernet,
                         1 system console terminal
- The typical load during the tests was 12 users (multiple
logins) with 19 processes. There was no noticeable
degradation of interactive performance, except for problems
with the vt100 emulator on some of the workstations.
- A heavy load of 13 processes, each filling an array of 8
MByte, that first filled up core memory and then the paging
area on the disk, caused a slowdown of the machine due to
the high paging rate, but did not crash it.
6.2 The user's point of view
- The operating system CONCENTRIX is a port of UNIX BSD 4.2.
(Most of the utilities were present in the current version;
exceptions are mentioned below.)
- The possible user interfaces (shells) are the C-shell, the
Bourne shell and the VMS-shell, which has been announced for
the next version of CONCENTRIX. (All of us used the C-shell
without problems. Command line editing is possible, even if
not as comfortable as under VMS.)
- Generally, those users new to the C-shell found it fairly
easy to learn. Especially
pipelining, input and output redirection and easy
foreground-background operation were considered very useful
features.
- Online help is possible with the info command or the UNIX
man(ual) pages. The manual pages were not implemented, and
info only had help on the emacs-editor.
- The manuals we were able to access in their final form
(some were only available as preliminary versions) made a
good impression for what concerns contents and typesetting.
Some problems remain unsolved. The binding of the manuals is
poor: besides being mechanically bad, it does not allow the
manuals to be updated. Some of them had no index, and we
discovered some inconsistencies and errors.
- For editing we used CCL-emacs. The EDT emulation was not
installed. (As installed, CCL-emacs has a very annoying way
of updating a VT100 screen when scrolling)
- Most of the "standard" BSD 4.2 utilities were available
on the tested system. Notable exceptions were:
-- apropos (We hope this will be installed
together with man)
-- man (Mentioned above)
-- quota (Status not clear, the quota command and the
corresponding system-manager commands quotaon and
quotacheck were present on the system, but it was
not possible to check if they actually worked)
- CONCENTRIX offers a source level debugger (for FORTRAN)
and a timing utility lprof, that displays the consumed time
for each line of source. (lprof crashes with a divide-by-0
error if the time consumed is less than 1 timing unit...)
- Languages: FORTRAN, C and Pascal were installed. Comments
on the FORTRAN compiler are found in previous sections.
Pascal is very poorly documented (about 50 pages) and does
not support vectorization and parallelism. (We don't
understand why. It's easy for Alliant to say that no one uses
Pascal - no wonder with this implementation!) We did not
manage to force the execution of a Pascal program using
floating-point numbers on one of the available computing
resources, the IP (cause: emulator trap EMT).
For the C language there was only a preliminary version of
the manual available, and a test program could not find the
math library. Finally a program was able to fork into
several children that ran in parallel on several CEs.
We tried to call FORTRAN subroutines from within Pascal
and C. The linker could not find the FORTRAN libraries (and
vice versa did not find the Pascal library when calling
Pascal from FORTRAN). The Pascal manual does not describe
the calling sequences for calls to FORTRAN, nor are there
any Pascal libraries acting as interfaces to FORTRAN libraries.
- A batch processing system has been announced for Version
3.0. (A batch system is not as important under UNIX as under
VMS, since UNIX allows running jobs in the background,
scheduling them to run at any time and changing priorities
interactively.)
- We did not find any documentation on error messages.
6.3 The system administrator's point of view
We were not able to try any of the system-administrator
commands on-line, nor did we boot the machine (it refused to
crash). We studied the preliminary version of the system-
administrator manual and the documentation from an
introductory course for system-administrators.
- No special tools are available for user-administration.
Insertion and deletion of users etc. is done in the standard
UNIX-way: editing the password file.
- The mon command allows a detailed monitoring of the load
of the system.
- The system can be tailored at boot time to meet the
specific needs of an installation (e.g. size of the
computational complex). The manuals describe this in detail.
7. Conclusions
We conclude that the Alliant FX/8 is very well suited to
cover both the general needs of a small physics institute
and the requirements of advanced applications, ranging from
numerical analysis to high energy physics. It gives very
convenient tools for program development in FORTRAN, while
other languages are not particularly well supported. Several
mathematical libraries and a limited but growing number of
applications are available. In comparison to more
traditional solutions, it has a much lower ratio of cost to
computational power. The interesting architecture
which combines parallel computing and vector features
introduces new concepts of data processing into the
university and scientific environments.
Acknowledgements
================
We would like to thank the staff members of Apollo Computer,
who kindly assisted us and explained some interesting
details of the implementation.
-------------
Alliant, Concentrix are Trademarks of Alliant Computer Systems
Corporation
Apollo is a Trademark of Apollo Computer Inc.
UNIX is a Trademark of Bell Laboratories
CCA EMACS is a Trademark of Computer Corporation of America
VAX, VMS, VT100 are Trademarks of Digital Equipment Corporation
VM/CMS, TSO, MVS are Trademarks of IBM
---------------------------------------------------------------
Patrik Eschle
E-Mail : K538911@CZHRZU1A.BITNET
Private : Kronwiesenstr. 82, CH-8051 Zuerich (Switzerland)
Phone : 1-40 72 39
Institute : Physikinstitut der Universitaet Zuerich
Schoenberggasse 9, CH-8001 Zuerich
Phone : 1-257 29 44
---------------------------------------------------------------