[net.unix-wizards] Celerity evaluation

jon@cit-vax (Jonathan P. Leech) (04/21/85)

    I took advantage of the Celerity demonstration mentioned last week
and brought up mined, a full-screen editor written at Caltech
consisting of roughly 20,000 lines of C.  The only problem
encountered was #including a file needed by <sys/proc.h>, which
required adding a -D flag to the compilation.  The editor seems to
operate perfectly.

    In terms of the machine's performance, the following compilation
times and image sizes may be of interest (compared to machines at
Caltech).

Machine          Compilation time           Program size
                 user + system = total      text    data    bss     total (dec)

Sun 2/4.2 BSD    2404 + 1612  = 4016        268288  38912   11640   318840
Celerity         1489 +  513  = 2002        544768  40960   27888   613616
VAX 780/4.2 BSD  1268 +  196  = 1464        221184  35840   28408   285432

    I find the claimed 2x780 performance unlikely  in  view  of  these
results (admittedly measuring only  compilation  speed).   Also,  this
machine seems to be a RISC, judging from the size of code produced and
a cursory look at the output of cc -S; alternatively, the -O switch to
cc invokes a DE-optimizer. Does anyone know for sure?

    Thanks to RIACS for the chance to evaluate the machine.

    Jon Leech
    jon@cit-vax.arpa
    __@/

dave@RIACS.ARPA (Dave Gehrt) (04/21/85)

I had a discussion with the folks at Celerity on Friday, and was
supposed to add to the motd that the system is currently using a couple
of 120 MB 5-1/4 inch winchesters for disks.  They are considering
adding an eagle to the system, but I am not sure when that may happen,
if at all.  I don't have any idea about the actual performance of those
disks, but they say the seek time is 30 ms, versus 18 ms for an
eagle.  The transfer rate would also be higher on a multibus disk.  If
that disk swap happens I'll let you know.

Also, I have a recollection that they said something about a
speeded-up version of the C compiler being in the works.  The current
version is pcc-based.

Thank you for taking the time to reply.

dave
----------

hammond@petrus.UUCP (04/23/85)

I have done a fair amount of simple benchmarks on a Celerity C1200,
Pyramid 90x, Vax 780, and Vax 785, to compare performance of the CPUs.
The machines all had optional floating point accelerators, the Pyramid
also had a data cache option.  The basic results:

For double precision floating point in C (using register double variables,
which the 4.2 BSD and Pyramid compilers appear to treat the same as plain
double variables), I can confirm that the Celerity C1200 appears to be 2
times an 11/780 w/FPA.  That makes it the fastest at floating point of the
4 machine types tested.

I can also say, at least on the trivial integer benchmarks we tested, that
the basic CPU for integer arithmetic appears to be about 3 times an 11/780,
or roughly the same as a Pyramid 90x.

Disk Performance: Although my trivial benchmarks took almost the same amount
of CPU (using their new, faster cc) as the Pyramid, they took 3 times as
long in real time.  Our Pyramid has eagles, the Celerity had the slower
120Mb disks.  I don't know what improvement an eagle would make.

Flies in the ointment: The Celerity is a Fortran machine; it has a stack
register array (I'd call it a cache, but caches in my view empty/fill
automagically and this doesn't) of 16 levels.  If your code makes procedure
calls which nest to a depth greater than 16, then the OS has to copy the
registers to main memory.  This is VERY expensive in CPU time.
Our test of Ackermann's function died after CPU times of 6.3 user, 107.5 sys
(to do all those copies of the stack registers).  It died because of a
second flaw: by default the stack can only grow to a depth of 128K (about
1024 calls deep).  You can (at compile time) tell the system to allocate
more stack space.  I have not yet received an explanation of why they
changed this behaviour from standard BSD; if there is a good reason, we
could probably live with it, since few procedures (other than Ackermann's)
nest all that deep.  However, the stack register array filling/unfilling is
a more immediate concern, since it is quite expensive in CPU resources and
it does happen.  We noted that the C compiler rolled up fair amounts of
system time (several times that of a Pyramid 90x), probably for stack growth.

Another problem we noted was that the system calls we tried measuring
(some of those common to Sys V and 4.2 BSD) were on average 20% slower
than on an 11/780, despite the CPU being (by our tests) 3 times faster.
We are still trying to find out what is going on.  My suspicion
is the loading/unloading of the stack register set for context saves.

If Celerity fixes the stack growth to be less painful, it is a
very interesting machine for number crunching.

Rich Hammond	{allegra | decvax | ucbvax} bellcore!hammond

hammond@petrus.UUCP (04/23/85)

> ...
> Disk Performance: Although my trivial benchmarks took almost the same amount
> of CPU (using their new, faster cc) as the Pyramid, they took 3 times as
> long in real time.  Our Pyramid has eagles, the Celerity had the slower
> 120Mb disks.  I don't know what improvement an eagle would make.

I meant to say that the compiles of the trivial benchmarks took almost the
same user CPU; the benchmarks themselves are CPU bound and do no I/O.
The system CPU on the Celerity was twice a Pyramid 90x (i.e.  6.5 vs 2.9)
which I suspect was stack register copy times.

The elapsed real times were more like 2+ than 3 times.  (I just found my notes).