[comp.unix.wizards] experiences with Sun 3/280 and Sun 4

hedrick@athos.rutgers.edu (Charles Hedrick) (01/02/88)
This note is a brief description of how we use a Sun
3/280 and two Sun 4/280's.  I'm posting it in the hope that people
using other kinds of systems will post similar descriptions.
Benchmarks are interesting, in their way, but I also like to have more
qualitative reports of actual user experience.

We are a large Sun shop.  However, most of our machines are 2/50's
and 3/50's, being used in ways that will not surprise anyone.  I am
going to describe a cluster of 3 machines that are being used by
computer science researchers.  They are

  aramis.rutgers.edu: 3/280, 8MB, one supereagle on Ciprico controller
  athos.rutgers.edu: 4/280, 32MB, two supereagles on one Xylogics
	controller, 6250 bpi Fujitsu tape drive
  porthos.rutgers.edu: 4/280, 32MB, two supereagles on one Xylogics
	controller, 6250 bpi Fujitsu tape drive

[Aramis, Athos, and Porthos were the names of the Three Musketeers.]
Aramis has been around for about a year, the Sun 4's for a couple of
months.  Together, these machines are the replacement for a DEC-20
(red.rutgers.edu).  Originally the DEC-20 was the sole computing
resource for the Rutgers computer science dept., although in recent
years it has been supplemented by Xerox Interlisp machines and some
VAX equipment.  Aramis was frankly an experiment to see whether Suns
could be used for timesharing.  Only Boston University will admit
publicly to doing this, and there is considerable scepticism in the
community about its practicality.  We consider the experiment to be a
success.  We routinely run about 25 users on aramis.  Note however
that many of these are in faculty offices, and may leave their
terminals logged in all day, so they may not be doing much.  Typically
one or two would be running Lisp or Prolog.  Loads are normally around
1.  This machine also acts as a file server (but not ND).  All mail
for computer science dept. machines is handled by this machine (with
its /usr/spool being mounted remotely by other machines).  A number of
faculty who use other Suns have their home directories on this
machine.  We started to see some signs of being I/O-bound, which is
the reason we went to the Ciprico controller.  That seems to have
alleviated those problems.  Normally we run with I/O of about
25 transfers/sec (as shown by the rf0 column in vmstat), a rate
which a Xylogics controller should be able to handle.  However, that's
getting very near that controller's upper bound (which we consider to
be 37 transfers/sec), and things tend to work better with peak capacity
several times your average.  Obviously 8MB is not enough memory for
this configuration, though we don't have any trouble until people
start firing up big Lisp or Prolog jobs.  (They are now being moved to
the Sun 4's.)  Our big applications on aramis have been Lisp
(Common Lisp from Franz), Prolog (Quintus), Fortran, Mathlab, Scribe,
and TeX.
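
The headroom argument above can be put into numbers.  This is only a
sketch: the figures 25 and 37 transfers/sec come from the text, while
the function names and the factor of 3 (one reading of "several times
your average") are assumptions made for illustration.

```c
#include <assert.h>

/* Fraction of a controller's sustainable rate used by the average
 * load.  Both arguments are in transfers/sec. */
double io_utilization(double avg_rate, double max_rate)
{
    assert(max_rate > 0.0);
    return avg_rate / max_rate;
}

/* Nonzero if peak capacity is at least `factor` times the average
 * load.  `factor` is a hypothetical target, not a figure from the
 * text. */
int has_headroom(double avg_rate, double max_rate, double factor)
{
    return max_rate >= factor * avg_rate;
}
```

With an average of 25 and a limit of 37, io_utilization() comes out
at roughly two thirds, and has_headroom(25.0, 37.0, 3.0) is 0, which
is the sense in which 25 transfers/sec is too close to the limit.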

The Sun 4's are not yet heavily used by the general user community,
though the systems staff and operators are using them heavily.  One of
our big Prolog users is now using them
(Quintus Prolog -- Quintus has the distinction of being the first
vendor to come through with software on our list of crucial
applications).  Our biggest Fortran user is too, and I think that
some other numerical work has moved.  (I'm sorry to sound vague on
this, but it is hard for me to be sure which machine users are running
on.  I just removed prolog, lisp, and f77 from aramis to make sure
that I knew people were moving to the Sun 4's.)  The big numerical
project involves network optimization problems.  The program uses only
integer arithmetic, and includes list processing.  The Sun 4 is a very
good machine for this.  The speed is consistent with my rating of the
Sun 4, which is 8 VAX 780's.  (That is, it is about 4 times a Sun
3/180, or 80% of a Cyber 205 -- remember that this is non-vectorizable
code.)  I haven't heard any timings of actual floating point-intensive
calculations yet, though we did try some benchmarks created by people
in our Engineering school.  Unfortunately, I don't know quite what
comparison from those tests would be meaningful to these readers.  One
test came out at half the speed of a Convex C1.  However of their 3
tests, only 1 functioned correctly.  There are still problems with the
floating point hardware.  As far as I can tell, they involve problems
in handling floating point exceptions.  In the default system
configuration, such an exception causes the program to dump core with
a spurious interrupt 11.  Sun supplies a patch that lets the programs
run to completion, but the answers from 2 of the 3 programs don't look
plausible.  We understand that Sun knows what is wrong, and that we
will be seeing new CPU boards sometime soon.
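
For anyone who wants the other machines on the same scale, the
rating given earlier in this paragraph (Sun 4 = 8 VAX 780's = 4 times
a Sun 3/180 = 80% of a Cyber 205) can be turned around.  The function
names below are made up for illustration, and the results are only
the ratios implied by that rating for integer, non-vectorizable code,
not measurements.

```c
#include <assert.h>

/* Speeds in units of one VAX 780, derived from a Sun 4 rating.
 * The text says a Sun 4 is about 4 times a Sun 3/180 ... */
double sun3_180_in_vax780(double sun4_rating)
{
    assert(sun4_rating > 0.0);
    return sun4_rating / 4.0;
}

/* ... and about 80% of a Cyber 205 on non-vectorizable code. */
double cyber205_in_vax780(double sun4_rating)
{
    assert(sun4_rating > 0.0);
    return sun4_rating / 0.8;
}
```

With a rating of 8.0 this puts the Sun 3/180 at 2 VAX 780's and the
Cyber 205 (scalar) at about 10.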

Of the software that is important to us, we find that most vendors
have early versions now for the Sun 4 (which you may or may not
actually be able to get copies of).  We have been able to get access
to enough software to move much of our serious computing to them.  But
this obviously depends upon the details of what our users are doing.
If I were working for Engineering, I would have a much less positive
report.  My impression is that by the end of the Spring, final
releases of everything will be available.  I have no complaints about
reliability.  We had a couple of early failures, but they were in
components not specific to the Sun 4 (one memory board and one
Ethernet interface).  The software has been reasonably reliable.  I
don't keep good enough records to know how many crashes there have
been, but I haven't noticed any (aside from things caused by my own
diddling).  Our biggest problem has been the fact that no source is
available for the software shipped with the Sun 4.  Thus we are
running with a kernel that contains some modules built from the Sun 3
source, and in some cases I had to patch .o files in adb or Emacs to
insert Rutgers changes.  (I refuse to work on a system that doesn't
support ^T and a few other amenities.  Fortunately, the Sun 3 tty.c
worked, so ^T was easy.  The normal release is based on SunOS 3.2 and
has no subnet support, but the usual hacked in.c works for installing
it.)  As for utilities and other programs, Sun 3 code can generally
be made to work on the Sun 4, though sometimes portability bugs have
to be fixed first.  The Sun 4 has
the usual RISC machine constraints: references to full words must be
on full-word boundaries, and code that handles variable numbers of
args by hacking on the stack won't work.  In general code that is
already portable (e.g. works on both Sun 3 and Pyramid) will work on
the Sun 4.  But if it has only been tried on a Sun 3, and was written
by a wizard, you may have to remove a few of the more questionable
spells.  Fortunately, the byte order is the same as the Sun 3.
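
To make the constraints above concrete, here is a minimal sketch of
the portable forms.  The function names are hypothetical, and the
code is written against <stdarg.h> from the ANSI C standard; compilers
of this period offer the similar <varargs.h> instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/* Read a full word from an address that may not be word-aligned.
 * Casting p to (unsigned int *) and dereferencing it can fault on a
 * machine that requires aligned access; copying the bytes is safe
 * on any machine. */
unsigned int read_word_unaligned(const unsigned char *p)
{
    unsigned int v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sum `count` ints through the standard variable-argument interface
 * instead of stepping through the stack from the address of the
 * last named argument. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Nonzero if the most significant byte of a word is stored first,
 * as on both the Sun 3 and the Sun 4. */
int is_big_endian(void)
{
    unsigned int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 0;
}
```

Code that pulls a word out of a packed buffer at an odd offset would
call read_word_unaligned(buf + 1) rather than casting the pointer,
and a printf-style routine would be built on the va_arg pattern in
sum_ints().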

I don't think we have ever had more than a dozen users on one of the
Sun 4's, so I can't really report on multi-user performance.  We're
assuming that if the 3/280 works OK, so will the 4/280.

This message is posted from a Sun 4.  (If you want to reply by mail,
and your mail system doesn't understand MX records, you may want to
send mail to me at hedrick@aramis.rutgers.edu or rutgers!hedrick.)