hedrick@athos.rutgers.edu (Charles Hedrick) (01/02/88)
This note is a brief description of how we use one Sun 3/280 and two Sun 4/280's. I'm posting it in the hope that people using other kinds of systems will post similar descriptions. Benchmarks are interesting in their way, but I also like to have more qualitative reports of actual user experience.

We are a large Sun shop. However, most of our machines are 2/50's and 3/50's, used in ways that will not surprise anyone. I am going to describe a cluster of 3 machines used by computer science researchers:

  aramis.rutgers.edu:  3/280, 8MB, one supereagle on a Ciprico controller
  athos.rutgers.edu:   4/280, 32MB, two supereagles on one Xylogics controller,
                       6250 bpi Fujitsu tape drive
  porthos.rutgers.edu: 4/280, 32MB, two supereagles on one Xylogics controller,
                       6250 bpi Fujitsu tape drive

[Aramis, Athos, and Porthos were the names of the Three Musketeers.]

Aramis has been around for about a year, the Sun 4's for a couple of months. Together, these machines replace a DEC-20 (red.rutgers.edu). Originally the DEC-20 was the sole computing resource for the Rutgers computer science department, although in recent years it has been supplemented by Xerox Interlisp machines and some VAX equipment.

Aramis was frankly an experiment to see whether Suns could be used for timesharing. Only Boston University will admit publicly to doing this, and there is considerable scepticism in the community about its practicality. We consider the experiment a success. We routinely run about 25 users on aramis. Note, however, that many of these are in faculty offices and may leave their terminals logged in all day, so they may not be doing much. Typically one or two are running Lisp or Prolog, and the load average is normally around 1. This machine also acts as a file server (but not an ND server). All mail for computer science department machines is handled by this machine, and the other machines mount its /usr/spool remotely.
A number of faculty who use other Suns have their home directories on this machine. We started to see some signs of being I/O-bound, which is why we went to the Ciprico controller, and that seems to have alleviated the problems. Strictly speaking, our normal I/O rate of about 25 transfers/sec (as shown by the rf0 column in vmstat) is one a Xylogics controller should be able to handle. But that is very near what we consider the upper bound of 37 transfers/sec, and things tend to work better when peak capacity is several times the average. Obviously 8MB is not enough memory for this configuration, though we don't have any trouble until people start firing up big Lisp or Prolog jobs. (Those jobs are now being moved to the Sun 4's.) Our big applications on aramis have been Lisp (Common Lisp from Franz), Prolog (Quintus), Fortran, Mathlab, Scribe, and TeX.

The Sun 4's are not yet heavily used by researchers, though the systems staff and operators are using them heavily. One of our big Prolog users is now using them (Quintus Prolog -- Quintus has the distinction of being the first vendor to come through with software on our list of crucial applications). So is our biggest Fortran user, and I think some other numerical work has moved as well. (I'm sorry to sound vague on this, but it is hard for me to be sure which machine users are running on. I just removed prolog, lisp, and f77 from aramis to make sure that people were moving to the Sun 4's.)

The big numerical project involves network optimization problems. The program uses only integer arithmetic and includes list processing, and the Sun 4 is a very good machine for it. Its speed is consistent with my rating of the Sun 4 as 8 VAX 780's. (That is, about 4 times a Sun 3/180, or 80% of a Cyber 205 -- remember that this is non-vectorizable code.) I haven't heard any timings of actual floating point-intensive calculations yet, though we did try some benchmarks created by people in our Engineering school.
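For a sense of the margins involved, here is the arithmetic on the figures above. It takes our own working ceiling of 37 transfers/sec (a local rule of thumb, not a controller specification) and reads the Sun 4 ratings as applying only to this non-vectorizable integer code:

$$
\frac{25}{37} \approx 0.68, \qquad \frac{37}{25} \approx 1.5
$$

$$
\text{Sun 3/180} \approx \frac{8}{4} = 2 \text{ VAX 780's}, \qquad \text{Cyber 205} \approx \frac{8}{0.8} = 10 \text{ VAX 780's}
$$

So aramis's disk normally runs at roughly two-thirds of that ceiling, with peak capacity only about 1.5 times the average rather than several times.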
Unfortunately, I don't know which comparison from the Engineering school's tests would be meaningful to readers here. One test came out at half the speed of a Convex C1. However, of their 3 tests, only 1 ran correctly. There are still problems with the floating point hardware; as far as I can tell, they involve the handling of floating point exceptions. In the default system configuration, the exceptions cause the program to dump core with a spurious interrupt 11. Sun provides a patch that lets the programs run to completion, but the answers from 2 of the 3 programs don't look plausible. We understand that Sun knows what is wrong, and that we will be seeing new CPU boards soon.

For the software that is important to us, we find that most vendors now have early versions for the Sun 4 (which you may or may not actually be able to get copies of). We have been able to get access to enough software to move much of our serious computing to them. But this obviously depends on the details of what our users are doing; if I were working for Engineering, I would have a much less positive report. My impression is that by the end of the spring, final releases of everything will be available.

I have no complaints about reliability. We had a couple of early failures, but they were in components not specific to the Sun 4 (one memory board and one Ethernet interface). The software has been reasonably reliable. I don't keep good enough records to know how many crashes there have been, but I haven't noticed any (aside from those caused by my own diddling).

Our biggest problem has been that no source is available for the software shipped with the Sun 4. As a result, we are running a kernel that contains some modules built from the Sun 3 source, and in some cases I had to patch .o files with adb or Emacs to insert Rutgers changes. (I refuse to work on a system that doesn't support ^T and a few other amenities. Fortunately, the Sun 3 tty.c worked, so ^T was easy. The usual hacked in.c works for installing subnet support, which is not present in the normal release -- it's based on SunOS 3.2.)

As for utilities and other programs, Sun 3 code can generally be made to work on the Sun 4, though sometimes portability bugs have to be fixed first. The Sun 4 has the usual RISC machine constraints: references to full words must be on full-word boundaries, and code that handles variable numbers of arguments by hacking on the stack won't work. In general, code that is already portable (e.g. works on both the Sun 3 and the Pyramid) will work on the Sun 4. But if it has only been tried on a Sun 3, and was written by a wizard, you may have to remove a few of the more questionable spells. Fortunately, the byte order is the same as on the Sun 3.

I don't think we have ever had more than a dozen users on one of the Sun 4's, so I can't really report on multi-user performance. We're assuming that if the 3/280 works OK, so will the 4/280.

This message is posted from a Sun 4. (If you want to reply by mail and your mail system doesn't understand MX records, you may want to send mail to me at hedrick@aramis.rutgers.edu or rutgers!hedrick.)
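The two RISC portability constraints mentioned above (word alignment and stack-walking for variable arguments) can be illustrated with a minimal sketch. It is written in modern C (C99 fixed-width types and ANSI <stdarg.h>); compilers of the Sun 4 era more often offered the older <varargs.h>, which follows the same idea. The helper names read_word and sum_ints are hypothetical, chosen only for illustration, and a 32-bit word is assumed.

```c
#include <stdarg.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit word stored at an arbitrary byte
 * offset in a buffer. Casting buf + off to (uint32_t *) and dereferencing
 * it can fault on a machine that requires word alignment whenever off is
 * not a multiple of 4; memcpy copies the bytes without that assumption. */
uint32_t read_word(const unsigned char *buf, size_t off)
{
    uint32_t w;
    memcpy(&w, buf + off, sizeof w);  /* result is in the host's byte order */
    return w;
}

/* Hypothetical helper: sum `count` int arguments. The variable arguments
 * are walked with the <stdarg.h> macros instead of taking the address of
 * `count` and stepping a pointer up the stack, which assumes every argument
 * is passed in memory; SPARC passes the first few arguments in registers. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Because both the Sun 3 and the Sun 4 are big-endian, read_word returns the same value on either machine; only the alignment assumption has to change.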