montanaro@crdgw1.ge.com (Skip Montanaro) (08/14/90)
I received several replies to my query regarding Sun file server options. Several people who responded indicated that the configuration I proposed is feasible, though that opinion is not unanimous. The executive summary is:

o Everybody's situation is different. There are lots of "reasonable" ways to accomplish what I set out to do.

o It should work, given that your client SPARCstations have local root and swap. It will work better if you
   a) split the disks among two or more of the 3/260s,
   b) pack lots of RAM in the server(s) for use as disk cache,
   c) upgrade to a 4/300 CPU on the file server (more CPU punch, but perhaps more importantly, a faster ethernet interface),
   d) add the Legato PrestoServe product as a secondary upgrade to boost NFS file server performance.

o I listed several other options in my original message. There wasn't as much consensus about them, but:
   a) Auspex - high priced, probably overkill until you get a large number of clients.
   b) Scatter big SCSIs over the SPARCstations - administrative problems, especially backups, upgrades, and overall file system structure. There is also a conflict between workstation use and file server use of the same machines.

I've enclosed my original message as the first message of the digest below, and tried to trim the responses that follow in a reasonable fashion. Thanks to all who responded.

Skip (montanaro@crdgw1.ge.com)

------- Start of digest -------

X-From: montanaro@crdgw1.ge.com
X-Subject: Sun disk space expansion/migration options/opinions wanted

Our group of 10-15 people currently gets most of its /home disk space from two Encore Multimaxes maintained by a central support group. In order to get us off their machines, they are willing to buy us some storage for our group's file server. Their current proposal is to add a Xylogics 7053 and two 2.5 GB disks (Hitachi?) from NPI to our 3/260.

Our client computers consist of nine 4/6[05]GX workstations, each with 16MB of physical memory and a 104MB local disk containing root and swap. We have a few other odds'n'ends, such as a diskless 3/60, a couple of diskless 3/260s, and a 386i (with disks).

My feeling is that, without some reinforcements, the 3/260 file server will be overburdened by the increased disk load, even though most (and eventually all) clients will have local root and swap partitions. My alternatives appear to be:

1. Go with the proposal as it stands and see what happens, making adjustments as we go,
2. Purchase an extra 7053 and turn one of the other 3/260s into a second file server,
3. Purchase an I/O subsystem accelerator of some sort, such as OMNI Solutions' or Legato's products,
4. Upgrade to a full-fledged file server, such as an Auspex NS5000, or
5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks) around the 4/6x's in our offices, effectively making each share some of the disk load.

If I knew for certain that something like an Auspex was in the cards, I'd opt for SCSI disks compatible with it (HP 660MB now, 1GB later) and move them when the file server arrived (a combination of #5 this year, followed by #4 next year). Due to its expense, however, an Auspex would likely be shared with a larger organization, with attendant complications in evaluating, ordering, and maintaining it.

I am in the process of estimating our group's NFS request pattern on the Multimaxes using Encore's server_stat program. If the write request percentage is not high enough, an NFS write accelerator like Legato's PrestoServe probably won't help much, although OMNI's product would probably still help.
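(As a rough illustration of that arithmetic -- not anything server_stat produces directly -- the write percentage boils down to tallying per-operation counts like those reported by server_stat or nfsstat. The operation names and numbers in this sketch are placeholders; substitute whatever the counters actually report.)

    # Rough sketch: what fraction of NFS traffic would a write accelerator see?
    # The counts below are made-up placeholders standing in for server_stat/nfsstat output.
    ops = {
        "getattr": 52000,
        "lookup":  31000,
        "read":    18000,
        "write":    4100,
        "create":    600,
        "remove":    250,
    }

    total = sum(ops.values())
    # Count anything that modifies the filesystem as "write-class", since those
    # are the synchronous operations a write accelerator is meant to speed up.
    write_class = ops["write"] + ops["create"] + ops["remove"]

    print("total NFS ops:   %d" % total)
    print("write-class ops: %d (%.1f%% of total)" % (write_class, 100.0 * write_class / total))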
I'm pretty confident that Sun-3s can serve Sun-4s, in principle, if you can drive the CPU load down by either replicating CPUs or offloading the CPU with special-purpose I/O subsystems. (After all, the Auspex NS5000 has a Sun-3 VMEbus-based CPU.)

I would appreciate feedback from people with any suggestions. The Sun-3 to 4/6x route seems pretty common these days, so there must be some useful experience out there. Here are some questions we can't currently answer and/or won't be able to investigate thoroughly in the time we have available:

1. How bad would the added noise and heat be with large external SCSI disks hung off the 4/6x's?

2. Would something like the OMNI or Legato accelerators allow us to use SCSI disks instead of the more expensive (and less flexible) SMD disks?

3. Sun doesn't currently maintain the proposed configuration. What alternatives are there for short turnaround (< 24hr) maintenance?

4. What other architectures are we neglecting? SPARCserver-1s with several large SCSI disks come to mind. What kind of experience have people had with them?

I will summarize the responses to sun-managers and sun-spots.

Thanks,

Skip (montanaro@crdgw1.ge.com)

***

X-From: auspex!guy@uunet.uu.net (Guy Harris)

> If I knew for certain that something like an Auspex was in the cards, I'd
> opt for SCSI disks compatible with it (HP 660MB now, 1GB later),

Note that, at least at present, "compatible" means "we (Auspex) stick the drive in our drive carrier and plug it into our drive box" - you can't just plug J. Random SCSI Disk into an NS5000 without some work.

***

X-From: scs@lokkur.dexter.mi.us (Steve Simmons)

In a previous life I did extensive performance analysis of Sun 3/X60 file servers, diskless clients, and dataless nodes. The one absolutely true fact derived was

>>>>>>>>>> EVERYONE'S SITUATION IS DIFFERENT <<<<<<<<<<<<<<

That said, I will go ahead and offer some opinions -- just take them with a big grain of salt.

The best performance/system integrity compromise is dataless nodes and file servers. Our analysis (done back in the ND days) was that 80% of client disk accesses were to /, swap, and /usr (we did not keep actual user files in /usr - it was the /usr/lib and /usr/bin stuff). Of those accesses, / was 99% reads, /usr was 99% reads, and swap was 20 to 40% writes. As you read on, remember that writes are expensive in NFS. Most accesses to actual user files (what would now be /home) were reads.

Experiments with local swap disks had interesting results. Putting a local swap disk on a single client made no net improvement in overall server performance, *and degraded client performance in the one-client model*. Why? Because swapping over the wire to an otherwise unloaded SMD disk was faster than Sun's low-performance local SCSI of the time. But if you look at *all* the clients, it was a different story. The breakeven point came between 3 and 4 clients -- 4 clients with local swap disks got better performance than 4 clients swapping on the SMD disk *if they were all actually swapping*. SCSI disks have gotten lots faster since then, SMD disks have gotten somewhat faster, and IPI disks are a step up from SMD. My feeling is the breakeven is still somewhere between 3 and 6.
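(A toy model of that breakeven, just to illustrate why the crossover lands at a handful of clients: assume the server's SMD path sustains some number of swap ops/sec that all actively swapping clients share, while each local SCSI disk delivers a fixed rate to its own client. The rates below are invented for illustration -- measure your own.)

    # Toy break-even model: swap over the wire vs. local swap disks.
    server_swap_rate = 120.0  # swap ops/sec the server's SMD path sustains (assumed)
    local_scsi_rate  =  35.0  # swap ops/sec one local SCSI disk sustains (assumed)

    for n_clients in range(1, 9):
        # fair share of the server when all n clients are actually swapping
        per_client_remote = server_swap_rate / n_clients
        winner = "local" if local_scsi_rate > per_client_remote else "remote"
        print("%d clients: remote share %5.1f ops/s vs local %4.1f -> %s wins"
              % (n_clients, per_client_remote, local_scsi_rate, winner))

With these made-up rates the crossover falls between 3 and 4 clients; faster local disks pull the breakeven down, faster server disks push it back up.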
In the many-client model, still better performance was obtained by putting / on the local disk, and better yet by putting /usr there too. This effectively offloaded 80% of disk accesses and the great bulk of writes from the server.

There are some tradeoffs, tho. The more disks you have, the more you have to administer/repair/OS-upgrade/etc. It's not a huge amount of work if you plan for it: carry a prebuilt spare disk, which serves double duty. On one hand it's a hot spare, so you're only down a few minutes when a disk dies; when upgrade time comes, you upgrade the spare, swap it into an existing unit, upgrade the disk you just took out, and repeat until all systems are upgraded.

So what's my suggestion to you? On the assumption your clients do get reasonably hard use, I'd do one of the two following:

Scenario A: Put / and swap on the local 4/6X disks. /usr would be nice, but won't fit. Divide the 2.5GB disks between the 3/260s. This gives you redundant service and faster 3/260 performance, at only the cost of one extra copy of /usr. If one server goes down, restore the critical users' files from tape and carry on. Buying another disk for the 3/60 is probably not cost/performance effective; instead, put all the RAM you can on it so it will never swap. With local disks on most stations and the majority of writes hitting those local swap areas, you will get plenty of performance from the two servers.

Scenario B: Put both disks on a single 3/260 with a Legato board. This will probably give you equal performance, but you now have a single point of failure. If the server goes down, the whole shop goes down. How important that is depends on the quality of your service. Remember, getting a drive fixed in 24 hours means two days of down time -- you still have to format and restore and maybe reinstall.

Scenarios that don't sound real reasonable:

> 4. Upgrade to a full-fledged file server, such as an Auspex NS5000,
>    or

Probably not cost-effective, and not significantly better performance than A or B.

> 5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks)
>    around the 4/6x's in our offices, effectively making each share
>    some of the disk load.

Definitely no. If you scatter user files around, you start running into potential administrative nightmares. Backups are harder, longer, and more complex; ditto restores. If a project runs out of disk you then have to do weird cross mounts and symbolic links, and generally wind up with a baroque file system topology.

***

X-From: Jeff Nieusma <nieusma@boulder.colorado.edu>

>> 3. Purchase an I/O subsystem accelerator of some sort, such as OMNI
>>    Solutions' or Legato's products,

This option, along with jamming the server full of RAM, is your best and probably cheapest option... I highly recommend the Legato PrestoServe product. This option will also make backups rather painless: you only have to bring one machine down for dumps, and you don't have to run all over the world.

>> 5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks)
>>    around the 4/6x's in our offices, effectively making each share
>>    some of the disk load.

This is an option that basically mandates use of the automounter. It's a great way to deal with millions of cross mounts. This option makes backups MUCH more difficult.

>> 2. Would something like the OMNI or Legato accelerators allow us to
>>    use SCSI disks instead of the more expensive (and less flexible)
>>    SMD disks?

Yes. The Legato board speeds up write operations by caching the writes and then immediately telling the kernel that the write is done. This way, the PrestoServe board can actually make the write to disk at its leisure.
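(A minimal sketch of that write-behind idea, purely for illustration -- the class and method names here are invented, and this is not Legato's implementation, which keeps the buffer in nonvolatile memory on the board.)

    # Sketch: acknowledge an NFS write once it is buffered, flush to disk later.
    from collections import deque

    class WriteBehindCache:
        def __init__(self, disk_write):
            self.disk_write = disk_write   # function that performs the real disk write
            self.pending = deque()         # stand-in for the board's nonvolatile buffer

        def nfs_write(self, block_no, data):
            self.pending.append((block_no, data))  # buffer the write...
            return "OK"                            # ...and reply to the client immediately

        def flush(self):
            while self.pending:                    # done at the hardware's leisure
                block_no, data = self.pending.popleft()
                self.disk_write(block_no, data)

    disk = {}
    cache = WriteBehindCache(lambda block, data: disk.update({block: data}))
    print(cache.nfs_write(7, b"some file data"))   # client sees success right away
    cache.flush()                                  # data actually reaches the "disk" here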
Most reads are caught in the filesystem cache -- something like 93% of them -- so that takes care of the read part. Keep in mind that the filesystem will always be your bottleneck. If you keep as many of the read/write operations as possible at memory-to-memory transfer speeds, you will be doing the best you can do. You will always have problems transferring very large files, because they blow out the filesystem buffers, but small and commonly accessed files will live in the cache and you won't be slowed down by file I/O.

***

X-From: eplrx7!mcneill@uunet.uu.net (Keith McNeill)

We had a 3/260 with 6 diskless clients & about 2.5 gigabytes on it. It handled it with good performance. I think your best bet is to get 2 7053s and put 1 on your 3/260 & the other on one of your diskless 3/260s. The 7053s are probably cheap compared to the disks, so I don't think it would matter much if you bought 2 7053s instead of 1.

***

X-From: Scott Blandford 596-5316 <bford@pcs.cnc.edu>

When you say 10-15 people, I am assuming that you also mean 10-15 machines. In that case, 1 server with 5 GB of disk space ought to be adequate to serve most people's needs. You do not say what your loads are, but I am going to assume that they are relatively high. With all of your local machines having at least 16MB RAM along with local root and swap space, I think you will have little problem with your disks getting bogged down. We have a similar configuration without the local disks, and have zero of the problems you fear. If you do develop high disk loads, this also gives you an easy migration path: you can purchase another 7053 and put 1 disk on each of your 260s.

***

X-From: jan@eik.ii.uib.no

I think you already see the first 'problem': there are just so many ways this could be *done* :-)

We've run with this configuration for a long time (2+ years). We had no problems. It was basically a fileserver, serving

 - /usr, /usr/local -> i.e. no diskless clients
 - 3 user file-systems -> staff, grad students
 - mail- and news-server
 - terminal-users -> light stuff, latex etc. (we frowned on compile/run)

One problem may be if your users run programs that produce a lot of output that gets written to disk; I am not worried about text-editing and such.

The nice part about SCSI disks is that they are *much* cheaper, probably less than $5000 for 1.2GB. SMD disks are generally more robust and less error-prone, in my opinion. We have servers both with SMD and with SCSI, and one uses both.

I've got two suggestions:

 - Ask for an upgrade to a Sun-4/360 (!). This should probably cost less than $8-10k, and it has three advantages:
   - faster cpu *and* ethernet (the Lance is much faster than the Intel chip, which helps your NFS response)
   - you get SCSI for free
   - memory upgrades get a lot cheaper (32MB possible on the cpu board)
 - If you will be doing your own backups, get an Exabyte too.

We did this around Easter, mostly to get an all Sun-4 environment on our research network. If you can get somebody else to pay for it, this is a *good* idea.

***

X-From: dal@gcm.com (Dan Lorenzini)

I would argue against scattering large SCSI disks among your 4/6[05]s to distribute the load. My experience is that it sets up a conflict between the workstation user and the fileserver clients that makes everyone unhappy and can ultimately be resolved only by having a dedicated fileserver, so it's better to start off that way. I have found that a 3/[12]60 works well as a dedicated fileserver.
Just use a regular CRT as the console and a minimal kernel, put it off in a corner, and it will chug away for a good long time. We use our old 3/110s for this purpose. Works fine with SCSIs. You don't even need more than 8 Meg. Also, you can replace the 3/ board with a 4/ board and get much better throughput for not much money; since you have Sparcstations as clients, this is probably worth the investment. In short, unless you're really into shiny and new, you can squeeze some more life out of your aging Sun-3s by turning them into fileservers for minimal investment. This is not the fastest, but probably the cheapest adequate solution.

On a not too unrelated note, we have a used 4/280 for sale: 16 MB memory, 2 Hitachi 892 MB drives on a 7053 controller. Can be had with or without a Ciprico RF3500 SCSI controller. Always under Sun maintenance. Price about $25000. If you're interested, call or email me or Frank Duquette (fld@gcm.com, 203-625-2741).

Dan Lorenzini
203-625-2779

***

X-From: trinkle@cs.purdue.edu

We currently have 7 Sun 3/2[68]0s that serve diskless Sun-3 clients (8 on average). They also provide general NFS file service to all machines in our department, including an increasing number of Sun-4s. If you keep a lot of the "temporary" traffic (swap, /tmp) from the Sun-4s on the local disk, I think your Sun-3 servers may keep up.

If you are going to buy a new SMD controller, I would recommend buying a Rimfire 3223 from Ciprico rather than a 7053 from Sun. It does involve installing a driver from Ciprico and using their disk formatting utility, but we have had very good success with them and are very happy with the performance. For multiple drives, it seems to do a better job of caching than the 7053 (Ciprico will provide you with a performance evaluation paper if you want).

We are slowly replacing our multi-user fileservers (Sequent, VAXen, etc.) with Sparcstation servers of our own "construction". We buy an SS1+ (not Sun's SparcServer package) and a couple of 660MB HP SCSI drives in a cabinet from Cranell, install a few Sbus ethernet cards, and have an NFS fileserver attached to multiple subnets. We currently have one such server with 3 drives on it, and it is also the IP gateway for two subnets. We have not seen any performance problems yet. We are also going to try using the SLCs as servers.

Clearly, one of our major constraints is cost. The HP drives have a 5 year warranty, so we just buy a spare drive to provide our own "immediate response" maintenance. As for other hardware (CPU), we also have a couple of spares that we can swap in immediately if we need to. We are too cheap to pay for outside maintenance. The only case in which we have been burned by self-maintenance was with our Swallow disk drives - we have had a lot of failures.

***

X-From: shaffer@athena.crd.ge.com (Phillip L. Shaffer)

We have a 3/280 server with 3 892 MB drives, serving 14 Sun-4s and 8 Sun-3s. The Sun-4s have 104 MB local disks for swap and user files (root on the server) and 16-24 MB RAM. The Sun-3s are diskless (except 1) with 8-24 MB RAM. Most of our users I would call light or intermittent users. We get /common from the Encores. I have never seen any evidence that the server is a bottleneck for anyone here. With local user and swap space on the 4/60s, they really don't put a heavy load on the server. I think you could put the disks on one 3/260 without a problem, but you might want to put users' most-used files locally on the 4/60s.
***

X-From: abair@turbinia.sps.mot.com (Alan Bair)

Without making any large changes in your current mix of machines, I would suggest the following.

1. Take the 2 2.5GB disks and split them between the 2 3/260s, thus using both as servers. This spreads the workload, cutting down on the need for an NFS accelerator, and also keeps you partially going if you lose one of the machines.

2. To spread the load a little more, you could add a >= 200MB internal drive to each of the dataless clients for user space, besides what is on the server. Cost may require you to go with smaller server drives. Oops, I'm not sure if you have room for 2 internal drives in 4/6[05] machines. We don't have Sparcs around here :)

3. The next big step would be to replace the 3/260 with the Sparcserver, as you suggested. This is more the type of configuration Sun would like to see.

The external disks add a fair amount of noise. We only have the 141 shoeboxes, which are quite noisy. Maybe the larger ones are quieter now.

***

X-From: murphy!peterg@uunet.uu.net (Peter Gutmann)

Of the available options, I believe the best would be #2. The more you distribute the file service (and diskless service) over machines and disks, the better the overall performance will be.

We have four diskless workstations (three 3/60s and a 3/260) booting off a single 3/260. Three of the workstations are used for development and the fourth is used as a replacement for several terminals. I won't say we don't have NFS problems, but the network is lightly loaded (mostly used for connecting frontend applications to the Sybase dataserver) and can stand the swaps and pages across the net. With a heavily loaded network this would be another story.

I am under the impression that for 10-15 workstations an Auspex server would be overkill. I would start by adding a second 7053 to the second machine, then add the NFS accelerators if performance became a problem later on.

I also believe that Sun-3s can serve Sun-4s. Byte order is not a problem, because NFS uses RPC and XDR to communicate over the net.

We have several Sun-3 cpus under contract with Motorola. For the once or twice we have used them, they have been just as good as or better than Sun. As far as price goes, they charge us $698.00 per month for same-day service on a 3/286 w/451 controller, an 892MB disk, a 1600/6250bpi tape, and 32MB of RAM.

***

X-From: eggert@twinsun.com (Paul Eggert)

We were in a similar boat last year. We have a Sun-3/260 file server and a dozen or so Sun workstations, together with assorted non-Sun workstations. We decided to scatter large external SCSI disks among our Sparcstations. We have 3 now and will get another soon. This was by far the cheapest route for the space, because you can get a reliable 660MB shoebox for ~$2000 these days. Performance is good enough for us, although I'm sure it depends on your application; ours is software development. Not much heat is added. There is some noise, but most folks put the disk behind their desk and it's not bad. We don't have Sun hardware maintenance, and don't feel the need: our drives have a much longer warranty than Sun provides.

***

X-From: sjsca4!poffen@uunet.uu.net (Russ Poffenberger)

We recently bought an Auspex NS5000 and like it very much. The cutoff point for it being cost-effective is about 40 clients. We have 50+ here in my work group alone. The advantages for us:

1.) Faster file service.
During peak load hours, our old system (two 3/200 class machines) was so overloaded that response time went up by a factor of ten. With the NS5000, the response time is not noticeably affected at any time.

2.) Reliability. We would have a server crash about once a week. We routinely get 60-70 days of uptime with the NS5000. The last interruption was a power outage at our site, and the system came right back up (while other work groups' servers based on Suns had difficulty).

3.) Maintenance. Our two 3/200 class machines were costing us about $800 each per month for Sun service. We get the same quality (better, actually) service from Auspex on the NS5000 for $975 per month - almost half what we were paying before. Of course, it is hard to tell how good the service is; we haven't had any failures.

4.) Response. Auspex has been very responsive. They are working with us to make sure we are happy (Sun NEVER does this), and are constantly working to add new features. (A new feature, due out soon, is the ability to turn synchronous NFS writes off on a per-filesystem basis. This is good for non-critical filesystems like swap, and should increase performance.)

What you decide is up to you, and your budget. These are just my opinions. I have no affiliation with Auspex except as a satisfied customer.

***

X-From: brian@cimage.com (Brian Kelley)

You might also consider buying a couple of those very inexpensive SLCs and hanging SCSI disks off of them.

***

X-From: botticel@orion.crd.ge.com (David J. Botticello)

I can only speak from the limited experience we had here, but a Sun-3 is just not fast enough to handle the data transaction speed of the Sparcs. We noticed a substantial improvement when we replaced the 3/260 with a Solbourne. Now, we did (and still do) have 4 of the dozen or so clients swapping off of it, but I can't believe that is all of the trouble.

I also manage a smaller cluster than you propose (a 3/260 "zeppo", a 3/160, a 3/60, and two 4/60s), for which we will be taking delivery of a 3/200-to-4/300 series upgrade soon. All the clients have local root/swap/tmp, and when we run our CAD software (pc board routers) we still get occasional NFS server timeouts. The routers run on the Sparcs with 100MB of local swap and the program's binary stored on a local disk, so the only traffic to speak of is the data flow. I suggest the same upgrade for the 3/260; it is roughly $14K, depending on the amount of memory.

***

X-From: sid@think.com

I would consider buying a 4/65 with 16 meg of memory and getting 4 of the 1.2 gig HPs. Our cost for such a system is around $30K. We have had very good experiences with 4/65s as file servers for 4/60-65s. We have 8. The performance is very good, and with a 4/65 you can add a second SCSI controller and 4 more disks, for a total of 8 gig of storage.

------- End of digest -------