montanaro@crdgw1.ge.com (Skip Montanaro) (08/14/90)
I received several replies to my query regarding Sun file server options. Several people who responded indicated that the configuration I proposed is feasible, though that opinion is not unanimous. The executive summary is:

o Everybody's situation is different. There are lots of "reasonable" ways to accomplish what I set out to do.

o It should work, given that your client SPARCstations have local root and swap. It will work better if you
   a) split the disks among two or more of the 3/260s,
   b) pack lots of RAM in the server(s) for use as disk cache,
   c) upgrade to a 4/300 CPU on the file server (more CPU punch, but perhaps more importantly, a faster ethernet interface),
   d) add the Legato PrestoServe product as a secondary upgrade to boost NFS file server performance.

o I listed several other options in my original message. There wasn't as much consensus about them, but:
   a) Auspex - high priced, probably overkill until you get a large number of clients.
   b) Scatter big SCSIs over the SPARCstations - administrative problems, especially backups, upgrades, and overall file system structure. There is also a conflict between workstation use and file server use of the same machines.

I've enclosed my original message as the first message of the digest below, and tried to trim the responses that follow in a reasonable fashion. Thanks to all who responded.

Skip (montanaro@crdgw1.ge.com)

------- Start of digest -------

X-From: montanaro@crdgw1.ge.com
X-Subject: Sun disk space expansion/migration options/opinions wanted

Our group of 10-15 people currently gets most of its /home disk space from two Encore Multimaxes maintained by a central support group. In order to get us off their machines, they are willing to buy us some storage for our group's file server. Their current proposal is to add a Xylogics 7053 and two 2.5 GB disks (Hitachi?) from NPI to our 3/260.

Our client computers consist of nine 4/6[05]GX workstations, each with 16MB of physical memory and a 104MB local disk containing root and swap. We have a few other odds'n'ends, such as a diskless 3/60, a couple of diskless 3/260s, and a 386i (with disks).

My feeling is that, without some reinforcements, the 3/260 file server will be overburdened by the increased disk load, even though most (and eventually all) clients will have local root and swap partitions. My alternatives appear to be:

1. Go with the proposal as it stands and see what happens, making adjustments as we go,
2. Purchase an extra 7053 and turn one of the other 3/260s into a second file server,
3. Purchase an I/O subsystem accelerator of some sort, such as OMNI Solutions' or Legato's products,
4. Upgrade to a full-fledged file server, such as an Auspex NS5000, or
5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks) around the 4/6x's in our offices, effectively making each share some of the disk load.

If I knew for certain that something like an Auspex was in the cards, I'd opt for SCSI disks compatible with it (HP 660MB now, 1GB later) and move them when the file server arrived (a combination of #5 this year, followed by #4 next year). Due to its expense, however, an Auspex would likely be shared with a larger organization, with attendant complications in evaluating, ordering, and maintaining it.

I am in the process of estimating our group's NFS request pattern on the Multimaxes using Encore's server_stat program. If the write request percentage is not high enough, an NFS write accelerator like Legato's PrestoServe probably won't help much, although OMNI's product would probably still help.
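(As a rough illustration of that arithmetic -- not anything server_stat produces directly -- the write percentage boils down to tallying per-operation counts like those reported by server_stat or nfsstat. The operation names and numbers in this sketch are placeholders; substitute whatever the counters actually report.)

    # Rough sketch: what fraction of NFS traffic would a write accelerator see?
    # The counts below are made-up placeholders standing in for server_stat/nfsstat output.
    ops = {
        "getattr": 52000,
        "lookup":  31000,
        "read":    18000,
        "write":    4100,
        "create":    600,
        "remove":    250,
    }

    total = sum(ops.values())
    # Count anything that modifies the filesystem as "write-class", since those
    # are the synchronous operations a write accelerator is meant to speed up.
    write_class = ops["write"] + ops["create"] + ops["remove"]

    print("total NFS ops:   %d" % total)
    print("write-class ops: %d (%.1f%% of total)" % (write_class, 100.0 * write_class / total))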
I'm pretty confident that Sun-3s can serve Sun-4s, in principle, if you can drive the CPU load down by either replicating CPUs or offloading the CPU with special-purpose I/O subsystems. (After all, the Auspex NS5000 has a Sun-3 VMEbus-based CPU.)

I would appreciate feedback from people with any suggestions. The Sun-3 to 4/6x route seems pretty common these days, so there must be some useful experience out there. Here are some questions we can't currently answer and/or won't be able to investigate thoroughly in the time we have available:

1. How bad would the added noise and heat be with large external SCSI disks hung off the 4/6x's?

2. Would something like the OMNI or Legato accelerators allow us to use SCSI disks instead of the more expensive (and less flexible) SMD disks?

3. Sun doesn't currently maintain the proposed configuration. What alternatives are there for short turnaround (< 24hr) maintenance?

4. What other architectures are we neglecting? SPARCserver-1s with several large SCSI disks come to mind. What kind of experience have people had with them?

I will summarize the responses to sun-managers and sun-spots.

Thanks,

Skip (montanaro@crdgw1.ge.com)

***

X-From: auspex!guy@uunet.uu.net (Guy Harris)

> If I knew for certain that something like an Auspex was in the cards, I'd
> opt for SCSI disks compatible with it (HP 660MB now, 1GB later),

Note that, at least at present, "compatible" means "we (Auspex) stick the drive in our drive carrier and plug it into our drive box" - you can't just plug J. Random SCSI Disk into an NS5000 without some work.

***

X-From: scs@lokkur.dexter.mi.us (Steve Simmons)

In a previous life I did extensive performance analysis of Sun 3/X60 file servers, diskless clients, and dataless nodes. The one absolutely true fact derived was

>>>>>>>>>> EVERYONE'S SITUATION IS DIFFERENT <<<<<<<<<<<<<<

That said, I will go ahead and offer some opinions -- just take them with a big grain of salt.

The best performance/system integrity compromise is dataless nodes and file servers. Our analysis (done back in the ND days) was that 80% of client disk accesses were to /, swap, and /usr (we did not keep actual user files in /usr - it was the /usr/lib and /usr/bin stuff). Of those accesses, / was 99% reads, /usr was 99% reads, and swap was 20 to 40% writes. As you read on, remember that writes are expensive in NFS. Most accesses to actual user files (what would now be /home) were reads.

Experiments with local swap disks had interesting results. Putting a local swap disk on a single client made no net improvement in overall server performance, *and degraded client performance in the one-client model*. Why? Because swapping over the wire to an otherwise unloaded SMD disk was faster than Sun's low-performance local SCSI of the time. But if you look at *all* the clients, it was a different story. The breakeven point came between 3 and 4 clients -- 4 clients with local swap disks got better performance than 4 clients swapping on the SMD disk *if they were all actually swapping*. SCSI disks have gotten lots faster since then, SMD disks have gotten somewhat faster, and IPI disks are a step up from SMD. My feeling is the breakeven is still somewhere between 3 and 6.
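(A toy model of that breakeven, just to illustrate why the crossover lands at a handful of clients: assume the server's SMD path sustains some number of swap ops/sec that all actively swapping clients share, while each local SCSI disk delivers a fixed rate to its own client. The rates below are invented for illustration -- measure your own.)

    # Toy break-even model: swap over the wire vs. local swap disks.
    server_swap_rate = 120.0  # swap ops/sec the server's SMD path sustains (assumed)
    local_scsi_rate  =  35.0  # swap ops/sec one local SCSI disk sustains (assumed)

    for n_clients in range(1, 9):
        # fair share of the server when all n clients are actually swapping
        per_client_remote = server_swap_rate / n_clients
        winner = "local" if local_scsi_rate > per_client_remote else "remote"
        print("%d clients: remote share %5.1f ops/s vs local %4.1f -> %s wins"
              % (n_clients, per_client_remote, local_scsi_rate, winner))

With these made-up rates the crossover falls between 3 and 4 clients; faster local disks pull the breakeven down, faster server disks push it back up.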
In the many-client model, still better performance was obtained by putting / on the local disk, and better yet by putting /usr there too. This effectively offloaded 80% of disk accesses and the great bulk of writes from the server.

There are some tradeoffs, tho. The more disks you have, the more you have to administer/repair/OS-upgrade/etc. It's not a huge amount of work if you plan for it: carry a prebuilt spare disk, which serves double duty. On one hand it's a hot spare, so you're only down a few minutes when a disk dies; when upgrade time comes, you upgrade the spare, swap it into an existing unit, upgrade the disk you just took out, and repeat until all systems are upgraded.

So what's my suggestion to you? On the assumption your clients do get reasonably hard use, I'd do one of the two following:

Scenario A: Put / and swap on the local 4/6X disks. /usr would be nice, but won't fit. Divide the 2.5GB disks between the 3/260s. This gives you redundant service and faster 3/260 performance, at only the cost of one extra copy of /usr. If one server goes down, restore the critical users' files from tape and carry on. Buying another disk for the 3/60 is probably not cost/performance effective; instead, put all the RAM you can on it so it will never swap. With local disks on most stations and the majority of writes hitting those local swap areas, you will get plenty of performance from the two servers.

Scenario B: Put both disks on a single 3/260 with a Legato board. This will probably give you equal performance, but you now have a single point of failure. If the server goes down, the whole shop goes down. How important that is depends on the quality of your service. Remember, getting a drive fixed in 24 hours means two days of down time -- you still have to format and restore and maybe reinstall.

Scenarios that don't sound real reasonable:

> 4. Upgrade to a full-fledged file server, such as an Auspex NS5000,
>    or

Probably not cost-effective, and not significantly better performance than A or B.

> 5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks)
>    around the 4/6x's in our offices, effectively making each share
>    some of the disk load.

Definitely no. If you scatter user files around, you start running into potential administrative nightmares. Backups are harder, longer, and more complex; ditto restores. If a project runs out of disk you then have to do weird cross mounts and symbolic links, and generally wind up with a baroque file system topology.

***

X-From: Jeff Nieusma <nieusma@boulder.colorado.edu>

>> 3. Purchase an I/O subsystem accelerator of some sort, such as OMNI
>>    Solutions' or Legato's products,

This option, along with jamming the server full of RAM, is your best and probably cheapest option... I highly recommend the Legato PrestoServe product. This option will also make backups rather painless: you only have to bring one machine down for dumps, and you don't have to run all over the world.

>> 5. Scatter large external SCSI disks (like HP's 660MB or 1GB disks)
>>    around the 4/6x's in our offices, effectively making each share
>>    some of the disk load.

This is an option that basically mandates use of the automounter. It's a great way to deal with millions of cross mounts. This option makes backups MUCH more difficult.

>> 2. Would something like the OMNI or Legato accelerators allow us to
>>    use SCSI disks instead of the more expensive (and less flexible)
>>    SMD disks?

Yes. The Legato board speeds up write operations by caching the writes and then immediately telling the kernel that the write is done. This way, the PrestoServe board can actually make the write to disk at its leisure.
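(A minimal sketch of that write-behind idea, purely for illustration -- the class and method names here are invented, and this is not Legato's implementation, which keeps the buffer in nonvolatile memory on the board.)

    # Sketch: acknowledge an NFS write once it is buffered, flush to disk later.
    from collections import deque

    class WriteBehindCache:
        def __init__(self, disk_write):
            self.disk_write = disk_write   # function that performs the real disk write
            self.pending = deque()         # stand-in for the board's nonvolatile buffer

        def nfs_write(self, block_no, data):
            self.pending.append((block_no, data))  # buffer the write...
            return "OK"                            # ...and reply to the client immediately

        def flush(self):
            while self.pending:                    # done at the hardware's leisure
                block_no, data = self.pending.popleft()
                self.disk_write(block_no, data)

    disk = {}
    cache = WriteBehindCache(lambda block, data: disk.update({block: data}))
    print(cache.nfs_write(7, b"some file data"))   # client sees success right away
    cache.flush()                                  # data actually reaches the "disk" here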
Most reads are caught in the filesystem cache -- something like 93% of them -- so that takes care of the read part. Keep in mind that the filesystem will always be your bottleneck. If you keep as many of the read/write operations as possible at memory-to-memory transfer speeds, you will be doing the best you can do. You will always have problems transferring very large files, because they blow out the filesystem buffers, but small and commonly accessed files will live in the cache and you won't be slowed down by file I/O.

***

X-From: eplrx7!mcneill@uunet.uu.net (Keith McNeill)

We had a 3/260 with 6 diskless clients & about 2.5 gigabytes on it. It handled it with good performance. I think your best bet is to get 2 7053s and put 1 on your 3/260 & the other on one of your diskless 3/260s. The 7053s are probably cheap compared to the disks, so I don't think it would matter much if you bought 2 7053s instead of 1.

***

X-From: Scott Blandford 596-5316 <bford@pcs.cnc.edu>

When you say 10-15 people, I am assuming that you also mean 10-15 machines. In that case, 1 server with 5 GB of disk space ought to be adequate to serve most people's needs. You do not say what your loads are, but I am going to assume that they are relatively high. With all of your local machines having at least 16MB RAM along with local root and swap space, I think you will have little problem with your disks getting bogged down. We have a similar configuration without the local disks, and have zero of the problems you fear. If you do develop high disk loads, this also gives you an easy migration path: you can purchase another 7053 and put 1 disk on each of your 260s.

***

X-From: jan@eik.ii.uib.no

I think you already see the first 'problem': there are just so many ways this could be *done* :-)

We've run with this configuration for a long time (2+ years). We had no problems. It was basically a fileserver, serving

 - /usr, /usr/local -> i.e. no diskless clients
 - 3 user file-systems -> staff, grad students
 - mail- and news-server
 - terminal-users -> light stuff, latex etc. (we frowned on compile/run)

One problem may be if your users run programs that produce a lot of output that gets written to disk; I am not worried about text-editing and such.

The nice part about SCSI disks is that they are *much* cheaper, probably less than $5000 for 1.2GB. SMD disks are generally more robust and less error-prone, in my opinion. We have servers both with SMD and with SCSI, and one uses both.

I've got two suggestions:

 - Ask for an upgrade to a Sun-4/360 (!). This should probably cost less than $8-10k, and it has three advantages:
   - faster cpu *and* ethernet (the Lance is much faster than the Intel chip, which helps your NFS response)
   - you get SCSI for free
   - memory upgrades get a lot cheaper (32MB possible on the cpu board)
 - If you will be doing your own backups, get an Exabyte too.

We did this around Easter, mostly to get an all Sun-4 environment on our research network. If you can get somebody else to pay for it, this is a *good* idea.

***

X-From: dal@gcm.com (Dan Lorenzini)

I would argue against scattering large SCSI disks among your 4/6[05]s to distribute the load. My experience is that it sets up a conflict between the workstation user and the fileserver clients that makes everyone unhappy and can ultimately be resolved only by having a dedicated fileserver, so it's better to start off that way. I have found that a 3/[12]60 works well as a dedicated fileserver.
Just use a regular CRT as the console and a minimal kernel, put it off in a corner, and it will chug away for a good long time. We use our old 3/110s for this purpose. Works fine with SCSIs. You don't even need more than 8 Meg. Also, you can replace the 3/ board with a 4/ board and get much better throughput for not much money; since you have Sparcstations as clients, this is probably worth the investment. In short, unless you're really into shiny and new, you can squeeze some more life out of your aging Sun-3s by turning them into fileservers for minimal investment. This is not the fastest, but probably the cheapest adequate solution.

On a not too unrelated note, we have a used 4/280 for sale: 16 MB memory, 2 Hitachi 892 MB drives on a 7053 controller. Can be had with or without a Ciprico RF3500 SCSI controller. Always under Sun maintenance. Price about $25000. If you're interested, call or email me or Frank Duquette (fld@gcm.com, 203-625-2741).

Dan Lorenzini
203-625-2779

***

X-From: trinkle@cs.purdue.edu

We currently have 7 Sun 3/2[68]0s that serve diskless Sun-3 clients (8 on average). They also provide general NFS file service to all machines in our department, including an increasing number of Sun-4s. If you keep a lot of the "temporary" traffic (swap, /tmp) from the Sun-4s on the local disk, I think your Sun-3 servers may keep up.

If you are going to buy a new SMD controller, I would recommend buying a Rimfire 3223 from Ciprico rather than a 7053 from Sun. It does involve installing a driver from Ciprico and using their disk formatting utility, but we have had very good success with them and are very happy with the performance. For multiple drives, it seems to do a better job of caching than the 7053 (Ciprico will provide you with a performance evaluation paper if you want).

We are slowly replacing our multi-user fileservers (Sequent, VAXen, etc.) with Sparcstation servers of our own "construction". We buy an SS1+ (not Sun's SparcServer package) and a couple of 660MB HP SCSI drives in a cabinet from Cranell, install a few Sbus ethernet cards, and have an NFS fileserver attached to multiple subnets. We currently have one such server with 3 drives on it, and it is also the IP gateway for two subnets. We have not seen any performance problems yet. We are also going to try using the SLCs as servers.

Clearly, one of our major constraints is cost. The HP drives have a 5 year warranty, so we just buy a spare drive to provide our own "immediate response" maintenance. As for other hardware (CPU), we also have a couple of spares that we can swap in immediately if we need to. We are too cheap to pay for outside maintenance. The only case in which we have been burned by self-maintenance was with our Swallow disk drives - we have had a lot of failures.

***

X-From: shaffer@athena.crd.ge.com (Phillip L. Shaffer)

We have a 3/280 server with 3 892 MB drives, serving 14 Sun-4s and 8 Sun-3s. The Sun-4s have 104 MB local disks for swap and user files (root on the server) and 16-24 MB RAM. The Sun-3s are diskless (except 1) with 8-24 MB RAM. Most of our users I would call light or intermittent users. We get /common from the Encores. I have never seen any evidence that the server is a bottleneck for anyone here. With local user and swap space on the 4/60s, they really don't put a heavy load on the server. I think you could put the disks on one 3/260 without a problem, but you might want to put users' most-used files locally on the 4/60s.
***

X-From: abair@turbinia.sps.mot.com (Alan Bair)

Without making any large changes in your current mix of machines, I would suggest the following.

1. Take the 2 2.5GB disks and split them between the 2 3/260s, thus using both as servers. This spreads the workload, cutting down on the need for an NFS accelerator, and also keeps you partially going if you lose one of the machines.

2. To spread the load a little more, you could add a >= 200MB internal drive to each of the dataless clients for user space, besides what is on the server. Cost may require you to go with smaller server drives. Oops, I'm not sure if you have room for 2 internal drives in 4/6[05] machines. We don't have Sparcs around here :)

3. The next big step would be to replace the 3/260 with the Sparcserver, as you suggested. This is more the type of configuration Sun would like to see.

The external disks add a fair amount of noise. We only have the 141 shoeboxes, which are quite noisy. Maybe the larger ones are quieter now.

***

X-From: murphy!peterg@uunet.uu.net (Peter Gutmann)

Of the available options, I believe the best would be #2. The more you distribute the file service (and diskless service) over machines and disks, the better the overall performance will be.

We have four diskless workstations (three 3/60s and a 3/260) booting off a single 3/260. Three of the workstations are used for development and the fourth is used as a replacement for several terminals. I won't say we don't have NFS problems, but the network is lightly loaded (mostly used for connecting frontend applications to the Sybase dataserver) and can stand the swaps and pages across the net. With a heavily loaded network this would be another story.

I am under the impression that for 10-15 workstations an Auspex server would be overkill. I would start by adding a second 7053 to the second machine, then add the NFS accelerators if performance became a problem later on.

I also believe that Sun-3s can serve Sun-4s. Byte order is not a problem, because NFS uses RPC and XDR to communicate over the net.

We have several Sun-3 cpus under contract with Motorola. For the once or twice we have used them, they have been just as good as or better than Sun. As far as price goes, they charge us $698.00 per month for same-day service on a 3/286 w/451 controller, an 892MB disk, a 1600/6250bpi tape, and 32MB of RAM.

***

X-From: eggert@twinsun.com (Paul Eggert)

We were in a similar boat last year. We have a Sun-3/260 file server and a dozen or so Sun workstations, together with assorted non-Sun workstations. We decided to scatter large external SCSI disks among our Sparcstations. We have 3 now and will get another soon. This was by far the cheapest route for the space, because you can get a reliable 660MB shoebox for ~$2000 these days. Performance is good enough for us, although I'm sure it depends on your application; ours is software development. Not much heat is added. There is some noise, but most folks put the disk behind their desk and it's not bad. We don't have Sun hardware maintenance, and don't feel the need: our drives have a much longer warranty than Sun provides.

***

X-From: sjsca4!poffen@uunet.uu.net (Russ Poffenberger)

We recently bought an Auspex NS5000 and like it very much. The cutoff point for it being cost-effective is about 40 clients. We have 50+ here in my work group alone. The advantages for us:

1.) Faster file service.
During peak load hours, our old system (two 3/200 class machines) was so overloaded that response time went up by a factor of ten. With the NS5000, the response time is not noticeably affected at any time.

2.) Reliability. We would have a server crash about once a week. We routinely get 60-70 days of uptime with the NS5000. The last interruption was a power outage at our site, and the system came right back up (while other work groups' servers based on Suns had difficulty).

3.) Maintenance. Our two 3/200 class machines were costing us about $800 each per month for Sun service. We get the same quality (better, actually) service from Auspex on the NS5000 for $975 per month - almost half what we were paying before. Of course, it is hard to tell how good the service is; we haven't had any failures.

4.) Response. Auspex has been very responsive. They are working with us to make sure we are happy (Sun NEVER does this), and are constantly working to add new features. (A new feature, due out soon, is the ability to turn synchronous NFS writes off on a per-filesystem basis. This is good for non-critical filesystems like swap, and should increase performance.)

What you decide is up to you, and your budget. These are just my opinions. I have no affiliation with Auspex except as a satisfied customer.

***

X-From: brian@cimage.com (Brian Kelley)

You might also consider buying a couple of those very inexpensive SLCs and hanging SCSI disks off of them.

***

X-From: botticel@orion.crd.ge.com (David J. Botticello)

I can only speak from the limited experience we had here, but a Sun-3 is just not fast enough to handle the data transaction speed of the Sparcs. We noticed a substantial improvement when we replaced the 3/260 with a Solbourne. Now, we did (and still do) have 4 of the dozen or so clients swapping off of it, but I can't believe that is all of the trouble.

I also manage a smaller cluster than you propose (a 3/260 "zeppo", a 3/160, a 3/60, and two 4/60s), for which we will be taking delivery of a 3/200-to-4/300 series upgrade soon. All the clients have local root/swap/tmp, and when we run our CAD software (pc board routers) we still get occasional NFS server timeouts. The routers run on the Sparcs with 100MB of local swap and the program's binary stored on a local disk, so the only traffic to speak of is the data flow. I suggest the same upgrade for the 3/260; it is roughly $14K, depending on the amount of memory.

***

X-From: sid@think.com

I would consider buying a 4/65 with 16 meg of memory and getting 4 of the 1.2 gig HPs. Our cost for such a system is around $30K. We have had very good experiences with 4/65s as file servers for 4/60-65s. We have 8. The performance is very good, and with a 4/65 you can add a second SCSI controller and 4 more disks, for a total of 8 gig of storage.

------- End of digest -------