jch@omnigate.clarkson.EDU.UUCP (04/30/87)
Ted Mead <mead@tut.cc.rochester.edu> writes:
> There seemed to be a consensus that having diskless workstations and
> file servers on a network would cause havoc to an Ethernet.

I'd like to solicit comments on a configuration that our School of Engineering has proposed: they would like to purchase an Alliant super-mini-computer, a Sun 3/160 server and 12 diskless 3/50s, 8 Opus Clipper systems (a PC/AT with a 32032 processor board running a System V port, I believe), and 83 IBM PC/AT clones running Sun's PC/NFS. The Opus systems are supposed to be disk servers for the PC/NFS systems, where most of the computing is supposed to take place. Most of the PC/ATs will not have any hard disk; they will rely fully on the Opus systems for disk storage. All this equipment, in 4 buildings, will be linked with 3 fiber repeaters, making one large ethernet.

Our limited experience shows that one or two diskless 3/50s doing disk-intensive work (compiling programs or copying disk files around) significantly affect the performance of both other diskless 3/50s and PCs on the same net that do not make use of the file server (i.e. DECnet-DOS to a VMS system). (The Imperial ;-))

We in the computing center would like to see some partitioning of the ethernet into departmental segments connected to a School of Engineering backbone with at least level II bridges. In our minds this would localize traffic to some degree, isolate potential physical problems (shorted or broken cable, accidental or malicious) and provide some measure of security. It would not address problems of the "Chernobyl" effect.

Does anyone have experience with a similar configuration of diskless workstations and/or PCs that they can comment on?

Thanks,
Jeff
root@TOPAZ.RUTGERS.EDU.UUCP (05/01/87)
We use diskless Suns extensively. We have around 40 diskless machines on one Ethernet. There is evidence that this causes more load than one would prefer; on the other hand, it is also not a disaster. I'm skeptical of 2 machines causing serious problems. We'd rather keep it to about 25.

Because of the critical dependence of Suns on their Ethernets, and the weird things that some TCP/IP implementations do to the Ethernet, we keep diskless Suns on separate Ethernets dedicated to just Suns. We use a real IP gateway between the Ethernets. Level 2 bridges would certainly help with the load, but would not necessarily provide isolation from weird packets. Whether that is a problem depends upon how confident you are in your TCP/IP implementations.
mike@BRL.ARPA (Mike Muuss) (05/02/87)
Two Sun-3/50 processors blasting to each other with a TCP connection can achieve ~2-3 Mbits/sec user-to-user throughput (tested with the TTCP program), and seem to use about 25% of the ethernet bandwidth as monitored on another Sun-3/50, which has unknown (to me) measurement accuracy. In our experience this has had no noticeable impact on other users of the Ethernet. Adding a second pair of Sun-3/50s running the same test doubled the loading on the Ethernet, as you would expect.

Current wisdom suggests that there should be no more than one file server and 8 diskless Sun-3s per Ethernet for good Ethernet performance when all the Suns are busy. At BRL, we presently have one Ethernet with 14 Sun-3/50s and 4 Sun-2/50s running off of one file server (a Gould PN9050 giving both ND and NFS service), as well as a variety of other machines (more Goulds and 2 Alliant FX/8s) that communicate with NFS on a more occasional basis. We find that head contention on the file server is the performance limit now, not the Ethernet. However, once the file server is beefed up a bit, the Ethernet will be next, so the Ethernet will be split into two, with a pair of level-3 IP gateways between them.

Hope this information helps.

Best,
 -Mike
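For readers who want to sanity-check the 25% figure, a rough back-of-envelope sketch follows. It is not from the tests above; the 10 Mbit/sec raw rate, header sizes, and full-size-frame assumption are mine, and it only shows that ~2-3 Mbit/sec of user data plausibly lands near a quarter of the cable once protocol overhead is counted.

    # Rough estimate of Ethernet utilization implied by a given user-level
    # TCP throughput.  All overhead figures below are assumptions for
    # illustration, not measurements from the BRL tests.

    ETHERNET_BPS = 10_000_000          # raw Ethernet signalling rate, bits/sec
    MTU = 1500                         # max Ethernet payload, bytes
    TCP_IP_HEADERS = 40                # TCP + IP headers, bytes (no options)
    ETHER_OVERHEAD = 14 + 4 + 8 + 12   # header, CRC, preamble, interframe gap

    def utilization(user_bits_per_sec):
        """Fraction of the 10 Mbit cable consumed, assuming full-size frames."""
        payload = MTU - TCP_IP_HEADERS            # user data per frame
        frame_bits = (MTU + ETHER_OVERHEAD) * 8   # bits on the wire per frame
        frames_per_sec = user_bits_per_sec / (payload * 8)
        return frames_per_sec * frame_bits / ETHERNET_BPS

    for mbits in (2.0, 2.5, 3.0):
        print(f"{mbits} Mbit/s of user data -> ~{utilization(mbits * 1e6):.0%} of the cable")

    # 2-3 Mbit/s of user data works out to roughly 21-32% of the cable,
    # bracketing the ~25% reading reported above.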
jqj@GVAX.CS.CORNELL.EDU (J Q Johnson) (05/02/87)
Charles Hedrick notes that 20 to 40 diskless SUNs is a reasonable load on an Ethernet. Although our experience at Cornell is consistent with this estimate, one should be a bit careful: small software and usage changes can make for big changes in behavior. For example, on our main Ethernet (about 25 diskless SUNs plus 75 other machines, total less than 25% load) we observe that at least 1/2 of the SUN load is ND traffic. ND is not efficient in its use of Ethernet bandwidth, and I would expect the total load offered by the SUNs to drop, perhaps precipitously, when SunOS 4.0 arrives. Similarly, slightly better caching strategies in clients can make a big difference, as can adding a bit more memory (we do wish our 3/50s had 6MB!).

Perhaps most important, don't attempt to generalize from diskless SUNs to PC-ATs (or even to diskless VAXstations). The PCs won't be paging across the network, don't run a multitasking OS, have typically smaller program sizes than Suns and longer process lifetimes, etc.

All the above points to being able to support lots of diskless workstations on your network. On the other hand, it would be foolish to design a network that didn't make provisions for saturation. If you don't put in bridges or gateways initially, at least locate your servers near their clients so you can get the benefit of installing bridges later if you need to. Leave your PCs with a couple of empty slots so you can add more memory (for a RAM disk or whatever) later if need be. And so on. Don't assume that any load analysis you do today will still be valid in 1989.
dms@HERMES.AI.MIT.EDU (David M. Siegel) (05/04/87)
Here at the MIT AI Lab we have found that our diskless Sun workstations put a much heavier load on an Ethernet than Hedrick and Johnson noted. Using a network analyzer, we see that the 18 diskless Suns we have on one ether can run the cable at 50 percent of capacity for extended periods of time. Peak 5-second usage often jumps to 70 percent. All our machines have 8 Meg of RAM, though some of them run Sun Common Lisp. Much of the traffic is ND packets. Based on this, we are planning on having no more than 12-15 Suns on one ethernet. Each server will have its own "client" subnet.
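A quick division (mine, not from the posting above; the only assumption is the 10 Mbit/sec raw rate) shows the per-client load these figures imply, and what the proposed 12-15 client ceiling buys in headroom:

    # Back-of-envelope: average sustained load per diskless client implied
    # by the MIT figures, and what a 12-15 client limit would mean.
    # The 10 Mbit/sec raw rate is an assumption; the other numbers are
    # the ones quoted above.

    ETHERNET_MBPS = 10.0

    def per_client(utilization, clients):
        """Average Mbit/sec each client contributes at a given utilization."""
        return ETHERNET_MBPS * utilization / clients

    load = per_client(0.50, 18)        # 18 Suns holding the cable at 50%
    print(f"observed: {load:.2f} Mbit/s per client, sustained")

    # At the same per-client load, 12-15 clients would hold the cable to:
    for n in (12, 15):
        print(f"{n} clients -> ~{n * load / ETHERNET_MBPS:.0%} utilization")

    # i.e. roughly 33-42% sustained, leaving more headroom for the
    # 70% five-second peaks mentioned above.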
hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (05/05/87)
Note that the 2-3 Mbits/sec of Ethernet traffic you report is with a test program designed to test the network only. In actual use, however, the majority of the high-speed Ethernet traffic is generated by file serving. In that case, it is limited by the speed of the disks and the amount of lookahead done by the protocols. I would be extremely surprised to see the current generation of Sun file server deliver more than 1 Mbit/sec of sustained throughput.

Much to my surprise, I find that replacing Eagles with super-Eagles does not seem to increase the throughput available in my tests noticeably. Note that these tests involved a mix of operations, including file creation, reading, removal, and renaming, and that the files were small or moderate in size. I.e., we tried to duplicate the sorts of I/O that a typical student mix would generate. I have to believe that fast sequential operations on large files would get more out of a super-Eagle.

Some other results:
 - One Eagle with one controller seemed to use about 2/3 of the CPU in a 3/180.
 - A second Eagle on the same controller added very little in throughput.
 - A second Eagle on a second controller added about 50% in capacity. It seems that this was limited by CPU capacity.
 - A 280 with a super-Eagle did not have noticeably more performance than a 180 with an Eagle. However, we assume that the 280 would be able to handle two disks and controllers without running out of steam. (We were unable to test this because we didn't have the right hardware configuration.) It's not clear whether this would be cost-effective, though, when compared against using one Sun 3/140S per disk [a configuration which, however, is not supported by Sun. Indeed, I'm not sure that the 140S is even on the price sheet.]
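One way to see why a faster disk barely moves the needle on a small-file workload is to separate positioning time from transfer time. The sketch below is mine, with assumed (not measured) drive parameters; it is only meant to illustrate the seek-bound argument, not to reproduce the Eagle and super-Eagle test results above.

    # Why a disk with a faster transfer rate helps little on a small,
    # seek-heavy workload.  The drive parameters are assumptions for
    # illustration only, not measurements of real Eagles or super-Eagles.

    def ops_per_second(avg_seek_ms, avg_rotation_ms, transfer_mb_s, io_size_kb):
        """Random I/Os per second for one disk doing io_size_kb transfers."""
        positioning = (avg_seek_ms + avg_rotation_ms) / 1000.0
        transfer = (io_size_kb / 1024.0) / transfer_mb_s
        return 1.0 / (positioning + transfer)

    IO_KB = 8   # typical NFS/ND transfer size

    for name, seek, rot, xfer in (
        ("slower-transfer disk", 28.0, 8.3, 1.8),
        ("faster-transfer disk", 28.0, 8.3, 3.0),
    ):
        ops = ops_per_second(seek, rot, xfer, IO_KB)
        mbit = ops * IO_KB * 8 / 1000.0
        print(f"{name}: {ops:5.1f} ops/s, ~{mbit:.2f} Mbit/s")

    # With positioning time dominating each 8 KB operation, raising the
    # transfer rate from 1.8 to 3.0 Mbytes/sec changes sustained throughput
    # by only a few percent, which is the flavor of the result above.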
mike@BRL.ARPA (Mike Muuss) (05/05/87)
I agree that the TTCP test only measured memory-to-memory throughput. That was the intent -- to see how much data could be shoveled. I did not intend to suggest it was a generic benchmark. Note that TTCP was using TCP, not NFS or ND.

In our environment, we do a lot of network-based 24-bit RGB graphics, which means whacking .75 Mbytes (lores) or 3 Mbytes (hires) around for each image. Often they are computed and displayed without ever touching a disk. So the TTCP test was not uninteresting.

Our Gould 9000 file server, which serves the collection of Suns I mentioned, can be seen at busy times handling 200 packets/second in both the transmit and receive directions (peak). Many of them result in disk transactions, although the ratio can be deceptive. E.g., 1 packet arrives asking for 8 Kbytes of data, which is read with one disk I/O and returned in 8 packets: 1 disk I/O, 9 packets.

Hope you find these random statistics of interest.

Best,
 -M
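To make the packet/disk-I/O ratio concrete, here is a small sketch. It is illustrative arithmetic of my own, not a measurement of the Gould file server; the ~1 Kbyte per reply packet is an assumption consistent with the 8-packets-per-read example above.

    # The packet/disk-I/O ratio described above, made explicit.  Purely
    # illustrative arithmetic, not a measurement of the Gould file server;
    # the ~1 Kbyte data payload per reply packet is an assumption.

    import math

    NFS_READ_KB = 8             # one read request covers 8 Kbytes of data
    PACKET_PAYLOAD_KB = 1.0     # assumed data carried per reply packet

    reply_packets = math.ceil(NFS_READ_KB / PACKET_PAYLOAD_KB)   # 8
    packets_per_disk_io = 1 + reply_packets                      # request + replies
    print(f"{packets_per_disk_io} packets per disk I/O for a bulk read")

    # A packet counter therefore overstates disk activity by up to ~9x on
    # bulk reads: 200 packets/sec could correspond to as few as ~22 reads/sec,
    # with the real figure somewhere between that and 200, since many packets
    # are small (lookups, getattrs, acks).
    print(f"200 pkt/s -> as few as {200 // packets_per_disk_io} reads/s")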
jon@CS.UCL.AC.UK.UUCP (05/07/87)
The figures here are approximate: Manchester University runs a net of 60-odd Suns. They have 10 diskless 3/50s per 3/260 server with a 400 Mb Eagle. Each server-client set has its own thin ethernet, and all the servers are backboned on an ethernet. With 4 Meg on each diskless client and 8 Meg on the server, the servers and ethernet just about cope if no more than 5 Meg of virtual memory is used in each client (i.e. 1 Meg of swapping). I don't know whether the bottleneck is the ethernet or server cpu/disk speeds.

Most of the ether traffic is ND/NFS, which is much less a respecter of bandwidths and delays than tcp traffic, and wreaks havoc with bridges and gateways unless you wind the read/write transfer sizes down by hand. Hence the client/server ratio and the separate ethers.

Does anyone know of any affordable ethernet/ethernet IP gateway/subnet router that can take 8 Kbytes worth of IP back to back from several (~10) hosts at once?

Jon
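For readers unfamiliar with why the 8 Kbyte transfer size is hard on gateways: an 8 Kbyte NFS read reply over UDP is one datagram, which IP fragments into a back-to-back burst at wire speed. The sketch below is mine; the MTU and header sizes are standard Ethernet/IP values, and the buffer estimate is only illustrative.

    # Why an 8 Kbyte NFS-over-UDP read is hard on an IP gateway: the single
    # datagram is fragmented into a back-to-back burst the gateway must be
    # able to buffer.  Header sizes and MTU are standard values; the buffer
    # arithmetic is illustrative.

    import math

    MTU = 1500            # Ethernet payload, bytes
    IP_HEADER = 20        # bytes, no options
    UDP_HEADER = 8        # bytes, counted once in the datagram
    NFS_READ = 8 * 1024   # bytes of file data per read

    # Fragment payloads must be multiples of 8 bytes (except the last one).
    frag_payload = (MTU - IP_HEADER) // 8 * 8            # 1480 bytes
    fragments = math.ceil((NFS_READ + UDP_HEADER) / frag_payload)
    print(f"{fragments} IP fragments per 8 Kbyte read")  # 6

    # At 10 Mbit/sec those fragments arrive roughly 1.2 ms apart, so a
    # gateway fielding simultaneous reads from ~10 clients needs on the
    # order of 10 * fragments packet buffers just to avoid dropping one
    # fragment (and losing the whole datagram).
    print(f"~{10 * fragments} buffers for 10 simultaneous reads")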
mishkin@apollo.uucp (Nathaniel Mishkin) (05/08/87)
I found all this discussion about loaded ethernets pretty interesting. Having used Apollos (both in and out of Apollo Computer Inc.) for the last ~5 years, I've become pretty familiar with the vices and virtues (much of the former) of token ring networks and often wondered why we wouldn't just be better off with ethernet. I think the recent discussion in this group highlights some of the virtues of token ring networks.

I was fairly astonished to read that one basically can run no more than (based on the various estimates) 8-15 diskless workstations (of some manufacture) on a single ether. I shudder to think of the cost (in money and performance) of *requiring* routers/bridges and an internetwork topology for a relatively small "work group". You just don't have these problems in a token ring. Token rings guarantee fair access to the medium and as a result can run successfully with consistently higher average loads.

And forget diskless workstations for a minute. How about doing file system backups over the net? There's a fine bit of load, and it's not bursty like diskless workstations. In our multi-hundred-gigabyte environment, backups (like love) are forever.

I also thought the comment about how improved caching would help matters was interesting. Of course, proper caching requires correct cache validation to ensure that you're reading valid data. Not all distributed file systems implement such correctness guarantees. For example, Apollo's distributed file system does, but NFS doesn't.

--
Nat Mishkin
Apollo Computer Inc.
Chelmsford, MA
{wanginst,yale,mit-eddie}!apollo!mishkin
mark@mimsy.UUCP (Mark Weiser) (05/09/87)
In article <34bd5209.c366@apollo.uucp> mishkin@apollo.UUCP (Nathaniel Mishkin) writes:
>...I think the recent discussion
>in this group highlights some of the virtues of token ring networks.
>I was fairly astonished to read that one basically can run no more than
>(based on the various estimates) 8-15 diskless workstations (of some
>manufacture) on a single ether.

I think this is a misinterpretation of the comments. I have seen Apollo networks exhibiting extremely poor performance when too many diskless nodes were accessing a single server. (Too many did not seem to be all that many--I saw this at the Brown demonstration classroom, when all the diskless clients were trying to start at once.)

I think the question is: what does it mean to 'run no more than...'? Sure you can run more than 8-15, but the performance will look worse. If you are used to a local disk, then you can 'feel' the decrement with more than 8-15 diskless workstations on the ethernet. On the other hand, if you are willing to accept low-performance transients (as the Brown folks evidently were on their Apollos during startup), then you can do more.

Another angle: there are lots of reasons why performance could be different between these two systems. It is premature to point the finger at the 0/1 networking levels without more information.

-mark
--
Spoken: Mark Weiser	ARPA: mark@mimsy.umd.edu	Phone: +1-301-454-7817
After May 15, 1987: weiser@parcvax.xerox.com
connery@bnrmtv.UUCP (Glenn Connery) (05/11/87)
In article <34bd5209.c366@apollo.uucp>, mishkin@apollo.uucp (Nathaniel Mishkin) writes:
> I was fairly astonished to read that one basically can run no more than
> (based on the various estimates) 8-15 diskless workstations (of some
> manufacture) on a single ether...  You just don't have these problems
> in a token ring...

Since you are not comparing equivalent systems, this kind of interpretation of the results seems rather unwarranted. The discussion to date has pointed out that the Suns are paging virtual memory over the Ethernet. Depending upon the way things are set up, this could be a huge load for the network to handle, regardless of the efficiency of the access protocol.

--
Glenn Connery, Bell Northern Research, Mountain View, CA
{hplabs,amdahl,3comvax}!bnrmtv!connery
mishkin@apollo.uucp (Nathaniel Mishkin) (05/11/87)
In article <6603@mimsy.UUCP> mark@mimsy.UUCP (Mark Weiser) writes:
>In article <34bd5209.c366@apollo.uucp> mishkin@apollo.UUCP (Nathaniel Mishkin) writes:
>>...I think the recent discussion
>>in this group highlights some of the virtues of token ring networks.
>>I was fairly astonished to read that one basically can run no more than
>>(based on the various estimates) 8-15 diskless workstations (of some
>>manufacture) on a single ether.
>
>I think this is a misinterpretation of the comments. I have seen
>Apollo networks exhibiting extremely poor performance when too many
>diskless nodes were accessing a single server.

I think there's some confusion here: *I* was not talking about the number of diskless workstations that could be booted off a single server. Maybe other people were. It seemed that people were talking about the number of diskless workstations that could be on a single local network (e.g. ether or ring).

Further, let me make it clear that when I gave the range "8-15" I was merely quoting the numbers that had appeared in the earlier articles to which I was following up. (I.e. I should not be considered an authority on the performance characteristics of other manufacturers' workstations. :) Unless I was misreading, those quotes were from articles that seemed to be discussing the number of diskless workstations per ether, not per disked server. I'll leave it to the real authorities to clear things up.

>Another angle: there are lots of reasons why performance could be different
>between these two systems. It is premature to point the finger at the
>0/1 networking levels without more information.

Fair enough. I was just trying to provide some more information that I thought was relevant.

--
Nat Mishkin
Apollo Computer Inc.
Chelmsford, MA
{wanginst,yale,mit-eddie}!apollo!mishkin
mishkin@apollo.uucp (Nathaniel Mishkin) (05/11/87)
In an earlier posting of mine, I unjustly sullied the capabilities of the NFS protocol in the area of caching. My cursory reading of the NFS Protocol Spec (which doesn't explicitly discuss caching issues) failed to catch the frequent "attributes" return parameters that one is, I take it, to use in cache management if one is to have an efficient NFS implementation. Open mouth; extract foot.

--
Nat Mishkin
Apollo Computer Inc.
Chelmsford, MA
{wanginst,yale,mit-eddie}!apollo!mishkin
jas@MONK.PROTEON.COM (John A. Shriver) (05/11/87)
We are looking at several effects here. One is server saturation proper: how fast its disks and protocols can run. The next is saturation of the server interface. The third is saturation of the LAN itself. All three are sensitive to the LAN technology.

Server protocol performance can be affected relatively easily by LAN packet size. If you've got big packets (4K instead of 1.5K), you'll take fewer interrupts and context switches.

Saturation of the server interface is to a great degree a matter of good design. Having enough buffering, a clean programming interface, and an ability to pipeline can definitely help receive/transmit more data. Beyond that, having any level of data link flow or congestion control really helps. Most CSMA networks have no way to know whether a packet was really received at the server or was dropped for lack of a buffer. Some networks (DEC's CI) do know this, and it helps a lot. (Ethernet does not.) All of the Token-Ring networks (IBM's, our ProNET, ANSI's FDDI standard) have this, in the "frame copied" bit that comes back around from the recipient. This makes the possibility of lost packets due to server congestion dramatically lower, which really speeds things up. The data link can implement flow control and retransmission much faster than the transport code.

The LAN itself can have dramatically different total capacity, which matters when you want 3 servers on one LAN, not just one. On 10 megabit networks, you can get more total data through, with less delay, on a Token-Ring than on a CSMA/CD network. While vendors will disagree on where CSMA/CD congests terminally (somewhere between 4 and 7 megabits/second), it is true that Token-Ring can really deliver all 10 megabits/second. Moreover, at speeds beyond 10 megabits/second, CSMA/CD does not scale, and you almost have to go Token-Ring. (You can go CSMA/CA, but it can degenerate into a Token-Bus.) The FDDI standard is a Token-Ring, as is the ProNET-80 product.
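To put a rough number on the packet-size point, here is a small sketch of how the per-packet interrupt load scales with packet size at a fixed data rate. The sustained throughput and per-packet CPU cost are assumptions of mine chosen for illustration, not figures from any of the systems discussed.

    # How per-packet interrupt/context-switch load scales with packet size
    # for a fixed data rate.  The 1 Mbyte/sec server throughput and the
    # per-packet CPU cost are assumptions for illustration only.

    SERVER_BYTES_PER_SEC = 1_000_000     # assumed sustained server throughput
    CPU_COST_PER_PACKET_US = 500         # assumed interrupt + wakeup cost, usec

    for payload in (1_500, 4_000):       # Ethernet-class vs. big-packet LAN
        packets_per_sec = SERVER_BYTES_PER_SEC / payload
        cpu_fraction = packets_per_sec * CPU_COST_PER_PACKET_US / 1_000_000
        print(f"{payload:>5}-byte packets: {packets_per_sec:6.0f} pkt/s, "
              f"~{cpu_fraction:.0%} of a CPU on per-packet overhead")

    # With 4K packets the server fields roughly 2.7x fewer interrupts for
    # the same data rate, which is the point made above.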
mishkin@apollo.UUCP (05/11/87)
[[This is a reposting of my response to Mark Weiser's article. This one is slightly different from my earlier one. If I spend any more time trying to figure out how to (successfully) cancel a previously posted article, I think I'll go insane. Sorry for the noise. --mishkin]]

In article <6603@mimsy.UUCP> mark@mimsy.UUCP (Mark Weiser) writes:
>I think this is a misinterpretation of the comments. I have seen
>Apollo networks exhibiting extremely poor performance when too many
>diskless nodes were accessing a single server.

I think there's some confusion here. *I* was talking about the number of diskless workstations per ethernet, not per server. I thought that's what other people were talking about too.

Further, a clarification: when I referred to 8-15 as the maximum number of diskless workstations (per ethernet), I was *merely* quoting the numbers that appeared in the articles to which I was following up. (I.e. I don't claim to be an expert on the performance characteristics of other manufacturers' equipment.) I'll let the real experts clear things up.

>Another angle: there are lots of reasons why performance could be different
>between these two systems. It is premature to point the finger at the
>0/1 networking levels without more information.

Climbing further out of the hole I seem to have been digging myself into: I agree with you. I was not trying to make a definitive comparison between rings and ethers. I was simply trying to add some more information to the discussion. A number of people here (at Apollo) have said to me, "Come on, this really can't be an ethernet saturation problem." Others have extolled ring networks in still other ways that I can barely understand.

Finally, before I shrink away, I feel obliged to point out, lest anyone get the wrong impression, that Apollo believes both ring and ether networks are fine ideas. These days, one can buy Apollo's DN3000s with either or both of a ring or ethernet controller, and all your Apollo workstations can communicate (and share files) over complex ring/ether/whatever internetwork topologies.

--
Nat Mishkin
Apollo Computer Inc.
Chelmsford, MA
{wanginst,yale,mit-eddie}!apollo!mishkin