craig@BBN.COM (Craig Partridge) (07/26/90)
I'm curious. Has anyone done research on building extremely fast file systems, capable of delivering 1 gigabit or more of data per second from disk into main memory? I've heard rumors, but no concrete citations.

I'm interested because I think we'll need such fast file systems as we build distributed systems over gigabit networks, and I'm somewhat curious to learn what, if anything, has been done so far in this area.

Craig Partridge
craig@bbn.com

%
% There is the RAID project at Berkeley, and some disk array work going on at
% IBM Almaden. There is also Michael Scott's work at Rochester. --DL
%
solworth@uicbert.eecs.uic.edu (Jon Solworth) (07/27/90)
Saving a gigabit/second to disk is going to take a lot of disks. If a disk can write at 6 MB/sec, then about 20 disks are needed just to keep up with the network (this assumes that the disks are doing essentially sequential access). Add in any kind of random seeks, and the number of disks can skyrocket.

In addition to the Berkeley RAID and Sprite projects (which essentially use disk arrays as striped disks), disk caching of writes is an alternative way of getting near-sequential access rates. (C. Orji and I have a paper in SIGMOD 1990 entitled "Write-only disk caching".)

Jon Solworth
UIC
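Solworth's disk count is simple division: the network rate in bytes/sec over the per-disk sequential write rate, rounded up. The sketch below assumes decimal units (1 Gb/s = 125 MB/s) and perfect striping; under those assumptions the ceiling comes out at 21, consistent with his "about 20". The 2 MB/sec figure is a hypothetical effective rate once seeks are mixed in.

```python
import math

GIGABIT = 1e9  # bits/sec, decimal (assumption)
MB = 1e6       # bytes, decimal (assumption)

def disks_needed(target_bits_per_sec: float, disk_bytes_per_sec: float) -> int:
    """Minimum number of disks whose combined rate meets the target.

    Assumes writes are spread perfectly evenly across disks and each disk
    sustains its sequential rate; any seeking lowers the per-disk rate.
    """
    target_bytes_per_sec = target_bits_per_sec / 8
    return math.ceil(target_bytes_per_sec / disk_bytes_per_sec)

print(disks_needed(GIGABIT, 6 * MB))  # purely sequential writes
print(disks_needed(GIGABIT, 2 * MB))  # hypothetical seek-degraded rate
```

Dropping the effective per-disk rate by a factor of three triples the disk count, which is the "skyrocket" effect in the text.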
lm@snafu.Eng.Sun.COM (Larry McVoy) (07/28/90)
In article <5465@darkstar.ucsc.edu> craig@BBN.COM (Craig Partridge) writes:
>
>I'm curious. Has anyone done research on building extremely
>fast file systems, capable of delivering 1 gigabit or more of data
>per second from disk into main memory? I've heard rumors, but no
>concrete citations.
>
>I'm interested because I think we'll need such fast file systems as
>we build distributed systems over gigabit networks, and I'm somewhat
>curious to learn what, if anything, has been done so far in this area.

Why not: Drives
---------------

Let's approximate a gigabit by 107 megabytes. Let's assume that a nice drive rotates at 7200 RPM, has 64KB / track, and can read two heads at once (this is about twice as good as any commonly available drive such as SCSI, IPI, or XD). Let's do some math:

    7200 revs / minute = 120 revs / sec = 8.33 milliseconds / rev.
    64KB * 2 heads * 120 revs / sec = 15,360 KB / sec.

This means that the most you can expect from a drive like this is 15MB per second. This assumes that you have a filesystem that can run the drive at platter speed (which isn't such a bad assumption - I run SCSI drives at platter speed with a hacked version of UFS).

These numbers are wildly optimistic. I worked on supercomputer drives a couple of years ago; they could do 12MB / sec and cost about $75K each. I think you'll see cheap 2MB / sec drives in a year or two. It will be a long time before you see cheap 15MB / sec drives.

Why not: Busses
---------------

A good bus these days runs at about 80 MB / sec flat out. We can make them faster, but it gets harder and harder to do so while giving them any size.

Why not: CPU speed
------------------

A Sun 4/490 is a reasonably fast machine. Moving I/O requires a copy. The 490 has copy hardware that maxes out at 25 MB / sec in the kernel and 14 MB / sec in user space.
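McVoy's drive arithmetic can be reproduced directly, and extended to ask how many such drives a 107 MB/sec target would need. This is a sketch assuming perfect striping, no seeks or head switches, and 1 MB = 1024 KB for his 107 MB figure; the helper names are illustrative only.

```python
import math

def platter_rate_kb(rpm: float, kb_per_track: float, heads: int) -> float:
    """Peak sequential rate off the platters in KB/sec (no seeks, no head switches)."""
    revs_per_sec = rpm / 60
    return kb_per_track * heads * revs_per_sec

def drives_for(target_kb_per_sec: float, per_drive_kb_per_sec: float) -> int:
    """Drives needed if transfers are striped perfectly across them."""
    return math.ceil(target_kb_per_sec / per_drive_kb_per_sec)

ms_per_rev = 1000 / (7200 / 60)       # milliseconds per revolution
rate = platter_rate_kb(7200, 64, 2)   # the hypothetical two-head 7200 RPM drive
target = 107 * 1024                   # 107 MB/sec, assuming 1 MB = 1024 KB

print(f"{ms_per_rev:.2f} ms/rev, {rate:.0f} KB/sec per drive")
print(drives_for(target, rate), "drives at the platter rate")
print(drives_for(target, 2 * 1024), "drives at 2 MB/sec")
```

Even the optimistic drive needs eight-way striping, and the cheap 2 MB/sec drives he expects need dozens, before the bus or CPU copy is considered.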
Conclusion:
-----------

I think the answer is (a) yes, we've thought about it, but (b) no, it won't happen with any conventional hardware soon. I suspect that you'll need parallel busses, CPUs, and disks in order to get that kind of I/O. Furthermore, the I/O requests have to be large (megabytes) in order to get all those parts working at the same time. You'll *never* see a magnetic disk that delivers > MB / sec timed over 10K.
---
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com
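Why requests must be large to keep parallel disks busy can be seen from how a striped layout maps a byte range onto disks. The sketch below assumes simple round-robin (RAID-0 style) striping; the 64 KB stripe unit and 8 disks are hypothetical parameters, not figures from the thread.

```python
def stripe_map(offset: int, length: int, stripe_unit: int,
               n_disks: int) -> dict[int, list[tuple[int, int]]]:
    """Split a logical byte range into per-disk pieces under round-robin striping.

    Returns {disk index: [(offset on that disk, byte count), ...]}.
    """
    pieces: dict[int, list[tuple[int, int]]] = {}
    end = offset + length
    pos = offset
    while pos < end:
        unit = pos // stripe_unit        # logical stripe unit holding this byte
        disk = unit % n_disks            # round-robin placement across disks
        row = unit // n_disks            # which stripe row on that disk
        within = pos % stripe_unit       # offset inside the stripe unit
        nbytes = min(stripe_unit - within, end - pos)
        pieces.setdefault(disk, []).append((row * stripe_unit + within, nbytes))
        pos += nbytes
    return pieces

small = stripe_map(0, 8 * 1024, 64 * 1024, 8)
large = stripe_map(0, 4 * 1024 * 1024, 64 * 1024, 8)
print(len(small), "disk(s) busy for an 8 KB request")
print(len(large), "disk(s) busy for a 4 MB request")
```

An 8 KB request lands on a single disk, so the other seven sit idle; a 4 MB request gives every disk 512 KB of sequential work, which is the regime where aggregate bandwidth approaches the sum of the drives.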
eugene@wilbur.nas.nasa.gov (Eugene N. Miya) (07/30/90)
As a start, consider the Cray SSD (Solid State Disk). It gets over 1 GB/S. --e. nobuo miya, NASA Ames Research Center, eugene@orville.nas.nasa.gov {uunet,mailrus,other gateways}!ames!eugene
craig@BBN.COM (Craig Partridge) (07/31/90)
In article <5512@darkstar.ucsc.edu> lm@snafu.Eng.Sun.COM (Larry McVoy) writes:
>
>Why not: CPU speed
>------------------
>
>A Sun 4/490 is a reasonably fast machine. Moving I/O requires a copy.
>The 490 has copy hardware that maxes out at 25 MB / sec in the kernel
>and 14 MB / sec in user space.

Larry:

    I've heard the discussions about busses and disk drives before, but this is the first time someone's said CPUs will be a problem.

    Mostly I've heard the reverse argument -- CPUs are gonna gobble data as fast as the network and disks can feed it. For example, several researchers are muttering about 250 MIP CPUs in the next couple of years -- one person I know at DEC is talking about a 1 BIP workstation by 1995.

    Those CPUs will have modest memory caches that run at CPU speed -- so close to the CPU you'll have a system chomping on gigabits of data per second (consider a 32-bit instruction with one 32-bit memory operand -- that's 250 MIPS * 64 bits = 16 gigabits/second of data flowing through the CPU -- and that's clearly low [I haven't factored in where the operands' contents go]). So I think CPUs will be capable of moving gigabits around.

Craig
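Partridge's estimate is instruction rate times bits touched per instruction. A minimal sketch under his stated assumption (one 32-bit instruction fetch plus one 32-bit operand each, nothing else counted):

```python
def cpu_data_rate_gbps(mips: float, instr_bits: int = 32, operand_bits: int = 32) -> float:
    """Bits/sec flowing through the CPU, in gigabits, for a given MIPS rating.

    Counts only the instruction word and one memory operand per instruction,
    so (as the post says) it is a lower bound.
    """
    return mips * 1e6 * (instr_bits + operand_bits) / 1e9

print(cpu_data_rate_gbps(250))   # the 250 MIPS CPU
print(cpu_data_rate_gbps(1000))  # the 1 BIP workstation
```

Note that this traffic is served mostly from the on-chip cache; the replies that follow argue that the relevant figure for I/O is main-memory bandwidth, not cache bandwidth.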
davecb@nexus.yorku.ca (David Collier-Brown) (07/31/90)
In <5465@darkstar.ucsc.edu> Craig Partridge writes:
| I'm curious. Has anyone done research on building extremely
| fast file systems, capable of delivering 1 gigabit or more of data
| per second from disk into main memory? I've heard rumors, but no
| concrete citations.

puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
| Tanenbaum (ast@cs.vu.nl) has developed a distributed system called
| AMOEBA. Along with the OS-kernel there is a "bullet file server".
| (BULLET because it is supposed to be pretty fast).
|
| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
| so why not load the complete file into memory? This makes the server
| extremely efficient. Operations like OPEN or CLOSE on files are no
| longer needed (i.e. the complete file is loaded for each update).

Er, sorta... You could easily write an interface that did reads or writes without opens or closes, for some specific subset of uses.

|
| The benchmarks are quite impressive although I doubt that this
| concept is useful (especially when thinking about transaction
| systems in databases).

Well, I have something of the opposite view: a system like Bullet makes a very good substrate for a database system. The applicable evidence is in the article "Performance of an OLTP Application on Symmetry Multiprocessor System", in the 17th Annual International Symposium on Computer Architecture, ACM SIGARCH Vol. 18, No. 2, June 1990. (See, a reference (:-))

The article uses all-in-memory databases in the TP1 benchmark as a limiting case while investigating the OS and architectural support necessary for good transaction processing speeds, and the speeds are up in the range that Craig may find interesting...

My speculation is that a bullet-like file system with a relation-allocating layer (call it the Uzi filesystem? the speedloader filesystem??) on top would make a very good platform for a relational database. Certainly the behavior patterns of an in-memory, load-whole-relation database would be easy to reason about, and would be easy and interesting to investigate.

| You can download Tanenbaum's original paper (along with a "complete"
| description about AMOEBA) via anonymous ftp from midgard.ucsc.edu
| in ftp/pub/amoeba.

--dave
--
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling
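The whole-file idea Puder describes can be illustrated with a toy in-memory store: files are created and read as a unit, there is no open/close because there is no per-file session state, and an update means reading the whole file, modifying it, and storing a new file. This is a hypothetical sketch of the concept only; the class and method names are invented here and are not Amoeba's actual Bullet server interface.

```python
import itertools

class WholeFileStore:
    """Toy whole-file, in-memory store (hypothetical interface, not Amoeba's).

    Files are immutable once created; there is no open/close.
    """

    def __init__(self) -> None:
        self._files: dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def create(self, data: bytes) -> int:
        file_id = next(self._ids)
        self._files[file_id] = bytes(data)  # keep an immutable copy of the whole file
        return file_id

    def read(self, file_id: int) -> bytes:
        return self._files[file_id]         # always returns the complete file

    def delete(self, file_id: int) -> None:
        del self._files[file_id]


def append_record(store: WholeFileStore, file_id: int, record: bytes) -> int:
    """An 'update': read the whole file, modify it, store it as a new file."""
    new_id = store.create(store.read(file_id) + record)
    store.delete(file_id)
    return new_id

store = WholeFileStore()
fid = store.create(b"hello")
fid = append_record(store, fid, b" world")
print(store.read(fid))
```

The sketch also shows the cost Puder worries about: every small update copies the entire file, which is why Collier-Brown's proposal puts a relation-allocating layer on top rather than updating whole relations in place.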
aglew@oberon.crhc.uiuc.edu (Andy Glew) (07/31/90)
>>Why not: CPU speed
>>------------------
>>
>>A Sun 4/490 is a reasonably fast machine. Moving I/O requires a copy.
>>The 490 has copy hardware that maxes out at 25 MB / sec in the kernel
>>and 14 MB / sec in user space.
>
>Larry:
>
>	I've heard the discussions about busses and disk drives before
>but this is the first time someone's said CPUs will be a problem.

Actually, it's not CPU speed but the CPU-memory interface that's the problem. The CPU-memory interface is increasing in speed, but nowhere near as fast as CPUs. Caches do not help large copies from I/O to user space if I/O is uncached (and even if it is cached, a single data port on the cache can be a problem). Burst protocols and wider busses seem to be the favoured solutions.
--
Andy Glew, andy-glew@uiuc.edu

Propaganda:

UIUC runs the "ph" nameserver in conjunction with email. You can reach me at many reasonable combinations of my name and nicknames, including:

    andrew-forsyth-glew@uiuc.edu
    andy-glew@uiuc.edu
    sticky-glue@uiuc.edu

and a few others. "ph" is a very nice thing which more USEnet sites should use. UIUC has ph wired into email and whois (-h garcon.cso.uiuc.edu). The nameserver and full documentation are available for anonymous ftp from uxc.cso.uiuc.edu, in the net/qi subdirectory.
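Glew's point can be made quantitative with a simple memory-traffic model. Assume (this is an assumption, not from the post) that the device DMAs each byte into memory once, and that each CPU copy reads every byte once and writes it once, with no help from the cache because the I/O is uncached. Then the memory interface must carry (1 + 2 x copies) times the I/O rate.

```python
def memory_traffic(io_bytes_per_sec: float, copies: int, dma_writes: int = 1) -> float:
    """Bytes/sec crossing the CPU-memory interface on a receive path.

    Model (assumption): DMA writes each byte once; each CPU copy reads it
    once and writes it once; caches absorb none of the traffic.
    """
    return io_bytes_per_sec * (dma_writes + 2 * copies)

target = 1e9 / 8  # 1 Gb/s expressed in bytes/sec
print(memory_traffic(target, copies=1) / 1e6, "MB/sec with one kernel-to-user copy")
print(memory_traffic(target, copies=2) / 1e6, "MB/sec with two copies")
```

Under this model a single extra copy triples the memory bandwidth a gigabit stream needs, which is why copy avoidance and wider memory paths matter more than raw instruction rate.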
narten@cs.albany.edu (Thomas Narten) (07/31/90)
In article <5555@darkstar.ucsc.edu> craig@BBN.COM (Craig Partridge) writes:
>	I've heard the discussions about busses and disk drives before
>	but this is the first time someone's said CPUs will be a problem.
Take a look at John Ousterhout's paper "Why Aren't Operating Systems
Getting Faster As Fast as Hardware?" in the June USENIX proceedings.
He reports on a number of benchmarks, and one of his conclusions is
that memory bandwidth is not keeping up with processor speed in RISC
machines.
--
Thomas Narten
narten@cs.albany.edu
craig@BBN.COM (Craig Partridge) (08/01/90)
In article <5582@darkstar.ucsc.edu> narten@cs.albany.edu (Thomas Narten) writes:
>
>In article <5555@darkstar.ucsc.edu> craig@BBN.COM (Craig Partridge) writes:
>	I've heard the discussions about busses and disk drives before
>	but this is the first time someone's said CPUs will be a problem.
>
>Take a look at John Ousterhout's paper "Why Aren't Operating Systems
>Getting Faster As Fast as Hardware?" in the June USENIX proceedings.
>He reports on a number of benchmarks, and one of his conclusions is
>that memory bandwidth is not keeping up with processor speed in RISC

I've read Ousterhout's paper and don't disagree with it (as far as I recall). My sense, however, is that even though memory speed is improving more slowly than CPU speed, when we look at gigabit computing, CPU and memory speeds, while a nuisance, aren't gonna be problems in the same league as busses or disks.

Craig
lm@snafu.Eng.Sun.COM (Larry McVoy) (08/01/90)
In article <5555@darkstar.ucsc.edu> craig@BBN.COM (Craig Partridge) writes:
>
>	I've heard the discussions about busses and disk drives before
>but this is the first time someone's said CPUs will be a problem.
>
>	Mostly I've heard the reverse argument -- CPUs are gonna gobble data
>as fast as the network and disks can feed it. For example, several researchers
>are muttering about 250 MIP CPUs in the next couple of years -- one person I
>know at DEC is talking about a 1 BIP workstation by 1995.
>
>	Those CPUs will have modest memory caches that run at CPU speed --
>so close to the CPU you'll have a system chomping on gigabits of data
>per second (consider a 32-bit instruction with one 32-bit memory operand
> -- thats 250 MIPS * 64 bits = 16 gigabits/second of data flowing through
>the CPU -- and that's clearly low [I haven't factored where the operands
>contents go]). So I think CPUs will be capable of moving gigabits around.

You have to feed the cache. The question was "are there or will there be file systems capable of gigabit transfer rates?" (paraphrased). When you are talking about I/O rates, you can forget the CPU cache: first, the data won't be there to start with, and second, it doesn't get reused; it has to be fetched from memory, network, disk, wherever. You have to look at the whole path:

    disk                  network
    disk controller       network controller
    bus                   bus
    memory                memory
    cpu                   cpu

and back. That path isn't likely to do gigabit any time soon. Sure, you can make CPUs that do it, and you can make memory that can do it, and you can make a bus that can do it, etc. But you have to have all of them together.

It's very similar to buying a stereo system. You don't go out and buy nine zillion dollars of equipment and use extension cords as speaker wire.

This is the classic application of Amdahl's law to performance. Whenever you fix one bottleneck, the system improves a little and then hits a different one.
--- Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com
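McVoy's "whole path" argument reduces to a rule of thumb: if the stages of an I/O path overlap perfectly (a pipelining assumption; without overlap it only gets worse), the end-to-end rate is set by the slowest stage. The sketch plugs in figures quoted in this thread (his optimistic 15 MB/sec drive, an 80 MB/sec bus, the Sun 4/490 user-space copy, and 1 Gb/s as 125 MB/sec) and shows the bottleneck moving once one stage is hypothetically fixed.

```python
def path_throughput(stages: dict[str, float]) -> tuple[str, float]:
    """Return (slowest stage, its rate); with perfect overlap that rate bounds the path."""
    slowest = min(stages, key=lambda name: stages[name])
    return slowest, stages[slowest]

# MB/sec figures taken from earlier posts in this thread.
path = {
    "disk": 15.0,             # optimistic single drive
    "bus": 80.0,              # "a good bus these days"
    "cpu copy (user)": 14.0,  # Sun 4/490 user-space copy
    "network": 125.0,         # 1 Gb/s, decimal
}
print(path_throughput(path))

path["cpu copy (user)"] = 200.0  # hypothetically fix the copy...
print(path_throughput(path))     # ...and the disk becomes the limit
```

Fixing the copy buys almost nothing end to end because the single drive is next in line, which is the Amdahl's-law effect he describes.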
fouts@unix.sri.com (Martin Fouts) (08/10/90)
In article <5512@darkstar.ucsc.edu> lm@snafu.Eng.Sun.COM (Larry McVoy) writes:

   In article <5465@darkstar.ucsc.edu> craig@BBN.COM (Craig Partridge) writes:
   >
   >I'm curious. Has anyone done research on building extremely
   >fast file systems, capable of delivering 1 gigabit or more of data
   >per second from disk into main memory? I've heard rumors, but no
   >concrete citations.
   >
   >I'm interested because I think we'll need such fast file systems as
   >we build distributed systems over gigabit networks, and I'm somewhat
   >curious to learn what, if anything, has been done so far in this area.

[Good technical shootdown removed. Summary of removed material: It's too expensive/difficult to do with existing technology.]

As part of the contract for the first Cray 2 at NASA Ames, CRI was required to satisfy a 10 MB/sec per drive demonstration of a simple file transfer which copied the entire contents of a source drive to a destination drive. They were required to demonstrate 20 simultaneous transfers (using 40 drives). Doing the math, 20 x 10 x 8 (instances * rate * bits/byte) = 1.6 gigabits/second. They ran that demo for me at Ames five years ago.

The programs were written in C and made ordinary read/write calls on files opened in ordinary ways. (I've run the identical source code on a huge number of Unix file systems.) Tim Hoel et al. at CRI had designed a fast file system for the Cray 2, which is in production use and was used for this test. With a machine like the 2, the file system required very little cleverness to pass this test. Had the test been rewritten to use striping, it could have been accomplished with a single transfer on a single file system.

BTW, that was a >$20M Cray 2 using fast, expensive disk drives. It was also running a compute-bound workload while running the copy test. We aren't going to see a Gb/s file system on a PC clone in the near future, but there are a number of mainframes capable of sufficient aggregate disk performance now.
We've again reached the point where high-performance I/O is a bigger bottleneck than CPU horsepower.
--
Martin Fouts

 UUCP:  ...!pyramid!garth!fouts (or) uunet!ingr!apd!fouts
 ARPA:  apd!fouts@ingr.com
PHONE:  (415) 852-2310            FAX:  (415) 856-9224
 MAIL:  2400 Geng Road, Palo Alto, CA, 94303

Moving to Montana; Goin' to be a Dental Floss Tycoon.  -  Frank Zappa
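Fouts's demo arithmetic, plus his remark about striping, can be checked in a few lines. The sketch assumes decimal units and perfectly even striping; `stripe_width` is a hypothetical helper asking how many 10 MB/sec drives one striped file would need on each side of the copy to carry the whole 1.6 Gb/s as a single transfer.

```python
import math

def aggregate_gbps(streams: int, mb_per_sec_each: float) -> float:
    """Aggregate rate in Gb/s for independent streams (8 bits/byte, decimal units)."""
    return streams * mb_per_sec_each * 8 / 1000

def stripe_width(target_mb_per_sec: float, mb_per_sec_per_drive: float) -> int:
    """Drives a single striped file needs to sustain the target (perfect striping assumed)."""
    return math.ceil(target_mb_per_sec / mb_per_sec_per_drive)

print(aggregate_gbps(20, 10))     # the 20-transfer demo
print(stripe_width(20 * 10, 10))  # drives per side for one striped transfer
```

The striped version needs the same 20 source and 20 destination drives as the demo; striping changes how one file uses them, not how much hardware the bandwidth requires.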