martin@felix.UUCP (Martin McKendry) (09/10/87)
Given that there are many different disk formats and access algorithms possible for a Unix file system, how do you decide what improvements are best for a given system? And having decided how to invest your engineering dollars, how do you quantify the improvements you have made?

Are there any standard tests or benchmarks that can be run? For example, what is the answer to the question: "How much better is the 4.2 file system than 4.1?" Note that "lots" is kind of a weak response. Are there filesystem Dhrystones & Whetstones? Of course, a single application cannot give the whole picture -- you need to know how things compare with multiple concurrent users beating on unrelated files. None of this "we can make our Unix faster than you can make yours".

It's also interesting to consider how you would compare the performance of two file systems without factoring in CPU speeds, even though the machines being compared use different CPUs at different speeds. Maybe you would want to factor out channel speeds also. Even if you view the whole machine as indivisible, it's nontrivial.

A very high proportion of programs today are I/O bound -- a proportion that will increase as we get faster processors. It seems to me that filesystem performance is the next big area for competition. After all, that's what makes a mainframe a mainframe, right?

Comments?

--
Martin S. McKendry; FileNet Corp; {hplabs,trwrb}!felix!martin
Strictly my opinion; all of it
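As a starting point for the "how do you quantify it" question, one rough approach is to time sequential writes and reads of a scratch file. The sketch below is an illustration only, not a standard benchmark: the helper name `sequential_throughput`, the file size, and the block size are all assumptions, and it measures a single process, so it says nothing about multiple concurrent users beating on unrelated files.

```python
import os
import tempfile
import time


def sequential_throughput(total_bytes=4 * 1024 * 1024, block_size=8192, directory=None):
    """Time a sequential write and a sequential read of a scratch file.

    Returns (write_bytes_per_sec, read_bytes_per_sec). Hypothetical helper:
    the sizes are arbitrary, and the read pass may be served from the
    buffer cache, so treat the numbers as rough indications only.
    """
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before stopping the clock
        write_secs = time.perf_counter() - start

        start = time.perf_counter()
        read_total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block_size)
                if not chunk:
                    break
                read_total += len(chunk)
        read_secs = time.perf_counter() - start
    finally:
        os.remove(path)

    written = blocks * block_size
    # Guard against a zero interval on very coarse clocks.
    return written / max(write_secs, 1e-9), read_total / max(read_secs, 1e-9)
```

Running the same function on two machines still mixes CPU, channel, and disk speed into one number, which is exactly the difficulty raised above; separating them would need additional measurements.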
hammond@faline.UUCP (09/12/87)
In article <> martin@felix.UUCP (Martin McKendry) writes:
>
> ... A
>very high proportion of programs today are I/O bound -- a proportion
>that will increase as we get faster processors. It seems to me that
>filesystem performance is the next big area for competition. After
>all, that's what makes a mainframe a mainframe, right?
>
>Comments?

About 98% of the programs run on our systems use <2 secs of 780 CPU time, nor do they use very much I/O. There are only a few I/O or CPU hogs. That's based on ~10 million process records using modified 4.2 BSD accounting. Where do you get the idea that a high proportion are I/O bound?

On machines with 32+ MB of memory, I'm willing to bet that a large proportion of all accesses are satisfied from the in core buffers, i.e. my edit compile run edit cycles probably all run out of the in core buffers once I've completed a cycle. If the system were smart enough to use all of memory as disk buffer rather than 10% of it, I'm certain that my stuff would just stay in core.

I'll agree that file system performance could be improved, but I'm inclined to believe that improving the use of main memory as buffers would be a bigger general win than any changes to the disk layout.

Does anybody have records for their general use systems that prove that the systems are I/O bound? I want at least a continuous month's worth of records, no one or two day or "peak" samples.

Rich Hammond Bell Communications Research hammond@bellcore.com
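The buffer argument can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: the cache uses LRU replacement (the post does not say which policy the system uses), and the workload is a hypothetical edit-compile-run cycle that touches the same 300 blocks ten times. The function name `lru_hit_ratio` and all the numbers are made up for illustration.

```python
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_blocks):
    """Replay a sequence of block numbers through an LRU cache holding
    `cache_blocks` entries and return the fraction served from the cache."""
    if cache_blocks < 1:
        raise ValueError("cache_blocks must be at least 1")
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / len(accesses) if accesses else 0.0


# Hypothetical workload: the same 300 blocks, touched in order, 10 times over.
working_set = list(range(300))
trace = working_set * 10

big = lru_hit_ratio(trace, cache_blocks=400)    # cache larger than the working set
small = lru_hit_ratio(trace, cache_blocks=200)  # cache smaller than the working set
print(big, small)  # prints 0.9 0.0
```

Once the whole working set fits, only the first pass goes to disk. With a cache smaller than a cyclically scanned working set, LRU evicts each block just before it is needed again, so the hit ratio collapses; real access patterns are rarely that regular, so this is the pessimistic extreme.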
ron@topaz.rutgers.edu (Ron Natalie) (09/14/87)
If you have a large proportion of short-lived programs, wouldn't you say that you spend a lot of time just bringing the program in from disk?

-Ron
martin@felix.UUCP (Martin McKendry) (09/15/87)
In article <1384@faline.bellcore.com> hammond@faline.UUCP (Rich A. Hammond) writes:
>In article I wrote:
>>
>> ... A
>>very high proportion of programs today are I/O bound -- a proportion
>>that will increase as we get faster processors. It seems to me that
>>filesystem performance is the next big area for competition. After
>>all, that's what makes a mainframe a mainframe, right?
>>
>>Comments?
>
>About 98% of the programs run on our systems use <2 secs of 780 CPU time,
>nor do they use very much I/O. There are only a few I/O or CPU hogs.
>That's based on ~10 million process records using modified 4.2 BSD
>accounting. Where do you get the idea that a high proportion are I/O bound?
>

From extensive workload analysis. Try putting in a make. Look at your CPU utilization. If it's not 100%, you are waiting on I/O when you could be processing. Depending on how much you like to wait on I/O, you are I/O bound.

To look at a single 780 is hardly representative of the world. Most of the world's data processing is production commercial data processing. We do image processing. Don't assume that your load is everyone's.

In a previous life, I worked with extensive analyses of commercial customer workloads taken from real customer sites. Based on simulation results and real benchmarks, we found that you could make changes by large factors (2-5) in either direction in CPU performance without seeing anything like the same change in throughput (total time to run benchmark). Like a factor of 4 or 5 faster in CPU for only a factor of 2 change in throughput. Idle time on the faster CPU goes up as expected. This on batch processing with no terminal I/O. If that's not I/O bound, I don't know what is. Since CPU speed/$ is improving at a faster rate than the corresponding figure for disk, I'd expect the class of problems for which this occurs to increase.

>On machines with 32+ MB of memory, I'm willing to bet that a large
>proportion of all accesses are satisfied from the in core buffers,
>i.e. my edit compile run edit cycles probably all run out of the in
>core buffers once I've completed a cycle. If the system were smart
>enough to use all of memory as disk buffer rather than 10% of it, I'm
>certain that my stuff would just stay in core.
>

What if I want to support 400 users from one server, each of whom wants 50Kb of data every 15 seconds? Or if I have to process/merge two or three 60Mb data files? What if I don't want to ship 32 M on all machines?

>I'll agree that file system performance could be improved, but I'm
>inclined to believe that improving the use of main memory as buffers
>would be a bigger general win than any changes to the disk layout.
>

What if I am planning to do both, and the incremental costs are worth it?

>Does anybody have records for their general use systems that prove
                                                        ***********
By whose definition?

>that the systems are I/O bound? I want at least a continuous month's
>worth of records, no one or two day or "peak" samples.
                                        *****
Why not? Often it's the peaks I want to handle. I can already handle the regular loads.

I don't care for your tone. I don't think my posting warranted it.

>Rich Hammond Bell Communications Research hammond@bellcore.com

--
Martin S. McKendry; FileNet Corp; {hplabs,trwrb}!felix!martin
Strictly my opinion; all of it
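The "factor of 4 or 5 in CPU for a factor of 2 in throughput" observation can be turned into rough arithmetic. The sketch below assumes the simplest possible model, which the post does not state: run time is CPU time plus I/O wait, with no overlap between the two, and only the CPU part shrinks when the processor gets faster. The function names are made up for this illustration.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Run-time speedup when only the CPU part gets faster.

    Assumes run time = CPU time + I/O wait with no overlap between them,
    a simplification for a single batch job.
    """
    return 1.0 / (cpu_fraction / cpu_speedup + (1.0 - cpu_fraction))


def implied_cpu_fraction(cpu_speedup, observed_speedup):
    """Invert the model: which CPU share of the original run time would
    explain the observed overall speedup?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cpu_speedup)


for s in (4, 5):
    f = implied_cpu_fraction(s, 2.0)
    io_after = (1.0 - f) * 2.0  # I/O share of the shortened run time
    print(f"CPU x{s}: CPU share before = {f:.3f}, I/O share after = {io_after:.3f}")
```

Under this model, a CPU four times faster yielding twice the throughput implies the job spent about two-thirds of its time on the CPU and one-third waiting on I/O before the change, and about two-thirds waiting on I/O afterwards. Any overlap of computation and I/O would change these figures.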
aeusesef@csun.UUCP (09/16/87)
In article <7327@felix.UUCP> martin@felix.UUCP (Martin McKendry) writes:
>In article <1384@faline.bellcore.com> hammond@faline.UUCP (Rich A. Hammond) writes:
>>In article I wrote:
>>> ... A
>>>very high proportion of programs today are I/O bound -- a proportion
>>>that will increase as we get faster processors. It seems to me that
>>>filesystem performance is the next big area for competition. After
>>>all, that's what makes a mainframe a mainframe, right?
>>
>>About 98% of the programs run on our systems use <2 secs of 780 CPU time,
>>nor do they use very much I/O. There are only a few I/O or CPU hogs.
>>Where do you get the idea that a high proportion are I/O bound?
>>
>(some stuff about programs and cpu load)

One of the I/O hogs happens to be the operating system itself (or do you only support 1 user and 1 task?). The example I'm always being given (and have started to give myself 8-)) is a Cyber. A Cyber 170/760 is a FAST machine (8-10 M{I,FLO}PS [yeah, they're pretty much the same]), which also happens to have *VERY* fast I/O (on the order of megawords a second, I forget just how many). (It does this by having a separate I/O processor, which handles all i/o: the cpu just tells it what it wants.) There is also a Cyber 180/830, also a very fast machine. It, however, gets only 1 M{I,FLO}PS (they added microcode 8-(). Therefore, it gets about the same instruction speed as a VAX (more or less, I won't quibble too much). However, it can support, nicely, about 30 or 40 users (moderately nicely; it starts to slow down at about 10 to 15), whereas a VAX (this is a 780 equivalent machine, more or less) dies at that many. Reason? Jobs rolling in and out of memory use up a lot of i/o bandwidth.

>Based on simulation
>results and real benchmarks, we found that you could make changes
>by large factors (2-5) in either direction in CPU performance
>without seeing anything like the same change in throughput (total
>time to run benchmark). Like a factor of 4 or 5 faster in CPU
>for only a factor of 2 change in throughput. Idle time on the faster
>CPU goes up as expected. This on batch processing
>with no terminal I/O. If that's not I/O bound, I don't know what is.
>>(some stuff about large memories)

(32M large? I use 96 myself...)

>>If the system were smart
>>enough to use all of memory as disk buffer rather than 10% of it, I'm
>>certain that my stuff would just stay in core.

Unfortunately, there are two problems:
1) Processes tend to use this memory themselves, for code/data. Sure, you can swap (demand paged vm), but that doesn't seem much more efficient than just using the memory to hold the code.
2) That data has to be written to disk SOMETIME. I think I would rather put up with slow I/O than to have to worry about the machine corrupting my data. This can be cured if you have large memory AND a separate I/O processor (buffer, while the CPU is busy, have the iop write the data), but I'm not too sure that that is done too often.

>>
>(some stuff about large number of users/data transfers)
>>I'll agree that file system performance could be improved, but I'm
>>inclined to believe that improving the use of main memory as buffers
>>would be a bigger general win than any changes to the disk layout.

This Cyber I'm talking about (830) has roughly 16Mbytes of main memory. It tends to use the memory to store jobs, and it gets better performance that way than our 760 (with an equivalent load [5 or 6 times as many users]) which can only have 256KWords and has to roll jobs into/out of memory.

>>
>What if I am planning to do both, and the incremental costs are worth it?
>>Rich Hammond Bell Communications Research hammond@bellcore.com
>Martin S. McKendry; FileNet Corp; {hplabs,trwrb}!felix!martin
>Strictly my opinion; all of it

The Cybers are 20+ years old; Cray had the right idea when he designed them. (But I HATE NOS!)

-----
Sean Eric Fagan          Office of Computing/Communications Resources
(213) 852 5742           Suite 2600
1GTLSEF@CALSTATE.BITNET  5670 Wilshire Boulevard
                         Los Angeles, CA 90036
{litvax, rdlvax, psivax, hplabs, ihnp4}!csun!aeusesef
hammond@faline.bellcore.com (Rich A. Hammond) (09/16/87)
martin@felix.UUCP (Martin McKendry) responded to my comments on his original posting about file-system performance. First, I must apologize for the tone of my article; I didn't mean to offend Martin.

>>> Martin claimed a high proportion of jobs were I/O bound.
>>I asked: ... Where do you get the idea that a high proportion are I/O bound?
>>
>From extensive workload analysis. Try putting in a make. Look at
>your CPU utilization. If it's not 100%, you are waiting on I/O when you
>could be processing. Depending on how much you like to wait on I/O,
>you are I/O bound.

This isn't realistic: given that disks have both a seek and a rotational delay, the only way to get rid of ALL disk I/O time for a single job is to prefetch the data into main memory. Only if you can predict what file I want before I ask for it can you have 100% CPU utilization. If you can do that, you can make a lot of money in the stock market. :-)

>To look at a single 780 is hardly representative of the world. Most
>of the world's data processing is production commercial data processing.
>We do image processing. Don't assume that your load is everyone's.

I agree, but I thought we were talking about UNIX file system performance.

>In a previous life, I worked with extensive analyses of commercial
>customer workloads taken from real customer sites. Based on simulation
>results and real benchmarks, we found that you could make changes
>by large factors (2-5) in either direction in CPU performance
>without seeing anything like the same change in throughput (total
>time to run benchmark). Like a factor of 4 or 5 faster in CPU
>for only a factor of 2 change in throughput. Idle time on the faster
>CPU goes up as expected. This on batch processing
>with no terminal I/O. If that's not I/O bound, I don't know what is.
>Since CPU speed/$ is improving at a faster rate than the corresponding
>figure for disk, I'd expect the class of problems for which this
>occurs to increase.

No terminal I/O - were these UNIX systems? I am quite willing to concede that UNIX and "real, commercial data processing" aren't the same. I'm not sure that we want them to become the same. I can show you UNIX systems where doubling the CPU speed doubles throughput. But neither of our anecdotes should be generalized to the whole world.

Regarding my claims that larger main memory will help, Martin replies:

>What if I want to support 400 users from one server, each of whom
>wants 50Kb of data every 15 seconds? Or if I have to process/merge two
>or three 60Mb data files? What if I don't want to ship 32 M on all
>machines?

I'll agree, those could benefit from more I/O bandwidth. But do they need to be done under UNIX? Wouldn't a dedicated OS to handle the disks and communications work better in the first case, even if the clients were UNIX systems? Merging files might better be left to IBM MVS? As for shipping 32 M on all systems, this is a tradeoff between your development time and the incremental memory cost * # systems shipped. If you only ship a few systems and the 32 M solves the problem adequately, you'd be better off sticking it in. Is the first one a real situation?

Regarding my claim that improving the use of main memory for buffering would help, Martin pointed out that he could do both that and disk layout. I said that improving the buffering would have a better payoff, so that's what I would look at first. It wasn't clear from Martin's note that he had already considered it.

Regarding my asking for long-term measurements of I/O demand, not peak measurements, Martin said:

>Why not? Often it's the peaks I want to handle. I can already handle the
>regular loads.

I'm looking for the largest average payoff, which I perceive as the regular loads. Working to alleviate the peaks may not gain you much if the result has little effect on regular loads. For example, if the regular load runs pretty much out of the in-core buffer pool and the only large amounts of I/O are the peaks, then you may not save your customer much per $ of development time by spending man-months or man-years reworking the I/O. Engineering is a tradeoff; I was saying that you have to know your workload and tailor your efforts to extract the greatest gain.

Martin made a claim "that the vast majority of jobs were I/O bound" which I didn't think justified in the UNIX environment. His reply to my comments indicated that he wasn't thinking of the UNIX environment and that he had specific applications in mind. Fine, but I thought that we were interested in improving UNIX in general, not Martin's product in particular. In that context I claimed that there are other things that might give a better payoff. Don't take this to mean that I'm against file system I/O improvements; I'll welcome any that come along.

In summary, we were talking at cross purposes and I apologize for the resulting bad feelings.

Rich Hammond hammond@bellcore
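The memory-versus-development tradeoff mentioned above (development time against incremental memory cost * # systems shipped) reduces to a break-even count. The sketch below is only an illustration: the dollar figures are invented, the function names are hypothetical, and it assumes both options solve the problem equally well, which the thread does not establish.

```python
def break_even_units(development_cost, memory_cost_per_system):
    """Number of shipped systems at which extra memory on every system
    costs as much as a one-time software development effort."""
    return development_cost / memory_cost_per_system


def cheaper_option(development_cost, memory_cost_per_system, systems_shipped):
    """Pick the cheaper of the two options for a given shipment volume."""
    memory_total = memory_cost_per_system * systems_shipped
    return "add memory" if memory_total <= development_cost else "rework I/O"


# Invented figures: $200,000 of development versus $4,000 of memory per system.
print(break_even_units(200_000, 4_000))     # prints 50.0
print(cheaper_option(200_000, 4_000, 10))   # prints add memory
print(cheaper_option(200_000, 4_000, 500))  # prints rework I/O
```

With these made-up numbers, shipping a few systems favors adding memory, while shipping hundreds favors the engineering work.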
rbl@nitrex.UUCP (09/19/87)
In article <14704@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>If you have a large proportion of short-lived programs, wouldn't
>you say that you spend a lot of time just bringing the program in
>from disk?
>
>-Ron

And a lot of time bringing in the Unix utility programs that the applications may use...

Sugit Kumar did his Ph.D. dissertation at Case Western Reserve (about 7 years ago) looking at the role of solid-state disks (experiments were done on a PDP-11/45) in UNIX performance. The best speedups came from allocating one SSD to the system and utility programs and another SSD to /usr/tmp. If you're working against the same data files time and time again in the application, copy them to SSD (/usr/tmp). The speedup could theoretically be about 17,000-fold, but the device drivers place a lower bound on the access time.

What does this mean about file system performance? ... simply that the overhead of the device driver and file system itself drops proportionately if larger block sizes are used. One tradeoff is "wasted" disk space for smaller files.

Rob Lake
--
Rob Lake {decvax,ihnp4!cbosgd}!mandrill!nitrex!rbl
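The block-size tradeoff can be made concrete with a small calculation. This is a sketch under assumptions: the file sizes are an invented mix, each file occupies whole blocks with no fragment-sharing scheme, and the block count stands in for the number of driver and file-system operations (one per block). The function name `block_tradeoff` is hypothetical.

```python
def block_tradeoff(file_sizes, block_size):
    """Return (total_blocks, wasted_bytes) when every file is stored in
    whole blocks of `block_size` bytes."""
    total_blocks = 0
    wasted = 0
    for size in file_sizes:
        blocks = (size + block_size - 1) // block_size  # round up to whole blocks
        total_blocks += blocks
        wasted += blocks * block_size - size
    return total_blocks, wasted


# Invented mix: several small files and one large one (sizes in bytes).
files = [300, 1_200, 2_500, 40_000, 1_000_000]
for bs in (512, 4096, 8192):
    blocks, wasted = block_tradeoff(files, bs)
    print(bs, blocks, wasted)
# prints:
# 512 2042 1504
# 4096 258 12768
# 8192 131 29152
```

Going from 512-byte to 8192-byte blocks cuts the number of block operations by roughly a factor of 16 for this mix, while the space lost to partly filled final blocks grows from about 1.5 KB to about 29 KB.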
dwc@homxc.UUCP (D.CHEN) (09/21/87)
> >>> Martin claimed a high proportion of jobs were I/O bound.
> >>I asked: ... Where do you get the idea that a high proportion are I/O bound?
> >>
> >From extensive workload analysis. Try putting in a make. Look at
> >your CPU utilization. If it's not 100%, you are waiting on I/O when you
> >could be processing. Depending on how much you like to wait on I/O,
> >you are I/O bound.
>
> This isn't realistic, given that disks have both a seek and rotational
> delay, the only way to get rid of ALL disk I/O time for a single job is
> to prefetch the data into main memory. Only if you can predict what
> file I want before I ask for it can you have 100% CPU utilization.
> If you can do that, you can make a lot of money in the stock market. :-)

actually, from a bottleneck point of view, only SYSTEMs can be I/O bound, not workloads. although one job may always have i/o delay, multiprocessing can allow the system to run without i/o delay.

>
> I'll agree, those could benefit from more I/O bandwidth. But do they
> need to be done under UNIX? Wouldn't a dedicated OS to handle the
> disks and communications work better in the first case, even if the
> clients were UNIX systems? Merging files might better be left to IBM MVS?
> As for shipping 32 M on all systems, this is a tradeoff between your
> development time and the incremental memory cost * # systems shipped.
> If you only ship a few systems and the 32 M solves the problem adequately,
> you'd be better off sticking it in. Is the first one a real situation?
>

why not do them under UNIX? i'm sure that the other "special" OSs must have also evolved under workload analysis.

danny chen
homxc!dwc
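The point that a system can keep its CPU busy even when every individual job waits on I/O can be sketched with a common textbook approximation, which is an assumption brought in here rather than anything stated in the thread: if each of n jobs in memory waits on I/O for a fraction p of its time, independently of the others, the CPU is idle only when all n are waiting at once.

```python
def cpu_utilization(io_wait_fraction, jobs):
    """Approximate CPU utilization with `jobs` independent jobs in memory,
    each waiting on I/O for `io_wait_fraction` of its time."""
    return 1.0 - io_wait_fraction ** jobs


# Hypothetical jobs that each spend 80% of their time waiting on I/O.
for n in (1, 2, 4, 8):
    print(n, round(cpu_utilization(0.8, n), 3))
# prints:
# 1 0.2
# 2 0.36
# 4 0.59
# 8 0.832
```

The model ignores contention for the disk itself. Once the disk or channel saturates, adding jobs no longer raises CPU utilization, and that is the situation in which the system as a whole, rather than any one job, is I/O bound.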
scc@cl.cam.ac.uk (Stephen Crawley) (10/03/87)
In article <7075@felix.UUCP> martin@felix.UUCP (Martin McKendry) writes:
>Given that there are many different disk formats and access
>algorithms possible for a Unix file system, how do you decide
>what improvements are best for a given system? And having decided
>how to invest your engineering dollars, how do you quantify
>the improvements you have made? [...]

Bob Hagman of Xerox PARC has done some very interesting work in this area. I believe that he will be presenting a paper at this year's SOSP conference on his redesign of the Cedar file system. I have mislaid my copy of his paper, so I'll say no more except that it is very pertinent.

-- Steve