eugene@ames.UUCP (Eugene Miya) (09/02/85)
<29898@lanl.ARPA> <1517@peora.UUCP> <30105@lanl.ARPA> <1062@sdcsvax.UUCP>

Thank you for your patience. I have mulled some of this over, and I have some incomplete thoughts if you will bear with me. I have some preliminary research, some of it graphical, on the Cray-1/S, X-MP, 2, Cyber 205, Convex C-1, and ELXSI, and if I could find a VAX or a 68000 with a good clock, I would do them, too.

I think science proceeds in three phases:
  Detection [note: Boolean; lots of qualitative information, hunches]
  Identification [note: maybe enumerations, classes, sets, types]
  Detailed Analysis [more complex mathematics: stat, etc.]

I would like to summarize some of the issues posted so far. I mentioned "atomicity" in the first posting; atomic for two reasons: small/low-level, and indivisibility. Scalability seems to be another concern [too many -ilities]. A variation of this is termed "realistic" by some. [My comments indented from here.]

> Message-ID: <29898@lanl.ARPA>
> Another metric is a sampling of large, mostly unmodifiable, commercial
> codes. My preference is MSC/Nastran (.5e6+ lines of finite element code).
> It runs on a surprising number of machines, and is a rigorous test of not
> only performance (cpu and io) but the scientific/engineering environment
> available to the average user.
> george spix gas@lanl

[Portability, generality] I am surprised no one objected. The problem with this is that the behavior of large programs like this is not well understood. Sure, you can take a time, you can measure core usage, and so on, but it doesn't generalize, and the code still has problems, bugs, and desired new features; otherwise MSC would not still be in business [I'm stretching this one, I know]. This brings up the issue of analytic method [mentioned below]: separability of components. We have few measures of I/O, and if we add two things together, we get a sum. One problem with computer science is that we have poor empirical and laboratory skills and tools.
[I am working on a series of TRs on experiment design in CS, but my management wants me to work on their stuff.] On one hand, people aren't mathematical enough in their analysis; on the other, we tend to lack a lot of `controls' when taking measurements [too mathematical]. There seem to be gaps between those who do queueing models, those who insist on measuring the near-atomic qualities of systems, and letters like the above insisting on `realism.' Do these issues scale?

>Message-ID: <1512@peora.UUCP>
>
>> Whetstones were mentioned in another letter, but the only people who use
>> these are computer manufacturers.
>
>This statement isn't true. For example, back when I was a graduate-student
>researcher in computer architectures, we used the Whetstones to test our
>vertical-migration software.

I would still place you in that category. Whetstones are rather arbitrary, and I did some work to locate the original book (not a paper) on ALGOL. Our problems have a greater degree of heterogeneity, as George points out above.

>> What qualities do our performance metrics need to have?
>
>I think you need to make your performance measurements in such a way that
>you get a set of distinct numbers which can be used analytically to determine
>performance for a given program if you know certain properties of the
>program. For example:
>
>1) The rate of execution of each member of the set of arithmetic operations
>provided by the machine's instruction set, ...
>..., with cache disabled.
>
>2) The rate of execution of 1-word memory-to-memory moves, with cache
>disabled.
>
>3) The rate of execution of a tight loop ...register-to-register
>moves, with cache disabled.
>
>4) The rate of execution of a tight loop ... , with cache enabled.
>
>5) The rate of execution of a tight loop performing (same word size as #3
>and #4 above) memory-to-memory moves that produce all cache "hits", with
>cache enabled.
>Note that this gives you two properties of your cache: your
>speedup for operand fetch and store resulting from caching, and any
>performance penalties resulting from a write-through vs. write-back cache.
>
>6) Specifications such as the number of registers available to the user,
>the size of the cache, etc.
>
>Well, you get the idea, anyway... personally I tend to feel that statistical
>performance measurements are not nearly as useful as analytical ones; I
>would rather see a list of fairly distinct performance properties of a
>processor anytime, since I think you can do more with them in terms of
>saying how the machine will perform for a given application that way.

I agree with you here, but why do I say that?

>I separated out the various forms of caching (operations in registers, and
>use of a cache between the CPU and the primary memory) because so many
>people "fudge" their results that way without giving any information from
>which you can determine real performance. The above list is just meant to
>suggest "qualities" rather than being an exhaustive list; i.e., that the
>performance metrics should reveal (rather than hide) the set of factors
>that actually influence performance. [Unfortunately, this would never suit
>most marketing organizations nor customers, since they want an all-
>encompassing number.]

Sad but true. The all-encompassing number is a problem. I would like to do a bit in functional analysis, principally vector-valued measures, but Alan Smith at Berkeley suggested staying with measures as simple as possible. [I agree in principle.] But I want to consider them [Alan suggests factor analysis at most].

>The metrics should also be compiler-independent.
>Shyy-Anzr: J. Eric Roskos

How do you demonstrate compiler independence? What about compiler dependence? What tests determine compiler characteristics? Limitations? They should be OS-independent, too.
A problem with uniprocessor architectures, computers, and operating systems is that all are constructed in such a way as to make measurement difficult. Taking a measurement affects the thing being observed. High-level measurement concepts are lacking; instead we measure oscilloscope pulses and say "so many x's were transferred" only because we know how many x's there were to begin with.

> Message-ID: <1517@peora.UUCP>
> > It helps draw the line between special purpose and general purpose
> > environments (or, less tactfully, usable and unusable machines)..
> Would it be possible to discuss this here? . . .
> what properties of general purpose
> environments make them "unusable" for scientific/engineering computing?
> --
> Shyy-Anzr: J. Eric Roskos

How do you measure an OS and make it distinct from the language, the compiler, and the algorithm used? Where does the fine line get drawn between a program and its translator, unless you have a different compiler implementation to compare?

>Message-ID: <3517@dartvax.UUCP>
>> A problem is, certainly, how we measure things.
>It might be interesting to define some fairly simple standard operations
>and ask how long it takes to perform the operations. Typical standard ...

The word "standards" sends a bit of a chill up my back; it's a bit early to standardize. The following are shortened, but much like the above:

>add -- takes two words (at least 32 bits) from memory, adds them together,
>index -- picks up an array offset from memory, performs bounds checking
>on the offset (we don't all write in C),
>ptr_load -- (P->Record.Field)
>array_loop -- load each element of an array into a register.
>
>These
>simple operations would be a better measure than even simpler instructions
>because each operation does something "useful". These operations can
>also have advantages over high-level language benchmarks because they are
>not dependent on the quality of a compiler.
>
>The qualities that I am aiming for here are primarily usefulness and
>simplicity.
>
>-- Chuck
>chuck@dartvax

Dependence and independence seem to be a common theme. How dependent are most tests?

Another problem is one of decomposition and parallelism. This will be especially important in future architectures. Are two operations performed sequentially equivalent to two operations performed in parallel? I think the answer is YES AND NO. We have a situation analogous to Brooks's Mythical Man-Month: you can have 9 women working 9 months (81 woman-months) for 9 babies, but you can't have 9 women work 1 month for 1 baby.

Another problem, more down to earth, is the clock on a given system. Crays have a beautiful system clock. I cannot say the same for the Cyber. One of my problems is just to understand the behavior of different systems' clocks. Needless to say, a 1/50th-1/100th second clock doesn't cut it; too much can happen during a tick. Repeating things for later division by the tick count leaves too much to compilers and OSes. I'd love to get my hands on a VAX with a 1-microsecond clock.

> jww@SDCSVAX.ARPA
> Another side issue is that certain problems benchmark certain ways.
> For example, in supporting a SIMSCRIPT II.5 discrete-event simulation,
> we find that the best predictor of user performance is double-precision
> ("single" on your Cray, george) floating point speed. There are a
> lot of floating point comparisons on the event chain, plus the heavy
> use of pseudo-random gamma functions, etc. requires F.P. multiplies and
> divides.

How many people really use gamma functions? [Sorry, don't answer that.] A local comment on this: one of our users gave a talk the other day. He placed a single statement of FORTRAN on the screen. The problem is a fluid dynamics problem, and notably this statement had 18 FP divisions on 3-D arrays [the user wanted to point out that Cray 1/X division is relatively inefficient]. This says nothing of the +'s and -'s for the array indices of the 30-40 variables, the FP +'s, -'s, and *'s, or the tremendous storage requirements.
The user liked to point out that the CFT compiler, as much as we complain about it, reduced the 18 divides to 7. The Cray's real power is what it does on the indices!

> For compilation, however, integer performance -- particularly simple moves
> and single-level indirect addressing -- is the best predictor of speed. ...
> That's why machines with
> strong integer scalar performance (e.g., Cray 1?) have it over those that
> focus only on MFLOP's.

What machine only focuses on MFLOPS? Have you run on it? Good architectures, as Brian Reid pointed out in net.micro.mac, are a good balance of tradeoffs.

> Benchmarks typically are several hundred lines, with limited complexity
> and usually small data cases. If you want to test typical throughput,
> you need a typical program--even . . . 200,000
> lines of source. This also assures that if the system was "tuned",
> it was probably a very limited sort of tuning that any owner of such a
> program would try anyway.

Do we in principle really need the 200K-line program? Why can't we come up with adequate smaller programs to give us an idea how the 200K-line program works? In other words, why does the US need a Missouri [the "show me" state]? Can't we just take it for granted that the sun is 93 million miles away, rather than remeasure it, so long as we know what a mile is?

Our benchmarks tend to be too kind. We need benchmarks, I think, which deliberately `break machines,' along the lines of those validation suites which check compiler limits and so forth. On MWFs I tend to think that we can separate the OS, the compiler, and the language from the machine. On TTS I think it is not possible. Today's Sunday, so I don't care.

> > > It's my belief that this market requires
> > > "general purpose architectures" with "general purpose (usable)
> > > environments"
> > > george spix gas@lanl
> >
> > There have been no shortages of proposed architectures.
> There haven't been as many "usable architectures." [true]
>
> A clever user will take "Program A" and put it up on machines X, Y, Z,
> spending less than a week on each test. Whichever machine runs it
> fastest wins. From the user's standpoint, that's much better
> than listening to MIPS, MFLOPS, or other mumbo-jumbo.
>
> Joel West CACI, Inc. - Federal (c/o UC San Diego)

The tension here is between the desire to make general, portable, usable programs and the desire to take advantage of machine performance features. I sometimes wonder if we will really have a Cray-on-a-desk, and then it passes ;-). Few consider what a Cray is: word-oriented, big memory, vector registers, underdeveloped software [oops, sorry Bence and George].

Lastly, I wish to thank LLNL, Cray, and Convex for time on some of their machines. I tried cutting this down; I will try to do better next time. Sorry for rambo-ing :-), I am still working on these ideas. Some of my existing prototype tests look at memory contention, vector instruction sets, and compiler tricks and limitations.

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb
jer@peora.UUCP (J. Eric Roskos) (09/04/85)
Eugene Miya writes:

> One problem with computer science is that we have poor empirical and
> laboratory skills and tools.

I wrote, about an earlier topic:

>>This statement isn't true. For example, back when I was a graduate-student
>>researcher in computer architectures, we used the Whetstones to test our
>>vertical-migration software.

Eugene Miya replies:

> I would still place you in that category.

Another problem with computer science, at least as it is often practiced in the technical newsgroups, is a failing the scientific method guards against in other fields: we resort to ad hominems to justify our positions, and make contradictory statements without explaining why. If the only people who use Whetstones are the manufacturers of computing machines, and if I am "still" a graduate-student researcher because I use them, isn't that contradictory?

Actually, I cancelled the message on benchmarks because I decided it didn't say much. Unfortunately, the "cancel" command apparently often doesn't work. And actually, I mostly agree with you (except the ad hominem parts, of course). But to answer the questions:

> How do you demonstrate compiler independence?

That's very hard. If you are doing the evaluations yourself, you write the algorithm in a high-level language, then hand-compile it the way you think a good compiler would compile it. Thereby you at least convince yourself. This is better than being unconvinced by "black box"-type benchmarks, though I agree that it is far from ideal, and far from what would be acceptable in many scientific communities for final results.

> What about compiler dependence? What tests determine compiler
> characteristics? Limitations?

It's been my experience that simply looking at the code generated by a compiler often tells a lot about the quality of the compiler.
Of course, this breaks down when you get to the point of comparing several good optimizing compilers with one another; but most of the compilers out there today, especially for the microcomputers, are not that good yet. Their characteristics and limitations are readily visible in the code they generate.

> OS independent, too.

Just don't make any OS calls. I don't think many benchmarks do. If you are benchmarking I/O, well, unless you are going to do the I/O yourself, then the OS performance probably IS worth benchmarking.

> A problem with uniprocessor architectures, computers, and operating
> systems is that all are constructed in such a way as to make
> measurement difficult. Taking a measurement affects the
> thing being observed.

R. I. Winner, my committee chairman, used to call this phenomenon (as it related to debugging) "Heisenbugs". (I'm not sure whether he coined the term.) I don't think systems are constructed *to* make measurement difficult. Rather, (a) the economics of adding instrumentation to monitor performance from the outside tends to discourage it in the marketplace, and (b) there are always things you just can't measure from inside the system. It would be nice to have performance-measuring support that worked from outside (e.g., small processors dedicated to watching the cache, peripherals, etc. without disturbing them), but when it came to making the machine cost-effective to market, those would be among the first things eliminated, I'm afraid.
--
Shyy-Anzr: J. Eric Roskos
UUCP: ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail: MS 795; Perkin-Elmer SDC; 2486 Sand Lake Road, Orlando, FL 32809-7642
Uryc! Gurl'er cnvagvat zl bssvpr oyhr! Jung n qrcerffvat pbybe...
laura@l5.uucp (Laura Creighton) (09/06/85)
Ah, guys, there is more than one way to parse:

>> One problem with computer science is that we have poor empirical and
>> laboratory skills and tools. [Eugene Miya]

One of them is ``you guys are all slugs for not using proper laboratory skills and tools, and if your professors didn't teach you them then they are all slugs too,'' which appears to be the way some people have taken this. This is not what I read, though. What I got was ``there is an inadequate set of laboratory skills and tools designed for computer science, and so we make do, which isn't optimal.''

The problem with having a really good hammer is that you think all the world is a nail [anybody know the source of this quote?]. Is a Whetstone a particularly good tool to evaluate vertical-migration software? Is it a particularly good tool to measure the performance of vnews?

Take vnews as a particular example. You want to give it the best user interface possible for both experienced and novice users, some of whom have access to mice or other pointing devices, but most of whom do not. What do you have? A research problem. While a lot more is known about user interfaces now than before, we still don't know enough. Okay, what basic underlying data structure should you use to represent all the news? Another research problem. What language should we write it in? Another research problem. Can I write a prototype first? Ooops, now we have a political problem. Management says yes and then ships the prototype. You never get to do a rewrite, and you have to support the prototype. Pretty soon you give up prototyping...

Which algorithms are best suited to this task? What is the best way to think about programming in order to come up with the best programs? How can I tell that algorithm A is going to be better than algorithm B without building a prototype? And, once the program is written -- how do you tell which parts need a complete rewrite?
I think that these are all hard questions which everybody faces and mostly solves by experience and personal opinion. I think that we are doing better in discovering what makes a programmer more productive, but a lot of these things (say UNIX and bitmapped workstations) are simply not available to most programmers. -- Laura Creighton (note new address!) sun!l5!laura (that is ell-five, not fifteen) l5!laura@lll-crg.arpa
zben@umd5.UUCP (09/07/85)
In article <72@l5.uucp> laura@l5.UUCP (Laura Creighton) writes:

>Can I write a prototype first? Ooops, now we have a political problem.
>Management says yes and then ships the prototype. You never get to do a
>rewrite and you have to support the prototype. Pretty soon you give up
>prototyping...

This is one of the advantages of working for a government agency rather than a private firm: a lot less pressure for results yesterday. I suspect this has a *LOT* to do with the finest software coming from research labs (BTL) and from universities (BSD?).

>Which algorithms are best suited to this task? What is the best way to
>think about programming in order to come up with the best programs? How
>can I tell that algorithm A is going to be better than algorithm B without
>building a prototype?

You can't. Have you heard of the 'write one to throw away' philosophy? If you have the time to do it, this is the best way. Here I am, a year and a half into my electronic mail program, rewriting the first routines that I originally wrote. And the new ones don't look a *thing* like the old ones...

>And, once the program is written -- how to tell which parts need a
>complete rewrite?

That's easy. All of them. :-)

>I think that these are all hard questions which everybody faces and mostly
>solves by experience and personal opinion. I think that we are doing better
>in discovering what makes a programmer more productive, but a lot of
>these things (say UNIX and bitmapped workstations) are simply not available
>to most programmers.

Yeah, well, many of us seem to do fine on ADM3 clones talking to mainframes. I'm not sure sexy toys have anything to do with good programming.

>Laura Creighton (note new address!)
>sun!l5!laura (that is ell-five, not fifteen)
>l5!laura@lll-crg.arpa

Nice to hear from you again! Keep pluggin' away there!
--
Ben Cranston ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben zben@umd2.ARPA
joel@peora.UUCP (Joel Upchurch) (09/09/85)
>>Can I write a prototype first? Ooops, now we have a political problem.
>>Management says yes and then ships the prototype. You never get to do a
>>rewrite and you have to support the prototype. Pretty soon you give up
>>prototyping...
>
>This is one of the advantages of working for a government agency and not for
>a private firm. A lot less pressure for results yesterday. I suspect this
>has a *LOT* to do with the finest software coming from research labs (BTL)
>and from universities (BSD?).
>Ben Cranston ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben zben@umd2.ARPA

I don't think you expressed yourself quite correctly on this one. In the first place, one of your examples (BTL) isn't a government agency. In the second place, I've worked for the federal government, and there are plenty of scheduling pressures. Of course, most of the software development in the federal government is for internal consumption, but I don't think the process is all that different from internal software development at a large corporation.

The distinction you seem to be trying to make is between organizations with research objectives and those with purely development objectives. But even there, it seems to me that schedules are the rule rather than the exception. Students have to produce their projects on schedule; professors have publishing deadlines and grant renewals to worry about. A scientist doesn't know what the results of his experiments will be, but he usually has a schedule and a budget for completing them.

I think it is more useful to distinguish between well-designed, well-managed schedules and ones that don't allow enough time to do the work or to do meaningful tracking of the project. But this is enough: if you have read Fred Brooks's 'The Mythical Man-Month' you've already read all about this, and if you haven't, you should do so immediately.

Joel Upchurch
rb@ccivax.UUCP (rex ballard) (09/17/85)
> >I think you need to make your performance measurements in such a way that
> >you get a set of distinct numbers which can be used analytically to determine
> >performance for a given program if you know certain properties of the
> >program. For example:
> >
> >1) The rate of execution of each member of the set of arithmetic operations
> >provided by the machine's instruction set, ...
> >..., with cache disabled.
> >
> >2) The rate of execution of 1-word memory-to-memory moves, with cache
> >disabled.
> >
> >3) The rate of execution of a tight loop ...register-to-register
> >moves, with cache disabled.
> >
> >4) The rate of execution of a tight loop ... , with cache enabled.
> >
> >5) The rate of execution of a tight loop performing (same word size as #3
> >and #4 above) memory-to-memory moves that produce all cache "hits", with
> >cache enabled. Note that this gives you two properties of your cache: your
> >speedup for operand fetch and store resulting from caching, and any
> >performance penalties resulting from a write-through vs. write-back cache.
> >
> >6) Specifications such as the number of registers available to the user,
> >the size of the cache, etc.
> >
> >Well, you get the idea, anyway... personally I tend to feel that statistical
> >performance measurements are not nearly as useful as analytical ones; I
> >would rather see a list of fairly distinct performance properties of a pro-
> >cessor anytime, since I think you can do more with them in terms of
> >saying how the machine will perform for a given application that way.

I would like to add a few more tests in this vein.

7) The time required to do a "structured call" (i.e., save the entire machine state; transfer control to a "minimal subroutine" like "return(arg1+arg2+arg3)" with all arguments on the stack; place the result in a single register; and return to the caller). The reason for a test like this comes from a study done by M. McGowan.
In a study of several million lines of code, the number of revisions of a given source module increased EXPONENTIALLY relative to its size. Regardless of the language, the number of revisions increased an average of (1/25)**2, where 25 was the number of lines displayable on the screen at one time.

The ideal ratio between implementing a 'Macro Expansion' and a 'structured call' should theoretically be 0. In conventional benchmarks, a "call optimized" computer may show very little superiority. In general purpose applications where "modular software design" is a necessity, the relative performance may double. Unfortunately, such a computer would also have this advantage in general benchmark tests.

8) The time required to do a "context switch" (i.e., save the entire machine state, get the new context, save state, return to the old context). This can be a good indicator of interrupt responsiveness, and of suitability for multitasking and "event driven" situations.

9) The time required to save "equivalent states"; a machine with 8 registers may have less to do in a "state save" than a machine with 32, but can "hide" the number of "real" state values required for a context switch for benchmarking purposes.

(these opinions were my own, but I'm giving them up for adoption)
boston@celerity.UUCP (Boston Office) (09/24/85)
In article <734@umd5.UUCP> zben@umd5.UUCP (Ben Cranston) writes: >In article <72@l5.uucp> laura@l5.UUCP (Laura Creighton) writes: > >>Can I write a prototype first? Ooops, now we have a political problem. >>Management says yes and then ships the prototype. You never get to do a >>rewrite and you have to support the prototype. Pretty soon you give up >>prototyping... > >This is one of the advantages of working for a government agency and not for >a private firm. A lot less pressure for results yesterday. ...or at all, from the way our taxes are being squandered. >I suspect this >has a *LOT* to do with the finest software coming from research labs (BTL) >and from universities (BSD?). This is a delusion.
friesen@psivax.UUCP (Stanley Friesen) (10/02/85)
In article <256@ccivax.UUCP> rb@ccivax.UUCP (rex ballard) writes: > >The theoretical ideal ratio between implementing a 'Macro Expansion' and a >'structured call' should theoretically be 0; > I believe you mean *1* not 0. Zero would mean that either a 'Macro Expansion' has *no* cost or a 'Structured Call' has an *infinite* cost! A *ratio* is a *division*, thus no cost difference would be a/a = 1. -- Sarima (Stanley Friesen) UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen ARPA: ttidca!psivax!friesen@rand-unix.arpa