[net.arch] Scientific Computing and mips <sorry, obscenely long>

eugene@ames.UUCP (Eugene Miya) (09/02/85)

<29898@lanl.ARPA> <1517@peora.UUCP> <30105@lanl.ARPA> <1062@sdcsvax.UUCP>

Thank you for your patience.  I have mulled some of this over, and
I have some incomplete thoughts, if you will bear with me.  I have some
preliminary research, some of it graphical, on the Cray-1/S, X-MP, 2, Cyber
205, Convex C-1, and ELXSI, and if I could find a VAX or a 68000 with
a good clock, I would do them, too.

I think science proceeds in three phases:
	Detection [note: Boolean], lots of qualitative information [hunches]
	Identification [note: maybe enumerations, classes, sets, types]
	Detailed Analysis [more complex mathematics: stat, etc.]
I would like to summarize some of the issues posted so far.  I mentioned
"atomicity" in the first posting.  Atomic for two reasons: it is small and
low-level, and it is indivisible.  Scalability seems to be another [too many
ilities] concern.  A variation of this is termed "realistic" by some.
[My comments indented from here.]

> Message-ID: <29898@lanl.ARPA>
> Another metric is a sampling of large, mostly unmodifiable, commercial
> codes.  My preference is MSC/Nastran (.5e6+lines of finite element code).
> It runs on a surprising number of machines, and is a rigorous test of not
> only performance (cpu and io) but the scientific/engineering environment
> available to the average user.
> george spix    gas@lanl
	[Portability, generality]
	I am surprised no one objected.  The problem with this is that the
	behavior of large programs like this is not well understood.  Sure,
	you can take a timing, you can measure core usage, and so on, but it
	doesn't generalize, and the code still has problems, bugs, and desired
	new features, otherwise MSC would not still be in business.
	[I'm stretching this one, I know.]

	This brings up the issue of analytic method: [mentioned below]
		separability of components.
	We have few measures of I/O, and if we add two things
	together, all we get is a sum.  One problem with computer science
	is that we have poor empirical and laboratory skills and tools.
	[I am working on a series of TRs on experiment design in CS,
	but my management wants me to work on their stuff.]
	On one hand people aren't mathematical enough in their analysis,
	yet on the other we tend to lack `controls' when taking measurements
	[too mathematical, not experimental enough].

	There seem to be gaps between those who do
	queueing models, those who insist on measuring the near
	atomic qualities of systems, and letters like the above
	insisting on 'realism.'  Do these issues scale?

>Message-ID: <1512@peora.UUCP>
>
>> Whetstones were mentioned in another letter, but the only people who use
>> these are computer manufacturers.
>This statement isn't true.  For example, back when I was a graduate-student
>researcher in computer architectures, we used the Whetstones to test our
>vertical-migration software.

	I would Still place you in that category.  Whetstones are
	rather arbitrary and I did some work to locate the original
	book (not a paper) on ALGOL.  Our problems have a greater
	degree of heterogeneity as George points out above.

>> What qualities do our performance metrics need to have?
>
>I think you need to make your performance measurements in such a way that
>you get a set of distinct numbers which can be used analytically to determine
>performance for a given program if you know certain properties of the
>program.  For example:
>
>1) The rate of execution of each member of the set of arithmetic operations
>provided by the machine's instruction set, ...
>..., with cache disabled.
>
>2) The rate of execution of 1-word memory-to-memory moves, with cache
>disabled.
>
>3) The rate of execution of a tight loop ...register-to-register
>moves, with cache disabled.
>
>4) The rate of execution of a tight loop ... , with cache enabled.
>
>5) The rate of execution of a tight loop performing (same word size as #3
>and #4 above) memory-to-memory moves that produce all cache "hits", with
>cache enabled.  Note that this gives you two properties of your cache: your
>speedup for operand fetch and store resulting from caching, and any
>performance penalties resulting from a write-through vs. write-back cache.
>
>6) Specifications such as the number of registers available to the user,
>the size of the cache, etc.
>
>Well, you get the idea, anyway... personally I tend to feel that statistical
>performance measurements are not nearly as useful as analytical ones; I
>would rather see a list of fairly distinct performance properties of a pro-
>cessor anytime, since I think you can do more with them in terms of
>saying how the machine will perform for a given application that way.
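
	[For concreteness, here is roughly what one of these rate
	measurements might look like.  This is only a sketch: the loop
	overhead still has to be measured and subtracted, cache control
	is machine-dependent and not shown, and clock() stands in for
	whatever timer the machine actually provides.]

#include <stdio.h>
#include <time.h>

#define N 10000000L

volatile long src = 42, dst;    /* volatile so the moves are not optimized away */

int main(void)
{
    long i;
    clock_t t0, t1;
    double secs;

    t0 = clock();
    for (i = 0; i < N; i++)
        dst = src;              /* one 1-word memory-to-memory move */
    t1 = clock();

    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    printf("%.0f moves/second (loop overhead still included)\n",
           (double)N / secs);
    return 0;
}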

	I agree with you here, but why do I say that?

>I separated out the various forms of caching (operations in registers, and
>use of a cache between the CPU and the primary memory) because so many
>people "fudge" their results that way without giving any information from
>which you can determine real performance.  The above list is just meant to
>suggest "qualities" rather than being an exhaustive list; i.e., that the
>performance metrics should reveal (rather than hide) the set of factors
>that actually influence performance. [Unfortunately, this would never suit
>most marketing organizations nor customers, since they want an all-
>encompassing number.]
	Sad but true.  The all-encompassing number is a problem.
	I would like to do a bit of functional analysis, principally
	vector-valued measures, but Alan Smith at Berkeley suggested
	staying with measures as simple as possible. [I agree in principle.]
	But I want to consider them [Alan suggests factor analysis at most].
>The metrics should also be compiler-independent.
>Shyy-Anzr:  J. Eric Roskos

	How do you convince anyone of compiler independence?
	What about compiler dependence?  What tests determine compiler
	characteristics?  Limitations?
	OS independence, too.
	A problem with uniprocessor architectures, computers, and operating
	systems is that all are constructed in such a way as to make
	measurement difficult.  Taking a measurement affects the
	thing being observed.  High-level measurement concepts are lacking;
	instead we measure oscilloscope pulses and say that so many x's were
	transferred, only because we know how many x's there were to begin
	with.
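
	[One small illustration: even reading the clock costs something,
	and that cost has to be priced and subtracted before any other
	number means much.  A sketch, with clock() again standing in for
	whatever timer the machine provides.]

#include <stdio.h>
#include <time.h>

#define CALLS 100000L

int main(void)
{
    long i;
    clock_t start, stop;
    volatile clock_t sink;       /* keep the calls from being optimized away */

    start = clock();
    for (i = 0; i < CALLS; i++)
        sink = clock();          /* price the act of looking at the clock */
    stop = clock();

    printf("approx. %g seconds per clock() call\n",
           ((double)(stop - start) / CLOCKS_PER_SEC) / CALLS);
    return 0;
}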

> Message-ID: <1517@peora.UUCP>
> > It helps draw the line between special purpose and general purpose
> > environments (or, less tactfully, usable and unusable machines)..
> 
> Would it be possible to discuss this here? . . .
> what properties of general purpose
> environments make them "unusable" for scientific/engineering computing?
> -- 
> Shyy-Anzr:  J. Eric Roskos

	How do you measure an OS and make it distinct from the language,
	the compiler, and the algorithm used?  Where does the fine line
	get drawn between a program and its translator unless you
	have a different compiler implementation to compare against?

>Message-ID: <3517@dartvax.UUCP>
>> A problem is, certainly, how we measure things.
>It might be interesting to define some fairly simple standard operations
>and ask how long it takes to perform the operations.  Typical standard

	`Standards' sends a bit of a chill up my back.  It's a bit
	early to standardize.
	The following are shortened, but much like the above:
>add -- takes two words (at least 32 bits) from memory, adds them together,
>index -- picks up an array offset from memory, performs bounds checking
>on the offset (we don't all write in C),
>ptr_load --  (P->Record.Field)
>array_loop -- load each element of an array into a register.
>
>These
>simple operations would be a better measure than even simpler instructions
>because each operation does something "useful".  These operations can
>also have advantages over high-level language benchmarks because they are
>not dependent on the quality of a compiler.
>
>The qualities that I am aiming for here are primarily usefulness and 
>simplicity.
>
>-- Chuck
>chuck@dartvax

	Dependence and independence seem to be a common theme.
	How dependent are most tests?
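
	[For concreteness, a sketch of what a couple of Chuck's
	operations might look like in C.  The names are mine, and the
	bounds check is written out by hand here, where a real language
	runtime would do it for you.]

/* add: fetch two words from memory, add them, store the result */
long add_op(long *a, long *b, long *c)
{
    *c = *a + *b;
    return *c;
}

/* index: pick up an array offset and bounds-check it before use */
long index_op(long *array, long offset, long length)
{
    if (offset < 0 || offset >= length)
        return -1;               /* out of bounds; a real runtime would trap */
    return array[offset];
}

/* ptr_load: a P->Record.Field style indirect load */
struct record { long field; };
long ptr_load(struct record *p)
{
    return p->field;
}

/* array_loop: load each element of an array into a register */
long array_loop(long *array, long n)
{
    long i, reg = 0;
    for (i = 0; i < n; i++)
        reg = array[i];
    return reg;
}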

	Another problem is one of decomposition and parallelism.
	This will be especially important in future architectures.
	Are two operations performed sequentially equivalent to two
	operations performed in parallel?  I think the answer is YES AND NO.
	We have a situation analogous to Brooks' Mythical Man-Month:
	you can have 9 women work 9 months (81 woman-months) for 9 babies,
	but you can't have 9 women work 1 month for 1 baby.

	Another problem, more down to earth, is the clock on a given system.
	Crays have a beautiful system clock.  I cannot say the same for
	the Cyber.  One of my problems is just to understand the behavior
	of different system clocks.  Needless to say, 1/50th-1/100th of a
	second doesn't cut it.  Too much can happen during a tick.
	Repeating things enough times to divide the total by the tick count
	leaves too much to compilers and OSes.  I'd love to get my hands on
	a VAX with a 1 microsecond clock.
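
	[A crude way to see how coarse a system clock really is: spin
	until the reported time changes and see how far it jumps.  A
	sketch only, with clock() standing in for the machine's timer.]

#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t previous, current;
    int samples;

    previous = clock();
    for (samples = 0; samples < 10; samples++) {
        do {
            current = clock();
        } while (current == previous);   /* spin until the tick advances */
        printf("tick of %g seconds\n",
               (double)(current - previous) / CLOCKS_PER_SEC);
        previous = current;
    }
    return 0;
}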

> 	jww@SDCSVAX.ARPA
> Another side issue that certain problems benchmark certain ways.
> For example, in supporting a SIMSCRIPT II.5 discrete-event simulation,
> we find that the best predictor of user performance is double-precision
> ("single" on your Cray, george) floating point speed.  There are a
> lot of floating point comparisons on the event chain, plus the heavy
> use of pseudo-random gamma functions, etc. requires F.P. multiplies and
> divides.
	How many people really use gamma functions?  [sorry, don't answer that]
	A local comment on this.  One of our users gave a talk the other
	day.  He placed a single statement of FORTRAN on the screen.
	The problem is a fluid dynamics problem, and notably this
	statement had 18 FP divisions on 3-D arrays [the user wanted to point
	that out: the Cray 1/X division is relatively inefficient].  This says
	nothing of the +s and -s for the array indices of the 30-40
	variables, the FP +s, -s, and *s, or the tremendous storage
	requirements.  The user liked to point out that the CFT compiler,
	as much as we complain, reduced the 18 divides to 7.
	The Cray's real power is what it does on the indices!
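
	[The flavor of that optimization, in miniature: when several
	terms divide by the same quantity, one reciprocal and several
	multiplies can replace several divides, which matters on a
	machine where division is the slow operation.  A contrived
	sketch, not the user's actual statement:]

/* Naive form: three divides by the same denominator. */
double naive(double a, double b, double c, double d)
{
    return a / d + b / d + c / d;
}

/* What an optimizer (or a careful user) can do instead: one divide
   to form the reciprocal, then three multiplies.  (The floating-point
   results may differ in the last bits.)                              */
double reduced(double a, double b, double c, double d)
{
    double r = 1.0 / d;
    return a * r + b * r + c * r;
}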

> For compilation, however, integer performance -- particularly simple moves
> and single-level indirect addressing -- is the best predictor of speed.
...
> That's why machines with 
> strong integer scalar performance (e.g., Cray 1?) have it over those that 
> focus only on MFLOP's.

	What machine only focuses on MFLOPS?  Have you run on it?
	Good architectures, as Brian Reid pointed out in net.micro.mac,
	are a good balance of tradeoffs.

> Benchmarks typically are several hundred lines, with limited complexity
> and usually small data cases.  If you want to test typical throughput,
> you need a typical program--even . . . 200,000
> lines of source.  This also assures that if the system was "tuned",
> it was probably a very limited sort of tuning that any owner of such
> program would try anyway.
	Do we in principle really need the 200K-line program?
	Why can't we come up with adequate smaller programs to give us an
	idea of how the 200K-line program works?  In other words,
	why does the US need a Missouri [a show-me state]?  Can't
	we just take it for granted that the sun is 93 million miles away
	rather than remeasure it, so long as we know what a mile is?

	Our benchmarks tend to be too kind.  We need benchmarks, I think,
	which deliberately `break machines,' along the lines of the
	validation suites which check compiler limits, and so forth.
	On MWFs I tend to think that we can separate the OS, the compiler,
	and the language from the machine.  On TTSs I think it is not possible.
	Today's Sunday, so I don't care.
> 
> > It's my belief that this market requires
> > "general purpose architectures" with "general purpose (usable)
> > environments"
> > 	george spix       gas@lanl
> 
> There have been no shortages of proposed architectures.  There
> haven't been as many "usable architectures," [true]
> 
> A clever user will take "Program A" and put it up on machines X,Y,Z,
> spending less than a week on each test.  Which ever machine runs it
> fastest, wins.  From the user's standpoint, that's much better
> than listening to MIP's, MFLOP's, or other mumbo-jumbo.
> 
> 	Joel West	CACI, Inc. - Federal (c/o UC San Diego)

	The tension here is between the desire to write general, portable,
	usable programs and the desire to take advantage of machine
	performance features.

	I sometimes wonder if we will really have a Cray-on-a-desk,
	and then the thought passes ;-).  Few consider what a Cray is:
	word-oriented, big memory, vector registers, underdeveloped
	software [oops, sorry Bence and George].

Lastly, I wish to thank LLNL, Cray, and Convex for time on some of
their machines.  I tried cutting this down more; I will try to do better
next time.  Sorry for rambo-ing :-), I am still working on these
ideas.  Some of my existing prototype tests look at memory contention,
vector instruction sets, and compiler tricks and limitations.
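
To give the flavor of the memory-contention tests: time the same number of
loads at different strides, so the accesses either spread across the memory
banks or pile onto a few of them.  A sketch only; the array size, the
strides, and the timer are placeholders.

#include <stdio.h>
#include <time.h>

#define SIZE  (1L << 20)            /* 1M words; placeholder size */
#define LOADS (1L << 20)

static long data[SIZE];

static double time_stride(long stride)
{
    long i, index = 0;
    volatile long sink;
    clock_t t0 = clock();

    for (i = 0; i < LOADS; i++) {
        sink = data[index];                      /* one load */
        index = (index + stride) & (SIZE - 1);   /* wrap; SIZE is a power of 2 */
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    long stride;
    for (stride = 1; stride <= 64; stride *= 2)
        printf("stride %3ld: %.3f seconds for %ld loads\n",
               stride, time_stride(stride), LOADS);
    return 0;
}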

--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb

jer@peora.UUCP (J. Eric Roskos) (09/04/85)

Eugene Miya writes:

> One problem with computer science is that we have poor empirical and
> laboratory skills and tools.

I wrote, about an earlier topic:

>>This statement isn't true.  For example, back when I was a graduate-student
>>researcher in computer architectures, we used the Whetstones to test our
>>vertical-migration software.

Eugene Miya replies:

> I would Still place you in that category.

Another problem with computer science, at least as it is often practiced in
the technical newsgroups, is a failing of the scientific method seen in other
areas as well: we resort to ad hominems to justify our positions, and make
contradictory statements without explaining why.  If the only people who
use Whetstones are the manufacturers of the computing machines, and if I
am "still" a graduate-student researcher because I use them, isn't that
contradictory?

Actually I cancelled the message on benchmarks because I decided it didn't
say much.  Unfortunately, the "cancel" command apparently often doesn't work.

And also, actually I mostly agree with you (except the ad hominem parts,
of course).  But to answer the questions:

>        How do you convince anyone of compiler independence?

That's very hard.  If you are doing the evaluations yourself, you write the
algorithm in a high-level language, then hand-compile it the way you think
a good compiler would compile it.  Thereby you at least convince yourself.
This is better than being unconvinced by "black box"-type benchmarks, though
I agree that it is far from ideal, and far from what would be acceptable
in many scientific communities for final results.
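
For instance (a contrived loop, and the instruction tally is only the sort
of count I mean, not any particular machine's code):

/* Sum an array of n words. */
long sum(long *a, long n)
{
    long i, total = 0;
    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hand-compiled estimate, per iteration, for a simple register machine:
 *     1 load         (a[i] into a register)
 *     1 integer add  (total += ...)
 *     1 integer add  (i++)
 *     1 compare      (i < n)
 *     1 branch
 * With per-operation rates measured as in the list above, the predicted
 * time is roughly n * (t_load + 2*t_add + t_cmp + t_branch), which can
 * then be compared against what the compiler and machine actually do.
 */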

>       What about compiler dependence?  What tests determine compiler
>       characteristics?  Limitations?

It's been my experience that simply looking at the code generated by a
compiler often tells a lot about the quality of the compiler.  Of course,
this breaks down when you get to the point of comparing several good
optimizing compilers with one another; but most of the compilers out there
today, especially for the microcomputers, are not that good, yet; their
characteristics and limitations are readily visible in the code they
generate.

>       OS independence, too.

Just don't make any OS calls.  I don't think many benchmarks do.  If you
are benchmarking I/O, well, unless you are going to do the I/O yourself,
then probably the OS performance IS worth benchmarking.

>       A problem with uniprocessor architectures, computers, and operating
>       systems is that all are constructed in such a way as to make
>       measurement difficult.  Taking a measurement affects the
>       thing being observed.

R.  I.  Winner, my committee chairman, used to call this phenomenon (but as
it related to debugging) "Heisenbugs". (I'm not sure if he coined the term,
or not.) I don't think they are constructed *to* make measurement
difficult.  Rather (a) the economics of putting the instrumentation on to
monitor the performance from the outside tends to discourage that in the
marketplace, and (b) there are always things you just can't measure from
inside the system.  It would be nice to have performance measuring support
that worked from outside (e.g., small processors dedicated to watching the
cache, peripherals, etc. without disturbing them), but when it came to
making the machine cost-effective to market, those would be among the
first things to be eliminated, I'm afraid.
-- 
Shyy-Anzr:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	Uryc!  Gurl'er cnvagvat zl bssvpr oyhr!  Jung n qrcerffvat pbybe...

laura@l5.uucp (Laura Creighton) (09/06/85)

>
Ah, guys, there is more than one way to parse this:

	>> One problem with computer science is that we have poor empirical and
	>> laboratory skills and tools. [Eugene Miya]

One of them is ``you guys are all slugs for not using proper laboratory skills
and tools, and if your professors didn't teach you them then they are all
slugs too,'' which appears to be the way some people have taken this.  This is
not what I read, though.  What I got was ``there is an inadequate set of
laboratory skills and tools designed for computer science, and so we make do,
which isn't optimal.''  The problem with having a really good hammer is
that you think that all the world is a nail [anybody know the source of
this quote?].  Is a Whetstone a particularly good tool for evaluating
vertical-migration software?  Is it a particularly good tool for measuring
the performance of vnews?

Take vnews as a particular example.  You want to give it the best user
interface possible for both experienced and novice users, some of whom
have access to mice or other pointing devices, but most of whom do not.
What do you have?  A research problem.  While a lot more is known about
user interfaces now than before, we still don't know enough.

Okay, what basic underlying data structure should you use to represent all
the news? Another research problem.

What language should we write it in? Another research problem.

Can I write a prototype first? Ooops, now we have a political problem.
Management says yes and then ships the prototype. You never get to do a
rewrite and you have to support the prototype. Pretty soon you give up
prototyping...

Which algorithms are best suited to this task? What is the best way to
think about programming in order to come up with the best programs? How
can I tell that algorithm A is going to be better than algorithm B without
building a prototype?

And, once the program is written -- how to tell which parts need a 
complete rewrite? 

I think that these are all hard questions which everybody faces and mostly
solves by experience and personal opinion.  I think that we are doing better
in discovering what makes a programmer more productive, but a lot of
these things (say UNIX and bitmapped workstations) are simply not available
to most programmers.



-- 

Laura Creighton		(note new address!)
sun!l5!laura		(that is ell-five, not fifteen)
l5!laura@lll-crg.arpa

zben@umd5.UUCP (09/07/85)

In article <72@l5.uucp> laura@l5.UUCP (Laura Creighton) writes:

>Can I write a prototype first? Ooops, now we have a political problem.
>Management says yes and then ships the prototype. You never get to do a
>rewrite and you have to support the prototype. Pretty soon you give up
>prototyping...
 
This is one of the advantages of working for a government agency and not for
a private firm.  A lot less pressure for results yesterday.  I suspect this
has a *LOT* to do with the finest software coming from research labs (BTL)
and from universities (BSD?).

>Which algorithms are best suited to this task? What is the best way to
>think about programming in order to come up with the best programs? How
>can I tell that algorithm A is going to be better than algorithm B without
>building a prototype?
 
You can't.  Have you heard of the 'write one to throw away' philosophy?  If
you have the time to do it, this is the best way.  Here I am, a year and a
half into my electronic mail program, and I am rewriting the first routines
that I originally wrote.  And the new ones don't look a *thing* like the old
ones...

>And, once the program is written -- how to tell which parts need a 
>complete rewrite? 

That's easy.  All of them.   :-)

>I think that these are all hard questions which everybody faces and mostly
>solves by experience and personal opinion.  I think that we are doing better
>in discovering what makes a programmer more productive, but a lot of
>these things (say UNIX and bitmapped workstations) are simply not available
>to most programmers.

Yeah, well, many of us seem to do fine on ADM3-clones talking to mainframes.
I'm not sure sexy toys have anything to do with good programming.

>Laura Creighton		(note new address!)
>sun!l5!laura		(that is ell-five, not fifteen)
>l5!laura@lll-crg.arpa

Nice to hear from you again!  Keep pluggin' away there!
-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

joel@peora.UUCP (Joel Upchurch) (09/09/85)

>>Can I write a prototype first? Ooops, now we have a political problem.
>>Management says yes and then ships the prototype. You never get to do a
>>rewrite and you have to support the prototype. Pretty soon you give up
>>prototyping...
>
>This is one of the advantages of working for a government agency and not for
>a private firm.  A lot less pressure for results yesterday.  I suspect this
>has a *LOT* to do with the finest software coming from research labs (BTL)
>and from universities (BSD?).
>Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

        I don't think you expressed yourself quite correctly on this
        one.  In the first place, one of your examples (BTL) isn't a
        government agency.  In the second place, I've worked for the
        federal government and there are plenty of scheduling
        pressures.  Of course, most of the software development in the
        federal government is for internal consumption, but I don't
        think that the process is all that different from internal
        software development at a large corporation.

        The distinction you seem to be trying to make is between
        organizations with research objectives and those with purely
        development objectives.  But even there, it seems to me that
        schedules are the rule rather than the exception.  Students
        have to produce their projects on schedule; professors have
        publishing deadlines and grant renewals to worry about.  A
        scientist doesn't know what the results of his experiments will
        be, but he usually has a schedule and a budget for completing them.

        I think it is more useful to distinguish between well-designed,
        well-managed schedules and ones that don't allow enough
        time to do the work or to do meaningful tracking of the
        project.  But enough of this: if you have read Fred Brooks'
        'The Mythical Man-Month' you've already read all about this,
        and if you haven't, then you should do so immediately.

						Joel Upchurch

rb@ccivax.UUCP (rex ballard) (09/17/85)

> >
> >I think you need to make your performance measurements in such a way that
> >you get a set of distinct numbers which can be used analytically to determine
> >performance for a given program if you know certain properties of the
> >program.  For example:
> >
> >1) The rate of execution of each member of the set of arithmetic operations
> >provided by the machine's instruction set, ...
> >..., with cache disabled.
> >
> >2) The rate of execution of 1-word memory-to-memory moves, with cache
> >disabled.
> >
> >3) The rate of execution of a tight loop ...register-to-register
> >moves, with cache disabled.
> >
> >4) The rate of execution of a tight loop ... , with cache enabled.
> >
> >5) The rate of execution of a tight loop performing (same word size as #3
> >and #4 above) memory-to-memory moves that produce all cache "hits", with
> >cache enabled.  Note that this gives you two properties of your cache: your
> >speedup for operand fetch and store resulting from caching, and any
> >performance penalties resulting from a write-through vs. write-back cache.
> >
> >6) Specifications such as the number of registers available to the user,
> >the size of the cache, etc.
> >
> >Well, you get the idea, anyway... personally I tend to feel that statistical
> >performance measurements are not nearly as useful as analytical ones; I
> >would rather see a list of fairly distinct performance properties of a pro-
> >cessor anytime, since I think you can do more with them in terms of
> >saying how the machine will perform for a given application that way.
 
I would like to add a few more tests in this vein.

7) The time required to do a "structured call" (i.e.: save the entire
machine state; transfer control to a "minimal subroutine" like
"return(arg1+arg2+arg3)" with all arguments on the stack; place the result
in a single register; and return to the caller).
The reason for a test like this comes from a study done by M. McGowan.
In a study of several million lines of code, the number of revisions of
a given source module increased EXPONENTIALLY relative to its size.

Regardless of the language, the number of revisions increased an
average of (1/25)**2.  The 25 was the number of lines displayable
on the screen at one time.

The theoretical ideal ratio between implementing a 'Macro Expansion' and a
'structured call' should theoretically be 0;

In conventional benchmarks, a "call optimized" computer may show very little
superiority.  In general-purpose applications where "modular software
design" is a necessity, the relative performance may double (a rough timing
sketch follows item 9 below).

Unfortunately, such a computer would also have this advantage in general
benchmark tests.

8) The time required to do a "context switch" (i.e.: save the entire machine
state, switch to a new context, then save that state and return to the old
context).

This can be a good indicator of interrupt responsiveness, suitability
for multitasking, and "event driven" situations.

9) The time required to save "equivalent states":

A machine with 8 registers may have less to do in a "state save" than a
machine with 32, but it can "hide" the number of "real" state values
required for a context switch for benchmarking purposes.
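
Here is a rough sketch of the measurement behind item 7: time the same
trivial computation once as a real call and once as a macro expansion, and
compare.  The iteration count and timer are placeholders, and a compiler
that inlines the call defeats the test.

#include <stdio.h>
#include <time.h>

#define N 1000000L
#define ADD3_MACRO(a, b, c) ((a) + (b) + (c))

long add3_call(long a, long b, long c)      /* the "minimal subroutine" */
{
    return a + b + c;
}

int main(void)
{
    long i;
    volatile long result = 0;
    clock_t t0;
    double t_call, t_macro;

    t0 = clock();
    for (i = 0; i < N; i++)
        result = add3_call(i, i, i);        /* structured call */
    t_call = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    for (i = 0; i < N; i++)
        result = ADD3_MACRO(i, i, i);       /* macro expansion, no call */
    t_macro = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("call: %.3f s   macro: %.3f s   ratio: %.2f\n",
           t_call, t_macro, t_call / t_macro);
    return 0;
}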

(these opinions were my own, but I'm giving them up for adoption)

boston@celerity.UUCP (Boston Office) (09/24/85)

In article <734@umd5.UUCP> zben@umd5.UUCP (Ben Cranston) writes:
>In article <72@l5.uucp> laura@l5.UUCP (Laura Creighton) writes:
>
>>Can I write a prototype first? Ooops, now we have a political problem.
>>Management says yes and then ships the prototype. You never get to do a
>>rewrite and you have to support the prototype. Pretty soon you give up
>>prototyping...
> 
>This is one of the advantages of working for a government agency and not for
>a private firm.  A lot less pressure for results yesterday. 

...or at all, from the way our taxes are being squandered.

>I suspect this
>has a *LOT* to do with the finest software coming from research labs (BTL)
>and from universities (BSD?).

This is a delusion.

friesen@psivax.UUCP (Stanley Friesen) (10/02/85)

In article <256@ccivax.UUCP> rb@ccivax.UUCP (rex ballard) writes:
>
>The theoretical ideal ratio between implementing a 'Macro Expansion' and a
>'structured call' should theoretically be 0;
>
	I believe you mean *1* not 0. Zero would mean that either a
'Macro Expansion' has *no* cost or a 'Structured Call' has an
*infinite* cost! A *ratio* is a *division*, thus no cost difference
would be a/a = 1.
-- 

				Sarima (Stanley Friesen)

UUCP: {ttidca|ihnp4|sdcrdcf|quad1|nrcvax|bellcore|logico}!psivax!friesen
ARPA: ttidca!psivax!friesen@rand-unix.arpa