johnl@ima.UUCP (04/22/87)
> ... Another approach that I have seen is to run the source program
> through a preprocessor which adds tracing code into the source, then
> compile the resulting program with the regular compiler.

Yet another approach, used (although not for this purpose, I suspect) by
the MIPSCo people, is to run the object program through a postprocessor
that inserts tracing code.  For example, before every "add float"
instruction, insert an instruction that bumps an "add float" counter.
(I suggest "before" rather than "after" because then the condition codes
don't get messed up, or at least not as badly -- beware optimizing
compilers rearranging code!)  If you did it at the assembler level
rather than the binary level, this might not be too hard.  That might be
awkward if libraries are involved, though.

This gets you fast, accurate counts without compiler cooperation.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
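[As an illustrative sketch of the assembler-level postprocessing Henry
describes: scan the assembly text and plant a counter bump before each
floating-point instruction. The mnemonics and counter-symbol names here
are hypothetical, not from any particular machine. -ed]

```python
# Sketch: an assembly-level postprocessor that inserts a counter bump
# *before* every floating-point instruction, so condition codes set by
# the float op survive. Mnemonics (fadd, fsub, fmul, fdiv) and the
# counter symbols (_fadd_count etc.) are invented for illustration.

FLOAT_OPS = {"fadd", "fsub", "fmul", "fdiv"}

def instrument(asm_lines):
    """Return the assembly with a count bump ahead of each float op."""
    out = []
    for line in asm_lines:
        fields = line.split()
        mnemonic = fields[0] if fields else ""
        if mnemonic in FLOAT_OPS:
            # Bump the per-opcode counter before the instruction itself.
            out.append("        incl    _%s_count" % mnemonic)
        out.append(line)
    return out

asm = [
    "        movl    r0, r1",
    "        fadd    f0, f1",
    "        fmul    f1, f2",
]
for line in instrument(asm):
    print(line)
```

[A binary-level tool must also patch branch offsets around the inserted
instructions, which is why the assembler level is "not too hard" by
comparison. -ed]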
johnl@ima.UUCP (04/24/87)
> Yet another approach, used (although not for this purpose, I suspect) by
> the MIPSCo people, is to run the object program through a postprocessor
> that inserts tracing code.  For example, before every "add float"
> instruction, insert an instruction that bumps an "add float" counter.

I'll elaborate a bit: As Henry said, our postprocessor "pixie" does
indeed work on the already-linked program (Unix a.out format) rather
than on assembly code, so that libraries get included automatically.
This is fortunate, because neglecting libraries sometimes makes a big
difference (whetstone, for example).

"pixie" actually inserts the counting code at the beginning of each
basic block, rather than before particular instructions of interest.
It identifies basic blocks the hard way, by passing through the program
once to collect all of the branch, jump, and call targets.  And it
remaps the registers onto a smaller set, spilling into memory as
necessary, so that certain registers are globally available for use by
the counting code.  Dynamic branches (e.g. case statements) get handled
laboriously at execution time.  The extra difficulty of identifying
basic blocks pays off by reducing the time spent in the counting code
(often a single basic block contains multiple instances of whatever we
want to count), and by making the system more versatile.

A second postprocessor examines the counts and basic-block boundaries
provided by the first, plus the original linked program, and derives
statistics on whatever aspect of the program we're interested in
(cycles, ratio of loads to stores, ratio of byte operations to word
operations, etc.).  We can measure new things by changing the second
postprocessor, without changing "pixie".  It's a lovely tool (written
by a fellow named Earl Killian).

If you just want to count flops, however, then editing the assembly
file to preface each floating-point op with a call to a counting
subroutine will solve your problem.
				...decwrl!mips!sjc
				Steve Correll
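[A sketch of the two-pass basic-block discovery pixie is described as
doing: pass 1 collects branch/jump/call targets, pass 2 marks the block
leaders where counting code would be planted. The instruction encoding
here, (address, mnemonic, target), is invented for illustration. -ed]

```python
# Basic-block leaders: the program entry point, every branch/jump/call
# target, and every instruction following a control transfer.

BRANCHES = {"beq", "bne", "jmp", "call"}

def find_block_leaders(prog):
    # Pass 1: collect all branch, jump, and call targets.
    targets = {t for (_, op, t) in prog if op in BRANCHES and t is not None}
    # Pass 2: mark leaders; counting code goes at each leader.
    leaders = set()
    prev_was_branch = True            # the entry point is a leader
    for (addr, op, _) in prog:
        if prev_was_branch or addr in targets:
            leaders.add(addr)
        prev_was_branch = op in BRANCHES
    return leaders

prog = [
    (0,  "movl", None),
    (4,  "beq",  12),
    (8,  "addl", None),
    (12, "fadd", None),
    (16, "jmp",  0),
]
print(sorted(find_block_leaders(prog)))   # -> [0, 8, 12]
```

[Dynamic branches (case statements) defeat this static pass, which is
why pixie must handle them at execution time. -ed]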
johnl@ima.UUCP (04/26/87)
In article <1299@ames.UUCP> you write:
[How can I count the number of floating point operations in a program in
the absence of a helpful compiler?]
You can also use a software monitor that does this.  Frequently, the
debug package for a machine will already do this.  If not, any
machine-level debugger with a single-step instruction capability is a
good starting point.
On the Honeywell DPS-8, 88, 99 machines, this is in the debugger.
Many minicomputers and microcomputers (the 68000 and PDP-11, for
example) have a hardware "trace mode" that makes writing such a program
straightforward.  The only problem with this is that the program will
run slowly.  However, usually when one is collecting statistics about
such things, a factor of 20-100 speed loss is not a problem.
[A similar possibility is to write your own trace package that interprets
the machine code. Interpreting the machine's own code is fairly easy
since for most instructions you can just stick them in memory and execute
them; you only need to fake things like branches. Then you can collect
any statistics you want. It's an order of magnitude slower than running
the program native, but you get 100% accurate numbers. -John]
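[A toy illustration of the interpretive tracing described above: step
the program one instruction at a time, tallying statistics before each
step and faking only the control transfers. The tiny instruction set is
invented; a real tracer single-steps actual machine code. -ed]

```python
# Interpretive tracer: execute a toy program one instruction at a time,
# counting every opcode. Only branches change the program counter; all
# other instructions fall through, mirroring the "stick them in memory
# and execute them, fake only the branches" scheme.
from collections import Counter

def trace(prog):
    counts = Counter()
    pc = 0
    while 0 <= pc < len(prog):
        op = prog[pc][0]
        counts[op] += 1               # gather statistics before the step
        if op == "branch":            # control transfers must be faked
            pc = prog[pc][1]
        else:
            pc += 1                   # everything else falls through
    return counts

prog = [
    ("fadd",),
    ("fmul",),
    ("branch", 3),
    ("halt",),
]
print(trace(prog))
```

[Per-opcode counts come out exact; the order-of-magnitude slowdown is
the interpreter loop around each instruction. -ed]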
eugene@ames-pioneer.arpa (Eugene Miya N.) (05/07/87)
In article <552@ima.UUCP> Henry Spencer writes:
>[You can run your assembler through a preprocessor that puts counting
>code before each float operation.]
>This gets you fast, accurate counts without compiler cooperation.
>
>				Henry Spencer @ U of Toronto Zoology

Bullshit.  It's very intrusive (typically around 10%).  This is enough
to totally wreck measurements, and worse, it perturbs synchronization
on parallel codes.  If you want a reference identifying the problem I
will mail it, otherwise forget it.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
eugene@ames-pioneer.arpa (Eugene Miya N.) (05/07/87)
[In reference to machines with hardware or software assistance for
instruction tracing and counting... ]
Sorry, I am just catching up with last changed groups.
I have never run on a Honeywell machine. The best machine I have seen
(not perfect, but pretty nice) is a Cray X-MP with its Hardware
Performance Monitor. We have proposed a session at the Cray User Group
meeting in Tokyo, Japan, Fall 1988 on this topic for those interested
(like LANL).  Compared to the HPM, measuring on most machines is like
reading an old wind-up clock (a VAX, say, with lots of skew) next to an
atomic clock (well, not quite, but the HPM hums nicely).
%A John L. Larson
%T Multitasking on the CRAY X-MP-2 Multiprocessor
%J Computer
%I IEEE
%V 17
%N 7
%D July 1984
%P 62-69
%K Hardware-software interface: effect on performance
%X A summary of the paper on Multitasking FORTRAN. It uses subroutine
calls and arrays to START, WAIT, LOCK processes and signal (POST) EVENTS.
The paper does not mention deadlock or blocking.
From the Rock of Ages Home for Retired Hackers:
--eugene miya
NASA Ames Research Center
eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups. If enough, I'll summarize."
{hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene