johnl@ima.UUCP (04/22/87)
> ... Another approach that I have seen is to run the source program
> through a preprocessor which adds tracing code into the source, then
> compile the resulting program with the regular compiler.

Yet another approach, used (although not for this purpose, I suspect) by
the MIPSCo people, is to run the object program through a postprocessor
that inserts tracing code.  For example, before every "add float"
instruction, insert an instruction that bumps an "add float" counter.
(I suggest "before" rather than "after" because then the condition codes
don't get messed up, or at least not as badly -- beware optimizing
compilers rearranging code!)  If you did it at the assembler level
rather than the binary level, this might not be too hard.  That might be
awkward if libraries are involved, though.

This gets you fast, accurate counts without compiler cooperation.

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
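[As an illustrative sketch of the assembler-level postprocessing Henry
describes: scan the assembly text and plant a counter bump before each
floating-point instruction. The mnemonics and counter-symbol names here
are hypothetical, not from any particular machine. -ed]

```python
# Sketch: an assembly-level postprocessor that inserts a counter bump
# *before* every floating-point instruction, so condition codes set by
# the float op survive. Mnemonics (fadd, fsub, fmul, fdiv) and the
# counter symbols (_fadd_count etc.) are invented for illustration.

FLOAT_OPS = {"fadd", "fsub", "fmul", "fdiv"}

def instrument(asm_lines):
    """Return the assembly with a count bump ahead of each float op."""
    out = []
    for line in asm_lines:
        fields = line.split()
        mnemonic = fields[0] if fields else ""
        if mnemonic in FLOAT_OPS:
            # Bump the per-opcode counter before the instruction itself.
            out.append("        incl    _%s_count" % mnemonic)
        out.append(line)
    return out

asm = [
    "        movl    r0, r1",
    "        fadd    f0, f1",
    "        fmul    f1, f2",
]
for line in instrument(asm):
    print(line)
```

[A binary-level tool must also patch branch offsets around the inserted
instructions, which is why the assembler level is "not too hard" by
comparison. -ed]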
johnl@ima.UUCP (04/24/87)
> Yet another approach, used (although not for this purpose, I suspect) by
> the MIPSCo people, is to run the object program through a postprocessor
> that inserts tracing code.  For example, before every "add float"
> instruction, insert an instruction that bumps an "add float" counter.

I'll elaborate a bit: As Henry said, our postprocessor "pixie" does
indeed work on the already-linked program (Unix a.out format) rather
than on assembly code, so that libraries get included automatically.
This is fortunate, because neglecting libraries sometimes makes a big
difference (whetstone, for example).

"pixie" actually inserts the counting code at the beginning of each
basic block, rather than before particular instructions of interest.
It identifies basic blocks the hard way, by passing through the program
once to collect all of the branch, jump, and call targets.  And it
remaps the registers onto a smaller set, spilling into memory as
necessary, so that certain registers are globally available for use by
the counting code.  Dynamic branches (e.g. case statements) get handled
laboriously at execution time.  The extra difficulty of identifying
basic blocks pays off by reducing the time spent in the counting code
(often a single basic block contains multiple instances of whatever we
want to count), and by making the system more versatile.

A second postprocessor examines the counts and basic-block boundaries
provided by the first, plus the original linked program, and derives
statistics on whatever aspect of the program we're interested in
(cycles, ratio of loads to stores, ratio of byte operations to word
operations, etc.).  We can measure new things by changing the second
postprocessor, without changing "pixie".  It's a lovely tool (written
by a fellow named Earl Killian).

If you just want to count flops, however, then editing the assembly
file to preface each floating-point op with a call to a counting
subroutine will solve your problem.
				...decwrl!mips!sjc
				Steve Correll
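[A sketch of the two-pass basic-block discovery pixie is described as
doing: pass 1 collects branch/jump/call targets, pass 2 marks the block
leaders where counting code would be planted. The instruction encoding
here, (address, mnemonic, target), is invented for illustration. -ed]

```python
# Basic-block leaders: the program entry point, every branch/jump/call
# target, and every instruction following a control transfer.

BRANCHES = {"beq", "bne", "jmp", "call"}

def find_block_leaders(prog):
    # Pass 1: collect all branch, jump, and call targets.
    targets = {t for (_, op, t) in prog if op in BRANCHES and t is not None}
    # Pass 2: mark leaders; counting code goes at each leader.
    leaders = set()
    prev_was_branch = True            # the entry point is a leader
    for (addr, op, _) in prog:
        if prev_was_branch or addr in targets:
            leaders.add(addr)
        prev_was_branch = op in BRANCHES
    return leaders

prog = [
    (0,  "movl", None),
    (4,  "beq",  12),
    (8,  "addl", None),
    (12, "fadd", None),
    (16, "jmp",  0),
]
print(sorted(find_block_leaders(prog)))   # -> [0, 8, 12]
```

[Dynamic branches (case statements) defeat this static pass, which is
why pixie must handle them at execution time. -ed]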
johnl@ima.UUCP (04/26/87)
In article <1299@ames.UUCP> you write:
[How can I count the number of floating point operations in a program in
the absence of a helpful compiler?]
You can also use a software monitor that does this.  Frequently, the
debug package for a machine will already do this.  If not, any
machine-level debugger with a single-step instruction capability is a
good starting point.
On the Honeywell DPS-8, 88, 99 machines, this is in the debugger.
Many minicomputers and microcomputers (the 68000 and PDP-11, for
example) have a hardware "trace mode" that makes writing such a program
straightforward.  The only problem with this is that the program will
run slowly.  However, usually when one is collecting statistics about
such things, a factor of 20-100 speed loss is not a problem.
[A similar possibility is to write your own trace package that interprets
the machine code. Interpreting the machine's own code is fairly easy
since for most instructions you can just stick them in memory and execute
them; you only need to fake things like branches. Then you can collect
any statistics you want. It's an order of magnitude slower than running
the program native, but you get 100% accurate numbers. -John]
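[A toy illustration of the interpretive tracing described above: step
the program one instruction at a time, tallying statistics before each
step and faking only the control transfers. The tiny instruction set is
invented; a real tracer single-steps actual machine code. -ed]

```python
# Interpretive tracer: execute a toy program one instruction at a time,
# counting every opcode. Only branches change the program counter; all
# other instructions fall through, mirroring the "stick them in memory
# and execute them, fake only the branches" scheme.
from collections import Counter

def trace(prog):
    counts = Counter()
    pc = 0
    while 0 <= pc < len(prog):
        op = prog[pc][0]
        counts[op] += 1               # gather statistics before the step
        if op == "branch":            # control transfers must be faked
            pc = prog[pc][1]
        else:
            pc += 1                   # everything else falls through
    return counts

prog = [
    ("fadd",),
    ("fmul",),
    ("branch", 3),
    ("halt",),
]
print(trace(prog))
```

[Per-opcode counts come out exact; the order-of-magnitude slowdown is
the interpreter loop around each instruction. -ed]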
eugene@ames-pioneer.arpa (Eugene Miya N.) (05/07/87)
In article <552@ima.UUCP> Henry Spencer writes:
>[You can run your assembler through a preprocessor that puts counting
>code before each float operation.]
>This gets you fast, accurate counts without compiler cooperation.
>
>				Henry Spencer @ U of Toronto Zoology

Bullshit.  It's very intrusive (typically around 10%).  This is enough
to totally wreck measurements, and worse, it perturbs synchronization
on parallel codes.  If you want a reference identifying the problem I
will mail it, otherwise forget it.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene
eugene@ames-pioneer.arpa (Eugene Miya N.) (05/07/87)
[In reference to machines with hardware or software assistance for
instruction tracing and counting... ]
Sorry, I am just catching up with last changed groups.
I have never run on a Honeywell machine. The best machine I have seen
(not perfect, but pretty nice) is a Cray X-MP with its Hardware
Performance Monitor. We have proposed a session at the Cray User Group
meeting in Tokyo, Japan, Fall 1988 on this topic for those interested
(like LANL).  Compared to the HPM, measuring on most machines is like
reading an old wind-up clock (a VAX, say, with lots of skew) next to an
atomic clock (well, not quite, but the HPM hums nicely).
%A John L. Larson
%T Multitasking on the CRAY X-MP-2 Multiprocessor
%J Computer
%I IEEE
%V 17
%N 7
%D July 1984
%P 62-69
%K Hardware-software interface: effect on performance
%X A summary of the paper on Multitasking FORTRAN. It uses subroutine
calls and arrays to START, WAIT, LOCK processes and signal (POST) EVENTS.
The paper does not mention deadlock or blocking.
From the Rock of Ages Home for Retired Hackers:
--eugene miya
NASA Ames Research Center
eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups. If enough, I'll summarize."
{hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene