[comp.unix.aux] CPU Profiling in A/UX - Broken???

dcr0@GTE.COM (David Robbins) (08/21/90)
I recently had a need to profile the CPU time usage of a program running
on an A/UX system.  My experience was, shall we say, rather frustrating.

I did all the stuff the manuals say to do, which has always worked just fine
and dandy on every Unix system I've ever done profiling on.  But on A/UX,
the results were nearly worthless.  After looking at the output of prof(1)
on A/UX and reaching the conclusion that its validity was doubtful (it showed
time being spent in a function I *knew for a fact* was never called!),
I profiled the same program on a VAX and a PMAX.  The prof(1) results on the
VAX and the PMAX were consistent with each other, but the results on A/UX
did not even vaguely resemble the other two.  So, it became crystal clear
that prof(1) on A/UX was not giving me any facts.

While it turned out that the profiles obtained on the VAX and the PMAX were
sufficient to solve the problem I was working on, my curiosity was not
satisfied.  How could A/UX's odd results be explained?  So, I took a look
inside mon.out.  Interesting.

The program I was profiling was 600K+ in size.  In mon.out, there were 300K+
words allocated for PC counters (one per instruction in the program).  While
the call to profil(2) was made with parameters clearly intended to make use of
all this space, only the first 32K words of mon.out had non-zero counts.  But
I knew *for a fact* that large amounts of CPU time were consumed by functions
toward the end of the program, which would results in non-zero counts in the
latter portion of mon.out, if everything was working as the manuals say it
does.

My hypothesis is that the A/UX kernel, when performing the sampling for the
profile, truncates the index into the buffer to 16 bits.  To test this
hypothesis, I set up a call to profil(2) with parameters to map the 600K+
program's address space to a 32K-word buffer.  Lo and behold, it did indeed
use the entire buffer, and the non-zero counts appeared in approximately
the right slots of the buffer (but not exactly -- see later comments).

With the truncation hypothesis in mind, I reexamined the "useless" mon.out
data.  Under the assumption that each slot represented the combined counts
for about ten different address ranges (separated by 64K bytes), I was able
to interpret the results to determine where the time was actually being
consumed.  This lent further support for my truncation hypothesis.

However, I don't think that I really understand yet exactly what profil(2) is
doing.  While my arranging for the program's address range to map into 32K
slots in the buffer seemed to make some sense, it didn't really work out
to something I could rely upon.  Interpreting the results according to the
documentation of profile(2) still raised some problems.  Counters corresponding
to addresses near the beginnning of the program seemed to be correct, but
for higher addresses, the counters were at lower slots than they should have
been.  The discrepancy seemed to get larger as addresses grew higher.

So, I have questions for all you A/UXperts out there.  Have any of you 
made profil(2)/prof(1) work successfully for large programs?  Do any of you
understand what profil(2) actually does (as opposed to what TFM says it does)?
(Note: I get identical behavior on both A/UX 1.1 and A/UX 2.0, just in case
you hoped Apple had fixed something in 2.0.)

If there is any enlightenment out there, I would appreciate it.
-- 
Dave Robbins                    GTE Laboratories Incorporated
drobbins@bunny.gte.com          40 Sylvan Rd.
...!harvard!bunny!drobbins      Waltham, MA 02254
CYA:  I speak only for myself; GTE may disagree with what I say.