dcr0@GTE.COM (David Robbins) (08/21/90)
I recently had a need to profile the CPU time usage of a program running on an A/UX system. My experience was, shall we say, rather frustrating. I did all the stuff the manuals say to do, which has always worked just fine and dandy on every Unix system I've ever done profiling on. But on A/UX, the results were nearly worthless. After looking at the output of prof(1) on A/UX and reaching the conclusion that its validity was doubtful (it showed time being spent in a function I *knew for a fact* was never called!), I profiled the same program on a VAX and a PMAX. The prof(1) results on the VAX and the PMAX were consistent with each other, but the results on A/UX did not even vaguely resemble the other two. So, it became crystal clear that prof(1) on A/UX was not giving me any facts. While it turned out that the profiles obtained on the VAX and the PMAX were sufficient to solve the problem I was working on, my curiosity was not satisfied. How could A/UX's odd results be explained? So, I took a look inside mon.out. Interesting. The program I was profiling was 600K+ in size. In mon.out, there were 300K+ words allocated for PC counters (one per instruction in the program). While the call to profil(2) was made with parameters clearly intended to make use of all this space, only the first 32K words of mon.out had non-zero counts. But I knew *for a fact* that large amounts of CPU time were consumed by functions toward the end of the program, which would results in non-zero counts in the latter portion of mon.out, if everything was working as the manuals say it does. My hypothesis is that the A/UX kernel, when performing the sampling for the profile, truncates the index into the buffer to 16 bits. To test this hypothesis, I set up a call to profil(2) with parameters to map the 600K+ program's address space to a 32K-word buffer. Lo and behold, it did indeed use the entire buffer, and the non-zero counts appeared in approximately the right slots of the buffer (but not exactly -- see later comments). With the truncation hypothesis in mind, I reexamined the "useless" mon.out data. Under the assumption that each slot represented the combined counts for about ten different address ranges (separated by 64K bytes), I was able to interpret the results to determine where the time was actually being consumed. This lent further support for my truncation hypothesis. However, I don't think that I really understand yet exactly what profil(2) is doing. While my arranging for the program's address range to map into 32K slots in the buffer seemed to make some sense, it didn't really work out to something I could rely upon. Interpreting the results according to the documentation of profile(2) still raised some problems. Counters corresponding to addresses near the beginnning of the program seemed to be correct, but for higher addresses, the counters were at lower slots than they should have been. The discrepancy seemed to get larger as addresses grew higher. So, I have questions for all you A/UXperts out there. Have any of you made profil(2)/prof(1) work successfully for large programs? Do any of you understand what profil(2) actually does (as opposed to what TFM says it does)? (Note: I get identical behavior on both A/UX 1.1 and A/UX 2.0, just in case you hoped Apple had fixed something in 2.0.) If there is any enlightenment out there, I would appreciate it. -- Dave Robbins GTE Laboratories Incorporated drobbins@bunny.gte.com 40 Sylvan Rd. ...!harvard!bunny!drobbins Waltham, MA 02254 CYA: I speak only for myself; GTE may disagree with what I say.