philj@tekig5.PEN.TEK.COM (Phil Jansen) (11/10/88)
Hello there.  I've been trying to use the LightSpeed C 3.01 profiler for a class I'm taking.  Unfortunately, I don't trust the answers it gives me.  Are there people out there who have used it enough to help me out?

Details: I have been timing several sorting algorithms, some of which call other functions, or call themselves recursively.  Usually the profiler will not include sub-function times in the total function time, but in my case I needed the total time including sub-function calls.  Both before and after disabling this feature, these functions gave slower times than non-recursive functions (e.g. Quicksort was 10x slower than Bubblesort!)

--------------

Also, I tried running some simple in-line function calls, but the profiler only printed the results of *some* of the functions!  All the functions were in the same file (compiled with the profiler option on).

Eventually I gave up on the profiler and used TickCount() to get my times.  [It sure wasn't easy to find where the definition for TickCount() was (no, it is not in time.h).]

Any ideas what I'm doing wrong?  Thank you for your help.

Phil Jansen
--
Phil Jansen     If you repeat things often enough, they become true.
philj@tekig5    If you repeat things often enough, they become true.
                If you repeat things often enough, they become true.
suitti@haddock.ima.isc.com (Steve Uitti) (11/11/88)
In article <3452@tekig5.PEN.TEK.COM> philj@tekig5.PEN.TEK.COM (Phil Jansen) writes:
>Hello there.  I've been trying to use the LightSpeed C 3.01 profiler...
>I don't trust the answers it gives me.

I have used it with LSC 1.13, 1.15, and 3.0 (patched), with programs that use stdio & all the UNIX-like stuff (and I think I used it with a real Mac program), with no unusual problems.

>I have been timing several sorting algorithms, some of which
>call other functions, or call themselves recursively.

At least one of the things I profiled was recursive.

>Usually the profiler will not include sub-function times in the total
>function time, but in my case I needed the total time including
>sub-function calls.

The only things it has failed to mention are library calls.  The libraries were not profiled.

>Both before and after disabling this feature, these functions gave slower
>times than non-recursive functions (e.g. Quicksort was 10x slower than
>Bubblesort!)

Profiling does two things.  First, it adds up the time spent in each routine.  Usually this is done by having a hardware interrupt periodically look at where execution is and increment the time for that routine.  I think LSC actually puts code in the function entry/exit to look at the clock and add up the time actually spent.  This is more accurate, but also more overhead.  Second, it counts how many times a routine was called.  This is done by having the routine entry increment a variable.

Both of these things add overhead to routine calls.  Thus, a profiled routine runs slower than a non-profiled routine.  In the case of qsort, when it gets down to a small subsection of the problem, there are lots of little calls that return almost right away.  Check the profiled run time against the non-profiled run time of the same routine.  For qsort, there may be a big difference; for a bubble sort, there may be very little.  For most of the things I've profiled, the run time increases by about 10%.
I'm usually interested in finding out where the time is spent and how often things are actually called.  If a routine uses lots of time, I try to see if it could do things quicker.  This can often get you a 10% speedup, which isn't great, but doesn't take much effort.  If a routine is called a lot, things can often be rearranged: the algorithm is modified, and speedups of between 10% and 1000% are not uncommon.  More effort is usually put into this.

The other thing I've been interested in is how to speed up certain sections of user-interactive code.  I had one program which had instant response everywhere except one spot.  On the Mac II, there was a delay of half a second, and it was no big deal.  On a Mac SE, it took over ten seconds.  It turned out that the code computed a table of integers (a probability table) using floating point.  The solution turned out to be to compute the constants with a standalone program, and have it build C source to be compiled into the main program.  It could have put the data into a resource...  Anyway, user response can be really important, even when the times are "small".

A shell sort also tends to run in roughly n*log(n) time, tends to have good worst-case behavior, and on many machines tends to be quicker than qsort for large (> 100 element) arrays.  It also doesn't require stack space (qsort requires stack space proportional to log(n)) or any extra large buffers.

Stephen
Only 70 more days of Ronald Reagan.