philj@tekig5.PEN.TEK.COM (Phil Jansen) (11/10/88)
Hello there.  I've been trying to use the LightSpeed C 3.01 profiler for a class I'm taking.  Unfortunately, I don't trust the answers it gives me.  Are there people out there who have used it enough to help me out?

Details: I have been timing several sorting algorithms, some of which call other functions, or call themselves recursively.  Usually the profiler will not include sub-function times in the total function time, but in my case I needed the total time including sub-function calls.  Both before and after disabling this feature, these functions gave slower times than non-recursive functions (e.g. Quicksort was 10x slower than Bubblesort!)

--------------

Also, I tried running some simple in-line function calls, but the profiler only printed the results of *some* of the functions!  All the functions were in the same file (compiled with the profiler option on).

Eventually I gave up on the profiler and used TickCount() to get my times.  [It sure wasn't easy to find where the definition for TickCount() was (no, it is not in time.h).]

Any ideas what I'm doing wrong?  Thank you for your help.

Phil Jansen
--
Phil Jansen     If you repeat things often enough, they become true.
philj@tekig5    If you repeat things often enough, they become true.
                If you repeat things often enough, they become true.
suitti@haddock.ima.isc.com (Steve Uitti) (11/11/88)
In article <3452@tekig5.PEN.TEK.COM> philj@tekig5.PEN.TEK.COM (Phil Jansen) writes:
>Hello there.  I've been trying to use the LightSpeed C 3.01 profiler...
>I don't trust the answers it gives me.

I have used it with LSC 1.13, 1.15, and 3.0 (patched), with programs that use stdio & all the UNIX-like stuff (and I think I used it with a real Mac program), with no unusual problems.

>I have been timing several sorting algorithms, some of which
>call other functions, or call themselves recursively.

At least one of the things I profiled was recursive.

>Usually the profiler will not include sub-function times in the total
>function time, but in my case I needed the total time including
>sub-function calls.

The only things it has failed to mention are library calls.  The libraries were not profiled.

>Both before and after disabling this feature, these functions gave slower
>times than non-recursive functions (e.g. Quicksort was 10x slower than
>Bubblesort!)

Profiling does two things.  First, it adds up the time spent in each routine.  Usually this is done by having a hardware interrupt periodically look at where execution is and increment the time for that routine.  I think LSC actually puts code in the function entry/exit to look at the clock and add up the time actually spent.  This is more accurate, but also more overhead.  Second, it counts how many times a routine was called.  This is done by having the routine entry increment a variable.

Both of these things add overhead to routine calls.  Thus, a profiled routine runs slower than a non-profiled routine.  In the case of qsort, when it gets down to a small subsection of the problem, there are lots of little calls that return almost right away.  Check the profiled run time against the non-profiled run time of the same routine.  For qsort, there may be a big difference; for a bubble sort, there may be very little.  For most of the things I've profiled, the run time increases by about 10%.
I'm usually interested in finding out where the time is spent and how often things are actually called.  If a routine uses lots of time, I try to see if it could do things quicker.  This can often get you a 10% speedup, which isn't great, but doesn't take much effort.  If a routine is called a lot, things can often be rearranged: the algorithm is modified, and speedups of between 10% and 1000% are not uncommon.  More effort is usually put into this.

The other thing I've been interested in is how to speed up certain sections of user-interactive code.  I had one program which had instant response everywhere except one spot.  On the Mac II, there was a delay of half a second, and it was no big deal.  On a Mac SE, it took over ten seconds.  It turned out that the code computed a table of integers (a probability table) using floating point.  The solution turned out to be to compute the constants with a standalone program, and have it build C source to be compiled into the main program.  It could have put the data into a resource...  Anyway, user response can be really important, even when the times are "small".

A shell sort also tends to run in roughly n*log(n) time, tends to have good worst-case behavior, and on many machines tends to be quicker than qsort for large (> 100 element) arrays.  It also doesn't require stack space (qsort requires stack space proportional to log(n)) or any extra large buffers.

Stephen
Only 70 more days of Ronald Reagan.