siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) (09/05/90)
Hi. I am trying to analyze my C code and make it run faster. The program translates files from one format to another. I wonder if there is a generic way to decide a lower bound on the running time for this kind of problem?

I know function calls consume time, so I looked through the code and cut out some unnecessary ones; I know you can trade space for speed, but I wonder how far you can go in this direction. I have not gotten much from these considerations. Any comments? Suggestions?

What is your favorite tool for timing analysis? I used "prof" but it did not give me timing information for all the functions in my program -- just a few functions were listed. I also used the system-provided function called "times", which is pretty good.

The last questions: why doesn't the "time" command give me an identical user time each time I run the same thing? In "time"'s report, is the time spent in system calls made by the program accounted to the user time or the system time?

Thank you for your help and have a nice day. siping@cerc.wvu.wvnet.edu
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/06/90)
In article <742@babcock.cerc.wvu.wvnet.edu> siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>Hi. I am trying to analyze my C code and make it run faster.

Let's hope that it really needs to run faster, or else that you're doing this as an educational exercise.

>The program translates files from one format to another.
>I wonder if there is a generic way to decide a lower bound
>on the running time for this kind of problem?

Well, there can't be a simple one, since a vast number of nontrivial programs, including for example compilers, fall into this category.

>I know function calls consume time, so I looked through
>the code and cut out some unnecessary ones;
>I know you can trade space for speed, but I wonder how far
>you can go in this direction. I have not gotten much from
>these considerations. Any comments? Suggestions?

Maintainability of the code suffers as you perform this sort of low-level tweaking. Much of the time spent in a typical program is localized, i.e. it involves a small portion of the total code.

>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

"prof" should work pretty well if you compile all your sources with "cc -p" and if you also LINK with "cc -p" in order to get function counts for library routines. If you have a good version of "prof" (found on fewer and fewer systems as time goes on), you can obtain a nice plot of an integral histogram over code space, which is very handy for spotting "hot spots" where a lot of execution time is spent.

>The last questions: why doesn't the "time" command give me an identical
>user time each time I run the same thing? In "time"'s report,
>is the time spent in system calls accounted to the user time
>or the system time?
"Real time" depends on system load and other external factors, so it is a poor indicator unless you are able to run your application on an entirely idle system (no background daemons, spoolers, etc.). "User time" is time spent executing in "user mode", which means within the application proper, exclusive of system calls. "System time" is that spent by the operating system kernel on behalf of your process as the result of system calls that the process makes. The times vary slightly due to clock sampling (quantization) errors and inherent inaccuracies in assigning time for certain system services. Note also that if the process invoked subprocesses, "time" will include the subprocess times if and only if the subprocesses terminate before the top-level process. There are many methods for improving code efficiency. Sometimes the algorithm being used is not the most efficient method of solving the problem, and changing to a better algorithm could speed things up far more than any amount of low-level tweaking. If the overall algorithms are optimal, then once you have located the bottleneck(s) via profiling you can consider detailed low-level ways to speed them up. There are many standard tricks, but sometimes it is possible to dream up a new way to improve speed. Jon Bentley's books are a good starting point for learning such methods.
martin@mwtech.UUCP (Martin Weitzel) (09/07/90)
In article <742@babcock.cerc.wvu.wvnet.edu> siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>Hi. I am trying to analyze my C code and make it run faster.
>The program translates files from one format to another.
>I wonder if there is a generic way to decide a lower bound
>on the running time for this kind of problem?

Your description of the problem is too vague to decide about a lower bound. My first guess would be that the time is proportional to the lengths of the file read and the file written, and of course somewhat higher than reading and writing two such files without any conversion.

If you work under UNIX, you may use cat, cp or dd to copy your input file to /dev/null, giving a time I call "r_time". Next take a file of the length of your output file and do the same, giving a time I call "r0_time". Finally, copy that second file for real (not to /dev/null!), giving a time I call "w_time". Now the lower bound - without any conversion - for your problem is around "r_time + (w_time - r0_time)". (Here I assumed that standard utilities like the ones I mentioned are well adapted to the system - e.g. they use Standard-I/O with a block size that matches the appropriate values for the given file system - an assumption which may sometimes be wrong!)

>I know function calls consume time, so I looked through
>the code and cut out some unnecessary ones;
>I know you can trade space for speed, but I wonder how far
>you can go in this direction. I have not gotten much from
>these considerations. Any comments? Suggestions?

Again, this depends so much on your problem that little can be said in general. What kind of conversion do you want to make? With simple byte-to-byte conversions you may speed things up by using look-up tables instead of compares. If the required conversions depend on the context, you may switch between several tables.
For certain more complex conversions where tables are less helpful, you may eventually trade space for speed by caching or pre-evaluating common patterns - but how much this helps depends further on the typical input data.

>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

Under UNIX, prof is a good starting point. Note that prof gathers its timings by a statistical method: at run time, at fixed intervals, the value of the program counter is sampled (only for profiled programs, of course). This value is scaled and mapped into the cells of an array of counters, which is located in a special run-time start-up module that is linked with your program if "-p" is specified at that step. The special run-time start-up later writes this array to the "mon.out" file. The prof command then maps "mon.out" back to locations in the actual functions of the program, using the information in the name tables of "a.out" that are normally left there by the linker for the purpose of debugging.

It's a bit like standing up from your desk every sixty minutes, walking to the window, and making some notes about what you see outside. After doing this for several weeks, your notes may be a good approximation of what happens in the street outside your window. But you may well have missed important events. In particular, if you do this *exactly* every full hour, then noting the positions of the hands of a clock out on the street will give a very wrong impression of the outside world (if you base a description of the outside world exclusively on your notes, not on common experience with clocks :-)). You should always look at the results of prof with this in mind.
Chances are that very small functions are missed or that timings are attributed not to the true culprit, but in general you will get a good approximation. (In particular, the "stroboscopic" effects I described at the end of the last paragraph will occur very, very seldom.)

If you use prof you should also look at the number of calls for every function. If you compile from source to object with the "-p" switch, the compiler inserts a little extra code at the start of each function to count the number of calls; you will normally not have this extra code in the standard library, so you will see call counts only for functions you've written yourself. But on most systems a specially compiled standard library is available. The usual name for this library is /lib/libc_p.a, so you can simply add "-lc_p" at the end of the command line in which you call the link phase to produce the executable program, and all the standard functions are taken from this special library instead of the regular one.

As prof prints the number of calls for all functions, if a function seems to consume much time but has a low number of calls (or vice versa), this need not be wrong, but it should look suspicious to you and deserves further investigation. The same applies if the order of the functions in the source file, or the order in which the object files are given to the linker, changes the timings substantially.

>The last questions: why doesn't the "time" command give me an identical
>user time each time I run the same thing? In "time"'s report,
>is the time spent in system calls accounted to the user time
>or the system time?

The user (and sys) times are also measured with a statistical method very similar to the one described above for profiling. (The only difference is that no special run-time start-up module is required, as the program counter is not sampled; only whether the process happens to be in "user" or "system" mode is noted at each tick.)
The user(-mode) time accounts for the code *you* have written. You influence this time by changing your code. The system time accounts for the work the kernel does on behalf of requests from your program.(%) To change this time, execute fewer system calls or make the system calls do less work (or hire some people to rewrite the kernel for you; as you need the kernel source for this, you should be prepared to have some $$ left to buy it from AT&T - I'd say a few hundred thousand should usually suffice :-)).

%: I'm aware that there is a small amount of error in this time due to activity caused by interrupts. But this will only be of importance in very special situations and need not be considered here.

>Thank you for your help and have a nice day.

Have a nice day too, and if you can spare the time, walk to the nearest book store (or your campus library if you are short on money) and see if they have copies of the following two:

   "Writing Efficient Programs" by Jon Bentley
   "Efficient C" by Thomas Plum and Jim Brodie

You won't be disappointed, and you can learn much from both books!
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
moraes@cs.toronto.edu (Mark Moraes) (09/07/90)
siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

gprof is very good -- not only does it show you which routines are gobbling CPU, but also where they're called from, so you can identify why they're gobbling CPU. (As in, "who is calling malloc() a million times and why?") It also produces a flat prof(1)-style profile. [For those on systems without gprof: gcc supports it, and the GNU binutils come with gprof.]

Paul Haahr posted a lovely tool called lcomp to comp.sources.unix back in October 1989 (Volume 20, Issue 82). It runs on Vaxen and Sun3s (and should work on most of the 68k family with pcc compilers) and does instruction counting (based on Peter Weinberger's paper on "Cheap dynamic instruction counting" in the Bell Labs Tech Journal, Oct 84 -- the gold book), so you can see how often each line of source was executed. Suns also have something called tcov that does something similar -- I remember trying it and preferring lcomp. gprof gets you down to the function level; lcomp helps peer inside the function. One can always emulate lcomp by sticking in counters by hand in the appropriate places.

Working out the order of a program's run time is a little difficult for large programs. One can try to approximate times for typical inputs by finding the most executed code (gprof is a great help) and working out rough relationships between the input and the time taken. It's a good way to figure out what the code is doing and why it's taking so long doing it.

Jon Bentley's "Writing Efficient Programs" (Prentice-Hall) is recommended reading. I liked Dennie Van Tassel's "Program Design, Style, Efficiency and Testing" (?). And you can always pick up the C News paper from Usenix '87 (Winter?) or by anon.
ftp from cs.toronto.edu (doc/programming/cnews.{tbl.ms,doc,ps}) -- has some interesting lessons on speeding up system programs. Mark.