siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) (09/05/90)
Hi. I am trying to analyze my C code and make it run faster. The program translates files from one format to another. I wonder if there is a generic way to decide a lower bound on the running time for this kind of problem?

I know function calls consume time, so I looked through the code and cut out some unnecessary ones; I know you can trade space for speed, but I wonder how far you can go in this direction. I have not gotten much from these considerations. Any comments? Suggestions?

What is your favorite tool for timing analysis? I used "prof" but it did not give me timing information for all the functions in my program -- just a few functions were listed. I also used the system-provided function called "times", which is pretty good.

The last questions: why doesn't the "time" command give me an identical user time each time I run the same thing? In "time"'s report, is the time spent in system calls made by the program accounted to the user time or the system time?

Thank you for your help and have a nice day. siping@cerc.wvu.wvnet.edu
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/06/90)
In article <742@babcock.cerc.wvu.wvnet.edu> siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>Hi. I am trying to analyze my C code and make it run faster.

Let's hope that it really needs to run faster, or else that you're doing this as an educational exercise.

>The program translates files from one format to another.
>I wonder if there is a generic way to decide a lower bound
>on the running time for this kind of problem?

Well, there can't be a simple one, since a vast number of nontrivial programs, including for example compilers, fall into this category.

>I know function calls consume time, so I looked through
>the code and cut out some unnecessary ones;
>I know you can trade space for speed, but I wonder how far
>you can go in this direction. I have not gotten much from
>these considerations. Any comments? Suggestions?

Maintainability of the code suffers as you perform this sort of low-level tweaking. Much of the time spent in a typical program is localized, i.e. it involves a small portion of the total code.

>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

"prof" should work pretty well if you compile all your sources with "cc -p" and if you also LINK with "cc -p" in order to get function counts for library routines. If you have a good version of "prof" (found on fewer and fewer systems as time goes on), you can obtain a nice plot of an integral histogram over code space, which is very handy for spotting "hot spots" where a lot of execution time is spent.

>The last questions: why doesn't the "time" command give me an identical
>user time each time I run the same thing? In "time"'s report,
>is the time spent in system calls accounted to the user time
>or the system time?
"Real time" depends on system load and other external factors, so it is a poor indicator unless you are able to run your application on an entirely idle system (no background daemons, spoolers, etc.). "User time" is time spent executing in "user mode", which means within the application proper, exclusive of system calls. "System time" is that spent by the operating system kernel on behalf of your process as the result of system calls that the process makes. The times vary slightly due to clock sampling (quantization) errors and inherent inaccuracies in assigning time for certain system services. Note also that if the process invoked subprocesses, "time" will include the subprocess times if and only if the subprocesses terminate before the top-level process. There are many methods for improving code efficiency. Sometimes the algorithm being used is not the most efficient method of solving the problem, and changing to a better algorithm could speed things up far more than any amount of low-level tweaking. If the overall algorithms are optimal, then once you have located the bottleneck(s) via profiling you can consider detailed low-level ways to speed them up. There are many standard tricks, but sometimes it is possible to dream up a new way to improve speed. Jon Bentley's books are a good starting point for learning such methods.
martin@mwtech.UUCP (Martin Weitzel) (09/07/90)
In article <742@babcock.cerc.wvu.wvnet.edu> siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>Hi. I am trying to analyze my C code and make it run faster.
>The program translates files from one format to another.
>I wonder if there is a generic way to decide a lower bound
>on the running time for this kind of problem?

Your description of the problem is too vague to decide about a lower bound. My first guess would be that the time is proportional to the lengths of the file read and the file written, and of course somewhat higher than reading and writing two such files without any conversion.

If you work under UNIX, you may use cat, cp or dd to copy your input file to /dev/null, giving a time I call "r_time". Next take a file of the length of your output file and do the same, giving a time I call "r0_time". Finally, copy that second file for real (not to /dev/null!), giving a time I call "w_time". Now the lower bound - without any conversion - for your problem is around "r_time + (w_time - r0_time)". (Here I assumed that standard utilities like the ones I mentioned are well adapted to the system - e.g. they use Standard-I/O with a block size that matches the appropriate values for the given file system - an assumption which may sometimes be wrong!)

>I know function calls consume time, so I looked through
>the code and cut out some unnecessary ones;
>I know you can trade space for speed, but I wonder how far
>you can go in this direction. I have not gotten much from
>these considerations. Any comments? Suggestions?

Again, this depends so much on your problem that little can be said in general. What kind of conversion do you want to make? With simple byte-to-byte conversions you may speed things up by using look-up tables instead of compares. If the required conversions depend on the context, you may switch between several tables.
For certain more complex conversions where tables are less helpful, you may eventually trade space for speed by caching or pre-evaluating common patterns - but how much this helps depends further on the typical input data.

>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

Under UNIX, prof is a good starting point. Note that prof gathers its timings by a statistical method: at run time, at fixed intervals, the value of the program counter is sampled (only for profiled programs, of course). This value is scaled and mapped into the cells of an array of counters, which is located in a special run-time start-up module that is linked with your program if "-p" is specified at that step. The special run-time start-up later writes this array to the "mon.out" file. The prof command then maps "mon.out" back to locations in the actual functions of the program, using the information in the name tables of "a.out" that are normally left there by the linker for the purpose of debugging.

It's a bit like standing up from your desk every sixty minutes, walking to the window, and making some notes about what you see outside. After doing this for several weeks, your notes may be a good approximation of what happens in the street outside your window. But you may well have missed important events. In particular, if you do this *exactly* every full hour, then noting the positions of the hands of a clock out on the street will give a very wrong impression of the outside world (if you base a description of the outside world exclusively on your notes, not on common experience with clocks :-)). You should always look at the results of prof with this in mind.
Chances are that very small functions are missed or that timings are attributed not to the true culprit, but in general you will get a good approximation. (In particular, the "stroboscopic" effects I described at the end of the last paragraph will occur very, very seldom.)

If you use prof you should also look at the number of calls for every function. If you compile from source to object with the "-p" switch, the compiler inserts a little extra code at the start of each function to count the number of calls; you will normally not have this extra code in the standard library, so you will see call counts only for functions you've written yourself. But on most systems a specially compiled standard library is available. The usual name for this library is /lib/libc_p.a, so you can simply add "-lc_p" at the end of the command line in which you call the link phase to produce the executable program, and all the standard functions are taken from this special library instead of the regular one.

As prof prints the number of calls for all functions, if a function seems to consume much time but has a low number of calls (or vice versa), this need not be wrong, but it should look suspicious to you and deserves further investigation. The same applies if the order of the functions in the source file, or the order in which the object files are given to the linker, changes the timings substantially.

>The last questions: why doesn't the "time" command give me an identical
>user time each time I run the same thing? In "time"'s report,
>is the time spent in system calls accounted to the user time
>or the system time?

The user (and sys) times are also measured with a statistical method very similar to the one described above for profiling. (The only difference is that no special run-time start-up module is required, as the program counter is not sampled; only whether the process happens to be in "user" or "system" mode is noted at each tick.)
The user(-mode) time accounts for the code *you* have written. You influence this time by changing your code. The system time accounts for the work the kernel does on behalf of requests from your program.(%) To change this time, execute fewer system calls or make the system calls do less work (or hire some people to rewrite the kernel for you; as you need the kernel source for this, you should be prepared to have some $$ left to buy it from AT&T - I'd say a few hundred thousand should usually suffice :-)).

%: I'm aware that there is a small amount of error in this time due to activity caused by interrupts. But this will only be of importance in very special situations and need not be considered here.

>Thank you for your help and have a nice day.

Have a nice day too, and if you can spare the time, walk to the nearest book store (or your campus library if you are short on money) and see if they have copies of the following two:

   "Writing Efficient Programs" by Jon Bentley
   "Efficient C" by Thomas Plum and Jim Brodie

You won't be disappointed, and you can learn much from both books!
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
moraes@cs.toronto.edu (Mark Moraes) (09/07/90)
siping@cathedral.cerc.wvu.wvnet.edu (Siping Liu) writes:
>What is your favorite tool for timing analysis? I used "prof" but
>it did not give me timing information for all the functions in
>my program -- just a few functions were listed. I also used the
>system-provided function called "times", which is pretty good.

gprof is very good -- not only does it show you which routines are gobbling CPU, but also where they're called from, so you can identify why they're gobbling CPU. (As in, "who is calling malloc() a million times and why?") It also produces a flat prof(1)-style profile. [For those on systems without gprof: gcc supports it, and the GNU binutils come with gprof.]

Paul Haahr posted a lovely tool called lcomp to comp.sources.unix back in October 1989 (Volume 20, Issue 82). It runs on Vaxen and Sun3s (and should work on most of the 68k family with pcc compilers) and does instruction counting (based on Peter Weinberger's paper on "Cheap dynamic instruction counting" in the Bell Labs Tech Journal, Oct 84 -- the gold book), so you can see how often each line of source was executed. Suns also have something called tcov that does something similar -- I remember trying it and preferring lcomp. gprof gets you down to the function level; lcomp helps peer inside the function. One can always emulate lcomp by sticking in counters by hand in the appropriate places.

Working out the order of a program's run time is a little difficult for large programs. One can try to approximate times for typical inputs by finding the most executed code (gprof is a great help) and working out rough relationships between the input and the time taken. It's a good way to figure out what the code is doing and why it's taking so long doing it.

Jon Bentley's "Writing Efficient Programs" (Prentice-Hall) is recommended reading. I liked Dennie Van Tassel's "Program Design, Style, Efficiency and Testing" (?). And you can always pick up the C News paper from Usenix '87 (Winter?) or by anon.
ftp from cs.toronto.edu (doc/programming/cnews.{tbl.ms,doc,ps}) -- has some interesting lessons on speeding up system programs. Mark.