n0ano@wldrdg.UUCP (05/06/87)
After getting MINIX up and running on my PC clone I decided to find out where the kernel was spending its time. What I did was modify the clock interrupt handler to update a counter for every value of the instruction pointer when the clock interrupt occurred. (Actually, there wasn't enough data space in the kernel so counters keep track of every 16 instruction bytes - that granularity should be reasonable.) I then created a user program that reads the counters from `/dev/kmem' and correlates them with the kernel's load maps to show a profile of where the kernel is spending its time. I then profiled the kernel while re-compiling and loading all of the files in the kernel directory. This task took about 45 minutes on my Zenith 150, 4.7 MHz 8088, 704K RAM, 10 Mbyte Winchester. The listing at the end of this article shows how many clock ticks (1 tick = 16.6 msec) were spent in either user mode or each of the routines in the kernel, file system or memory manager. A few observations about the results: 1. If you add up all the ticks you get the following breakdown: User 142810 kernel 10192 fs 13286 mm 4532 Total 170820 This means that MINIX (kernel + fs + mm) only used 28010 out of 170820 ticks or about 16% of the CPU. Considering that MINIX is a message passing kernel this isn't a bad overhead figure. 2. The time in `idle' was probably time spent waiting for disk I/O to complete. I didn't measure how many disk requests were made to the winchester so I don't know if this is a reasonable amount of time. 3. The profile for the kernel itself is not as acurate as it should be. Unfortunately, MINIX spends a lot of time with all interrupts disabled. This is why so little time is shown in `phys_copy' or `cp_mess'. One of the things I want to do is change the kernel's lock/unlock mechanism to use the interrupt mask in the PIC rather than the interrupt enable bit in the flag register. This way I should be able to get better profiling of the kernel itself. Since the file system and memory manager run as tasks their profiles should be reasonably accurate. 4. One promising note is that there doesn't appear to be any major CPU hogs in the kernel. The biggest routine, sendrec, really doesn't take up that much time. It probably should be optimized if at all possible but you won't gain much if you do that. Don Dugger Wildridge Consulting ...nbires!wldrdg!n0ano user ticks = 142810 kernel file system memory manager idle( 140) = 5992 _sendrec(4F4B) = 6663 _sendrec(2668) = 3941 _sendrec(5ED2) = 2009 _zero_bl(1C57) = 2320 _do_exec( BCC) = 210 _winches(44F9) = 1034 _read_wr( CC3) = 747 _adjust( 845) = 65 _mem_tas(364C) = 418 _get_blo(390B) = 283 _size_ok( AAB) = 31 _umap(168A) = 149 _put_blo(3AEB) = 267 _main( 1B) = 31 .dvi4(63E1) = 93 _read_ma(13DD) = 257 _mem_cop(21A9) = 29 .csb2(63A5) = 72 _alloc_b(3226) = 197 _do_fork( 358) = 24 .dvu4(64B4) = 59 _get_sup(3484) = 197 _callm1(2558) = 23 .cuu(63BC) = 53 .dvi4(5474) = 193 _send(265E) = 20 .rmi4(6522) = 51 .cuu(544F) = 156 _callm3(2594) = 18 _clock_t(324E) = 48 _main( 1B) = 147 _do_brk( 7CD) = 18 .rmu4(65A1) = 43 _search_(2675) = 133 _sys_new(2468) = 17 _cp_mess(5904) = 34 .csb2(5438) = 112 _reply( EA) = 13 _vid_cop(59F0) = 21 _get_ino(35AF) = 108 _callx(25F2) = 12 _port_ou(5933) = 21 _find_fi(3EE1) = 107 _do_wait( 626) = 10 _keyboar(284B) = 18 _release(1F0E) = 102 _do_ksig(154A) = 9 _receive(5ECD) = 13 _cmp_str(4A64) = 95 _sys_cop(2489) = 8 .csa2(6387) = 13 _copy(4A9A) = 93 _sys_xit(23B2) = 5 _set_nam(5DF4) = 13 .rmi4(5558) = 79 .cmu4(2B1D) = 5 _port_in(5943) = 12 _clock_t(4A00) = 77 _free_me(1E58) = 5 _get_tot(6361) = 10 _last_di(240A) = 74 _tell_fs(2536) = 4 .mli4(6511) = 7 _rw_user(1637) = 72 _alloc_m(1DDB) = 4 _inform(15DF) = 5 _rw_bloc(3D4E) = 64 _len(2637) = 4 _portw_i(5967) = 3 _free_bi(33C5) = 57 _sys_get(23D3) = 3 _tty_tas(178C) = 1 _dev_io(2164) = 50 _get_tot(2AF7) = 3 _do_ioct(21EB) = 50 .csb2(2B37) = 3 _scale_f(350F) = 49 _do_sign(140A) = 2 .mli4(5547) = 49 _sys_exe(2447) = 2 _get_fil(3EB4) = 44 _max_hol(1FD7) = 2 _do_fork(4830) = 29 _allowed(205F) = 2 _sys_cop(4D6C) = 28 .cuu(2B4E) = 2 _pipe_ch(1DB0) = 27 _read(2365) = 2 _get_fd(3E29) = 25 _sys_for(2424) = 1 _do_sync(4715) = 25 _do_paus(1915) = 1 _rw_dev(22FF) = 25 _close(232B) = 1 _read_ah(16BD) = 24 _sig_pro(177B) = 1 _rw_inod(384A) = 21 _open(234A) = 1 _dev_clo(213D) = 21 _reply( 17B) = 20 _dup_ino(38F5) = 18 _forbidd(43F6) = 15 _len(4F1A) = 14 _do_writ(1713) = 12 _do_fsta(40CC) = 11 _fetch_n(4B33) = 11 _put_ino(3648) = 11 _advance(2545) = 10 _new_blo(1AF8) = 10 _do_lsee( BE0) = 10 _do_read( CB5) = 8 _do_clos( AE3) = 7 _truncat(2EDF) = 7 _free_zo(3D1A) = 6 _do_open( 99C) = 5 _mounted(34C9) = 5 _revive(1F98) = 5 _wipe_in(37E2) = 5 _alloc_z(3C35) = 5 _eat_pat(23C1) = 4 _do_acce(4392) = 2 _do_exit(48EA) = 2 _suspend(1EB9) = 2 _rw_supe(352A) = 2 _alloc_i(36B4) = 2 _free_in(3829) = 2 .csa2(541A) = 2 _no_call(23B2) = 1 _do_chdi(3F2E) = 1 _do_stat(4072) = 1 _do_unli(2DFE) = 1 _unload_(317E) = 1 _sys_new(4D4B) = 1
go@orstcs.cs.ORST.EDU (05/09/87)
Just a thought regarding the overhead numbers: I did a "hand job" on the profiling of the trip through the message code for clock interrupts, and it suggested to me that as much as 75% of the time spent handling a simple clock tick interrupt could be elided by some simple tests (e.g. don't validate parameters if message is coming from INTERRUPT.) I truly don't want to change the "structure" of the message passing -- it is VERY elegant the way it is and should appear "orthogonal" from the very bottom-most level. However, your numbers may be slightly tainted regarding this sendrec overhead, since you will not see the time used by the clock interrupt within that routine. Granted, most of the other uses of sendrec are small -- a few for each system call and most system calls are for disk i/o which moves data in fairly large chunks. If someone could work up the steam, they could connect their PC to a logic analyzer with code frequency tracing and get some impartial numbers. I applaud your efforts, though. I thought of doing something like that but just couldn't work up the energy. Gary Oliver ...!hplabs!hp-pcd!orstcs!go (Until they pull my plug)