[comp.os.minix] MINIX kernel profiling

n0ano@wldrdg.UUCP (05/06/87)

After getting MINIX up and running on my PC clone I decided to find out
where the kernel was spending its time.  What I did was modify the clock
interrupt handler to update a counter for every value of the instruction
pointer when the clock interrupt occurred.  (Actually, there wasn't enough
data space in the kernel so counters keep track of every 16 instruction
bytes - that granularity should be reasonable.)  I then created a user
program that reads the counters from `/dev/kmem' and correlates them with
the kernel's load maps to show a profile of where the kernel is spending
its time.

I then profiled the kernel while re-compiling and loading all of the files
in the kernel directory.  This task took about 45 minutes on my Zenith 150,
4.7 MHz 8088, 704K RAM, 10 Mbyte Winchester.  The listing at the end of
this article shows how many clock ticks (1 tick = 16.6 msec) were spent
in either user mode or each of the routines in the kernel, file system or
memory manager.

A few observations about the results:

1.  If you add up all the ticks you get the following breakdown:

	User	142810
	kernel	 10192
	fs	 13286
	mm	  4532

	Total	170820

This means that MINIX (kernel + fs + mm) only used 28010 out of
170820 ticks or about 16% of the CPU.  Considering that MINIX is
a message passing kernel this isn't a bad overhead figure.

2.  The time in `idle' was probably time spent waiting for disk I/O
to complete.  I didn't measure how many disk requests were made to
the winchester so I don't know if this is a reasonable amount of time.

3.  The profile for the kernel itself is not as acurate as it should
be.  Unfortunately, MINIX spends a lot of time with all interrupts
disabled.  This is why so little time is shown in `phys_copy' or
`cp_mess'.  One of the things I want to do is change the kernel's
lock/unlock mechanism to use the interrupt mask in the PIC rather than
the interrupt enable bit in the flag register.  This way I should be
able to get better profiling of the kernel itself.  Since the file
system and memory manager run as tasks their profiles should be reasonably
accurate.

4.  One promising note is that there doesn't appear to be any major CPU
hogs in the kernel.  The biggest routine, sendrec, really doesn't take
up that much time.  It probably should be optimized if at all possible
but you won't gain much if you do that.

Don Dugger
Wildridge Consulting
...nbires!wldrdg!n0ano

user ticks     = 142810

       kernel                 file system               memory manager

    idle( 140) = 5992     _sendrec(4F4B) = 6663     _sendrec(2668) = 3941     
_sendrec(5ED2) = 2009     _zero_bl(1C57) = 2320     _do_exec( BCC) = 210      
_winches(44F9) = 1034     _read_wr( CC3) = 747       _adjust( 845) = 65       
_mem_tas(364C) = 418      _get_blo(390B) = 283      _size_ok( AAB) = 31       
   _umap(168A) = 149      _put_blo(3AEB) = 267         _main(  1B) = 31       
   .dvi4(63E1) = 93       _read_ma(13DD) = 257      _mem_cop(21A9) = 29       
   .csb2(63A5) = 72       _alloc_b(3226) = 197      _do_fork( 358) = 24       
   .dvu4(64B4) = 59       _get_sup(3484) = 197       _callm1(2558) = 23       
    .cuu(63BC) = 53          .dvi4(5474) = 193         _send(265E) = 20       
   .rmi4(6522) = 51           .cuu(544F) = 156       _callm3(2594) = 18       
_clock_t(324E) = 48          _main(  1B) = 147       _do_brk( 7CD) = 18       
   .rmu4(65A1) = 43       _search_(2675) = 133      _sys_new(2468) = 17       
_cp_mess(5904) = 34          .csb2(5438) = 112        _reply(  EA) = 13       
_vid_cop(59F0) = 21       _get_ino(35AF) = 108        _callx(25F2) = 12       
_port_ou(5933) = 21       _find_fi(3EE1) = 107      _do_wait( 626) = 10       
_keyboar(284B) = 18       _release(1F0E) = 102      _do_ksig(154A) = 9        
_receive(5ECD) = 13       _cmp_str(4A64) = 95       _sys_cop(2489) = 8        
   .csa2(6387) = 13          _copy(4A9A) = 93       _sys_xit(23B2) = 5        
_set_nam(5DF4) = 13          .rmi4(5558) = 79          .cmu4(2B1D) = 5        
_port_in(5943) = 12       _clock_t(4A00) = 77       _free_me(1E58) = 5        
_get_tot(6361) = 10       _last_di(240A) = 74       _tell_fs(2536) = 4        
   .mli4(6511) = 7        _rw_user(1637) = 72       _alloc_m(1DDB) = 4        
 _inform(15DF) = 5        _rw_bloc(3D4E) = 64           _len(2637) = 4        
_portw_i(5967) = 3        _free_bi(33C5) = 57       _sys_get(23D3) = 3        
_tty_tas(178C) = 1         _dev_io(2164) = 50       _get_tot(2AF7) = 3        
                          _do_ioct(21EB) = 50          .csb2(2B37) = 3        
                          _scale_f(350F) = 49       _do_sign(140A) = 2        
                             .mli4(5547) = 49       _sys_exe(2447) = 2        
                          _get_fil(3EB4) = 44       _max_hol(1FD7) = 2        
                          _do_fork(4830) = 29       _allowed(205F) = 2        
                          _sys_cop(4D6C) = 28           .cuu(2B4E) = 2        
                          _pipe_ch(1DB0) = 27          _read(2365) = 2        
                           _get_fd(3E29) = 25       _sys_for(2424) = 1        
                          _do_sync(4715) = 25       _do_paus(1915) = 1        
                           _rw_dev(22FF) = 25         _close(232B) = 1        
                          _read_ah(16BD) = 24       _sig_pro(177B) = 1        
                          _rw_inod(384A) = 21          _open(234A) = 1        
                          _dev_clo(213D) = 21                                 
                            _reply( 17B) = 20                                 
                          _dup_ino(38F5) = 18                                 
                          _forbidd(43F6) = 15                                 
                              _len(4F1A) = 14                                 
                          _do_writ(1713) = 12                                 
                          _do_fsta(40CC) = 11                                 
                          _fetch_n(4B33) = 11                                 
                          _put_ino(3648) = 11                                 
                          _advance(2545) = 10                                 
                          _new_blo(1AF8) = 10                                 
                          _do_lsee( BE0) = 10                                 
                          _do_read( CB5) = 8                                  
                          _do_clos( AE3) = 7                                  
                          _truncat(2EDF) = 7                                  
                          _free_zo(3D1A) = 6                                  
                          _do_open( 99C) = 5                                  
                          _mounted(34C9) = 5                                  
                           _revive(1F98) = 5                                  
                          _wipe_in(37E2) = 5                                  
                          _alloc_z(3C35) = 5                                  
                          _eat_pat(23C1) = 4                                  
                          _do_acce(4392) = 2                                  
                          _do_exit(48EA) = 2                                  
                          _suspend(1EB9) = 2                                  
                          _rw_supe(352A) = 2                                  
                          _alloc_i(36B4) = 2                                  
                          _free_in(3829) = 2                                  
                             .csa2(541A) = 2                                  
                          _no_call(23B2) = 1                                  
                          _do_chdi(3F2E) = 1                                  
                          _do_stat(4072) = 1                                  
                          _do_unli(2DFE) = 1                                  
                          _unload_(317E) = 1                                  
                          _sys_new(4D4B) = 1

go@orstcs.cs.ORST.EDU (05/09/87)

Just a thought regarding the overhead numbers:  I did a "hand job" on the
profiling of the trip through the message code for clock interrupts, and
it suggested to me that as much as 75% of the time spent handling a
simple clock tick interrupt could be elided by some simple tests
(e.g. don't validate parameters if message is coming from INTERRUPT.)
I truly don't want to change the "structure" of the message passing --
it is VERY elegant the way it is and should appear "orthogonal" from
the very bottom-most level.

However, your numbers may be slightly tainted regarding this sendrec
overhead, since you will not see the time used by the clock interrupt
within that routine.  Granted, most of the other uses of sendrec are
small --  a few for each system call and most system calls are for disk
i/o which moves data in fairly large chunks.  If someone could work up
the steam, they could connect their PC to a logic analyzer with code
frequency tracing and get some impartial numbers.

I applaud your efforts, though.  I thought of doing something like that
but just couldn't work up the energy.
Gary Oliver
...!hplabs!hp-pcd!orstcs!go		(Until they pull my plug)