jeff@fluke.UUCP (Jeff Stearns) (03/29/85)
(The following applies to 4.2BSD on a VAX....) Do you ever run lastcomm(1) or sa(8) and wonder about all those processes which consumed zero seconds of CPU time? For example: printenv tod tty22 0 secs Thu Mar 28 10:26 stty klis tty25 0 secs Thu Mar 28 10:26 sh gms tty17 0 secs Thu Mar 28 10:26 more gms tty17 0 secs Thu Mar 28 10:26 ls don tty11 0 secs Thu Mar 28 10:26 sendmail F root __ 0 secs Thu Mar 28 10:25 csh joe ttyp1 2 secs Thu Mar 28 10:25 whereis joe ttyp1 0 secs Thu Mar 28 10:25 mail S daemon __ 0 secs Thu Mar 28 10:25 mail owens tty08 2 secs Thu Mar 28 10:12 comsat F root __ 0 secs Thu Mar 28 10:25 C'mon now, even the simplest command takes *some* time: % time date Thu Mar 28 11:18:35 PST 1985 0.2u 0.3s 0:01 62% 120+24k 1+3io 4pf+0w So what's the story here? Why do I get such crummy resolution from lastcomm? Well, it turns out that the accounting file /usr/adm/acct comprises a series of records, one per process. A record contains various interesting data about the process, including user and system CPU time. These times are stored in a cute little 16-bit floating point format with a dynamic range from 0 to 4.58E6 seconds. (Luckily, I don't run that many processes which consume more than 5.3 CPU days.) But the time is recorded in *seconds*, and is TRUNCATED by the kernel (rather than rounded) when it is written. So most times recorded in the accounting file are wildly in error. A check of yesterday's accounting data shows that 10,000 out of the total 12,000 processes were recorded as using zero time! I know that DEC says a VAX is fast, but... This makes the output of sa(8) very untrustworthy. I would like to see the CPU time data recorded in a form which resolves to milliseconds at the low end of the scale. It seems to me that the designer of the current scheme went overboard with 13 bits in the mantissa and cut himself short on exponent (only 3 bits). How about using a few more bits of exponent, and recording the time in milliseconds? This would still give us a couple of decimal digits of precision - and values which are meaningful for those other 10,000 processes which slipped under the rug. -- Jeff Stearns (206) 356-5064 John Fluke Mfg. Co. P.O. Box C9090 Everett WA 98043 {uw-beaver,decvax!microsof,ucbvax!lbl-csam,allegra,ssc-vax}!fluke!jeff
long@ittvax.UUCP (H. Morrow Long [Systems Center]) (04/02/85)
We noticed this change in 4.2bsd very quickly after conversion from 4.1bsd. We had ported the System III accounting package to 4.1bsd. We noticed that under 4.2bsd many of the totals for CPU usage were coming up zero for normal users. We then were able to attribute it to the coarse time measurement being used under 4.2bsd (seconds) and the fact that many of the 'average' processes on a system never use at least one second of cpu time and therefore show up with zero time. We modified the accounting routines in sys/kern_acct.c to provide utime, stime and etime in 60ths of a second (the 4.1bsd way) rather than seconds. -- H. Morrow Long ITT-ATC Systems Center, 1 Research Drive Shelton, CT 06484 Phone #: (203)-929-7341 x. 634 path = {allegra bunker ctcgrafx dcdvaxb dcdwest ucbvax!decvax duke eosp1 ittral lbl-csam milford mit-eddie psuvax1 purdue qubix qumix research sii supai tmmnet twg uf-cgrl wxlvax yale}!ittvax!long
pete@umcp-cs.UUCP (Pete Cottrell) (04/03/85)
>>>Subject: 1-second resolution of process accounting times >>>(The following applies to 4.2BSD on a VAX....) >>>Well, it turns out that the accounting file /usr/adm/acct comprises a series >>>of records, one per process. A record contains various interesting data >>>about the process, including user and system CPU time. These times are >>>stored in a cute little 16-bit floating point format with a dynamic range >>>from 0 to 4.58E6 seconds. (Luckily, I don't run that many processes which >>>consume more than 5.3 CPU days.) >>>But the time is recorded in *seconds*, and is TRUNCATED by the kernel (rather >>>than rounded) when it is written. The range of the floating point format is a lot higher. See below. >>>So most times recorded in the accounting file are wildly in error. A check >>>of yesterday's accounting data shows that 10,000 out of the total 12,000 >>>processes were recorded as using zero time! I know that DEC says a VAX is >>>fast, but... Yeah, I've found that about 87% to 93% of all commands have either 0 system or user CPU seconds, and about 75% to 85% of all commands have NO time reported at all. These numbers are derived from our systems, which are an 11/780 and an 11/750. I've found that only about 75% to 80% of CPU time is reported if you report in seconds, as described above. >>>I would like to see the CPU time data recorded in a form which resolves to >>>milliseconds at the low end of the scale. It seems to me that the designer >>>of the current scheme went overboard with 13 bits in the mantissa and cut >>>himself short on exponent (only 3 bits). How about using a few more bits of >>>exponent, and recording the time in milliseconds? This would still give us a >>>couple of decimal digits of precision - and values which are meaningful for >>>those other 10,000 processes which slipped under the rug. The present format should work fine; the 13 bit mantissa gives you 8192, and the 3 bit exponent lets you left-shift this 3 places 7 times. So, the largest number representable is larger than what's in the long variable that's handed to the compress routine, and even in milliseconds, that is a long time (quick calculation yields over 49 days). Make the following change to kern_acct.c and change your accounting programs, and you're in business. 93,94c93,96 < ap->ac_utime = compress((long)u.u_ru.ru_utime.tv_sec); < ap->ac_stime = compress((long)u.u_ru.ru_stime.tv_sec); --- > ap->ac_utime = compress((long)(u.u_ru.ru_utime.tv_sec * 1000 + > (u.u_ru.ru_utime.tv_usec / 1000))); > ap->ac_stime = compress((long)(u.u_ru.ru_stime.tv_sec * 1000 + > (u.u_ru.ru_stime.tv_usec / 1000))); -- Call-Me: Pete Cottrell, Univ. of Md. Comp. Sci. Dept. UUCP: {seismo,allegra,brl-bmd}!umcp-cs!pete CSNet: pete@umcp-cs ARPA: pete@maryland