[comp.unix.programmer] daemon help

bhoughto@hopi.intel.com (Blair P. Houghton) (02/06/91)

In article <BZS.91Feb5105600@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Just write a shell script loop which uses awk to suck out the TIME
>field of the procs as reported by ps and see if it's changing or not.

Gack.  Scripts for daemons.  Barry, you should be ashamed. :-)

Write it in C and use popen(3).  (Use YACC for the parsing,
if you want it done to a crisp).  (Uh-oh.  My BSD is
showing.  Does SysV know popen?)

A stronger reason to use C is that ps(1) often runs fields
together; it can also munge columnar justification.  You'll
have to detect and work around those situations.  (If you
can characterize the various forms of run-together fields,
lex(1) can tokenize them together and then yacc(1) can parse
them apart).

For example (output of "ps alx | egrep 'TIME|foo'"):

      F UID   PID  PPID CP PRI NI ADDR  SZ  RSS WCHAN STAT TT  TIME COMMAND
94080012010 11236 11235255  96  41eba211194 9660       R N  pa143:58  (foo)
b0000013092 17289   244  7   1  017232  41   21 be598 S    10  0:00 egrep TIME|

PID 17289 is fine.  PID 11236 is a mess.  awk(1) and
perl(1) would barf up gnodes at this.  (Those of you who
smell challenge, smell aright.)

>For a problem like this I'll bet you a nickel crafting the whole thing
>in C using /dev/kmem etc will not be much faster than the above
>described script, and will take a week to get right instead of 30 minutes.

Better, use popen(3) and top(1).  Top usually gets the data
much faster than ps.  Why?  Who knows?  Could be anything
from superior skills among public-domain software developers
to abuse of /dev/null.

				--Blair
				  "My every keyword is copyright
				   someone else, these days..."

emv@ox.com (Ed Vielmetti) (02/06/91)

In article <2304@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:

   For example (output of "ps alx | egrep 'TIME|foo'"):

         F UID   PID  PPID CP PRI NI ADDR  SZ  RSS WCHAN STAT TT  TIME COMMAND
   94080012010 11236 11235255  96  41eba211194 9660       R N  pa143:58  (foo)
   b0000013092 17289   244  7   1  017232  41   21 be598 S    10  0:00 egrep TIME|

   PID 17289 is fine.  PID 11236 is a mess.  awk(1) and
   perl(1) would barf up gnodes at this.  (Those of you who
   smell challenge, smell aright.)

awk and sed (and perl for that matter) would do OK as long as they
didn't assume that whitespace was a field delimiter; break on absolute
columns with substr() or unpack().  That's not to say that ps doesn't
have an interesting idea of how to jam fields together...
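
A minimal sketch of the fixed-column idea, written in Python as a stand-in
for awk's substr() or perl's unpack().  It assumes the numeric fields are
right-justified under their header labels, so each field ends where its
label ends.  The names column_spans and slice_fields are made up for this
illustration; the header and row are the ones quoted above.

```python
import re

HEADER = "      F UID   PID  PPID CP PRI NI ADDR  SZ  RSS WCHAN STAT TT  TIME COMMAND"
ROW = "94080012010 11236 11235255  96  41eba211194 9660       R N  pa143:58  (foo)"


def column_spans(header):
    """Map each header label to a (start, end) slice of the line.

    Assumption: fields are right-justified, so a field runs from the end
    of the previous label to the end of its own label.
    """
    spans = {}
    previous_end = 0
    for match in re.finditer(r"\S+", header):
        spans[match.group()] = (previous_end, match.end())
        previous_end = match.end()
    return spans


def slice_fields(header, row, wanted):
    """Cut the wanted fields out of row by absolute column position."""
    spans = column_spans(header)
    return {name: row[spans[name][0]:spans[name][1]].strip() for name in wanted}


if __name__ == "__main__":
    # Whitespace splitting glues PPID and CP together.
    print(ROW.split()[:3])
    # Column slicing pulls them apart again.
    print(slice_fields(HEADER, ROW, ["F", "UID", "PID", "PPID", "CP"]))
```

Splitting on whitespace yields 11235255 as a single token, while the
column slices recover PPID 11235 and CP 255.  Columns further to the right
in this row appear to be shifted by one character (the SZ value looks
wider than its column), so a real parser would still have to cope with
overflowing fields.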

If portability is not an issue I'd stick something into top (ps,sps
etc) to print out the fields you want in a nice, tagged, easy to parse
format.

--Ed
emv@ox.com

ps. c vs. awk vs. perl isn't a wizards issue at all.  hacking ps is.

jfh@rpp386.cactus.org (John F Haugh II) (02/06/91)

In article <2304@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>In article <BZS.91Feb5105600@world.std.com> bzs@world.std.com (Barry Shein) writes:
>>For a problem like this I'll bet you a nickel crafting the whole thing
>>in C using /dev/kmem etc will not be much faster than the above
>>described script, and will take a week to get right instead of 30 minutes.
>
>Better, use popen(3) and top(1).  Top usually gets the data
>much faster than ps.  Why?  Who knows?  Could be anything
>from superior skills among public-domain software developers
>to abuse of /dev/null.

Hmmm.  I'm half tempted to take that bet.  One problem I envision with
the PS approach is that the CPU resolution is to the full second, and
there are many processes which lurk about in the background and don't
use much more than a second in a day's time.  Here's the PS output from
this system with those processes selected -

    UID   PID  PPID  C    STIME TTY  TIME COMMAND
   root    30     1  0  Jan 24    ?  0:00 /etc/syslogd /usr/adm/syslog 
     lp    35     1  0  Jan 24    ?  0:00 /usr/lib/lpsched 

I'd wager that it is fairly easy to write a program that would do
a considerable amount of work and never record a single CPU tick.

PER PROCESS USER AREA:
USER ID's:	uid: 0, gid: 0, real uid: 0, real gid: 0
PROCESS TIMES:	user: 6, sys: 17, child user: 0, child sys: 0
PROCESS MISC:	proc slot: 8, cntrl tty: maj(4) min(2)
IPC:		locks: unlocked
FILE I/O:	user addr: 25708343, file offset: 357, bytes: 0,
		segment: user, umask: 26, ulimit: 2097152
ACCOUNTING:	command: syslogd, memory: 931552, type: fork
		start: Thu Jan 24 19:30:02 1991
OPEN FILES:	file desc:     0   1   2   3
		file slot:     9   9   9   8

This is most of the user structure for the "syslogd" process which I
PS'd above.  It has logged 23 ticks in 13 days of running, yet it
produces a log record at least once an hour.  You can't log a
fraction of a tick, so it would appear there is about a 1 in 13 chance
of the hourly timestamp logging a tick.  The implication of this is
that a process which does work (even "does work every hour") may still
log clock ticks so slowly that even after a month of furious light
activity (interesting use of the word "furious" ;-), it still has yet
to log a single full second of time.
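
The arithmetic behind the "1 in 13" estimate can be checked with a short
Python sketch.  The tick counts and the 13 days come from the listing
above; one log record per hour is the stated minimum; the clock rates in
ASSUMED_HZ are assumptions, since the post does not say how many ticks
per second this system uses.

```python
USER_TICKS = 6        # "user: 6" in the user-area dump
SYS_TICKS = 17        # "sys: 17" in the user-area dump
DAYS_RUNNING = 13     # Jan 24 to Feb 6
ASSUMED_HZ = (50, 60, 100)  # assumption: plausible tick rates, not from the post

total_ticks = USER_TICKS + SYS_TICKS
hourly_records = DAYS_RUNNING * 24           # at least one record per hour
records_per_tick = hourly_records / total_ticks


def cpu_seconds(ticks, hz):
    """Convert a tick count to seconds of CPU time at a given tick rate."""
    return ticks / hz


def ticks_after(days):
    """Extrapolate the tick count, assuming the same rate of accumulation."""
    return total_ticks * days / DAYS_RUNNING


if __name__ == "__main__":
    print("ticks logged:", total_ticks)
    print("hourly records:", hourly_records)
    print("records per tick: %.1f" % records_per_tick)
    for hz in ASSUMED_HZ:
        print("HZ=%d: %.2f s after 13 days, %.2f s after 30 days"
              % (hz, cpu_seconds(total_ticks, hz),
                 cpu_seconds(ticks_after(30), hz)))
```

With 312 hourly records and 23 ticks, roughly one record in 13.6 lands on
a tick.  At any of the assumed rates, 23 ticks is well under one second,
which agrees with the 0:00 that ps reports.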
-- 
John F. Haugh II                             UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 832-8832                           Domain: jfh@rpp386.cactus.org
"I've never written a device driver, but I have written a device driver manual"
                -- Robert Hartman, IDE Corp.

bzs@world.std.com (Barry Shein) (02/07/91)

From: jfh@rpp386.cactus.org (John F Haugh II)
>Hmmm.  I'm half tempted to take that bet.  One problem I envision with
>the PS approach is that the CPU resolution is to the full second, and
>there are many processes which lurk about in the background and don't
>use much more than a second in a day's time.

That's a good point (see, you can tell the "wizards", they're the ones
willing to admit they may be wrong...:-)

Looks like we need another option to ps...to increase clock res.

Seriously, grokking around in the kernel proc structures tends to be
fraught with peril unless you're really pre-disposed to that sort of
thing. The next best suggestion would be to try to find a reliable
program distributed with source to just modify for this task.

"Top" comes to mind: modify top to do this, or even just rip out
the full-screenness (there's a flag for this, -b), modify the
print-out to include more precision, and then revert to the script
idea.

Whatever, at that point the rest of the code is probably pretty
simple: just sleep, sweep an array, and kill if desired (how easy this
is depends on how top is structured internally).
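
A rough sketch of the "sleep, sweep an array, and kill if desired" loop,
in Python.  Every name here (parse_cpu_time, stalled_pids, watch) is
hypothetical, and where the snapshots come from (ps, a modified top, or
something else) is left to the caller.  The TIME parser only handles the
minutes:seconds forms that appear in this thread.

```python
import time


def parse_cpu_time(text):
    """Convert a TIME field such as '143:58' or '0:01.85' to seconds."""
    minutes, seconds = text.strip().split(":")
    return int(minutes) * 60 + float(seconds)


def stalled_pids(previous, current):
    """Return PIDs present in both snapshots whose CPU time did not advance.

    Each snapshot is a dict mapping PID to CPU seconds.
    """
    return sorted(pid for pid, cpu in current.items()
                  if pid in previous and cpu <= previous[pid])


def watch(take_snapshot, interval_seconds, rounds, on_stalled):
    """Sleep, sweep, and report.

    take_snapshot() returns a {pid: cpu_seconds} dict; on_stalled(pid) is
    called for each PID that made no progress (it might log the PID, or
    send it a signal with os.kill).
    """
    previous = take_snapshot()
    for _ in range(rounds):
        time.sleep(interval_seconds)
        current = take_snapshot()
        for pid in stalled_pids(previous, current):
            on_stalled(pid)
        previous = current
```

Keeping the comparison in its own function means it can be checked
against hand-made snapshots before it is wired to a real process listing.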
-- 
        -Barry Shein

Software Tool & Die    | bzs@world.std.com          | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

jfh@rpp386.cactus.org (John F Haugh II) (02/07/91)

In article <BZS.91Feb6211712@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Seriously, grokking around in the kernel proc structures tends to be
>fraught with peril unless you're really pre-disposed to that sort of
>thing.

Hmmm ;-)

>        The next best suggestion would be to try to find a reliable
>program distributed with source to just modify for this task.

I posted a "crash" thing to alt.sources some time ago (or was it
comp.sources.misc?  I forget ...)  If enough people are interested,
I'll repost.  It works on my SCO XENIX box, but that's all I can
say ...

scott@convergent.com (Scott Lurndal) (02/08/91)

|> In article <2304@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
|> >In article <BZS.91Feb5105600@world.std.com> bzs@world.std.com (Barry Shein) writes:
|> >>For a problem like this I'll bet you a nickel crafting the whole thing
|> >>in C using /dev/kmem etc will not be much faster than the above
|> >>described script, and will take a week to get right instead of 30 minutes.
|> >
|> >Better, use popen(3) and top(1).  Top usually gets the data
|> >much faster than ps.  Why?  Who knows?  Could be anything
|> >from superior skills among public-domain software developers
|> >to abuse of /dev/null.
|> 

Actually the best solution would be to use the /proc file system code if you
have an SVR4.0 system.  Use opendir(3)/readdir(3) to fetch each process number,
open it, issue a PIOCPSINFO ioctl(2), and close it.  The PIOCPSINFO ioctl(2) will
return the system and user times in seconds and nanoseconds (amongst other things).
This should be sufficient resolution to determine whether the process
is really doing anything.

Be aware that although the resolution of the field is in nanoseconds,
some systems may only support microsecond or millisecond resolution.
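
The open/query/close walk over /proc might be sketched as follows, in
Python rather than C.  The PIOCPSINFO ioctl itself is not shown, because
its request number and structure layout are system-specific; the query
argument is a hypothetical caller-supplied function that would issue it
on the open descriptor.  The name query_all is also made up.

```python
import os


def query_all(proc_dir, query):
    """Open each numeric entry in proc_dir, run query(fd), and close it.

    Returns a dict mapping PID to whatever query returned.  Entries that
    cannot be opened (process gone, permission denied) are skipped.
    """
    results = {}
    for name in sorted(os.listdir(proc_dir)):
        if not name.isdigit():
            continue  # not a process entry
        try:
            fd = os.open(os.path.join(proc_dir, name), os.O_RDONLY)
        except OSError:
            continue
        try:
            # On SVR4 this is where the PIOCPSINFO ioctl would be issued.
            results[int(name)] = query(fd)
        finally:
            os.close(fd)
    return results
```

Passing the directory as a parameter lets the loop be exercised against
an ordinary directory of test files on a system without an SVR4 /proc.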

Scott.

torek@elf.ee.lbl.gov (Chris Torek) (02/18/91)

>From: jfh@rpp386.cactus.org (John F Haugh II)
>>... One problem I envision with the PS approach is that the CPU
>>resolution is to the full second, and there are many processes which
>>lurk about in the background and don't use much more than a second
>>in a day's time.

In article <BZS.91Feb6211712@world.std.com> bzs@world.std.com
(Barry Shein) writes:
>Looks like we need another option to ps...to increase clock res.

4.3reno++ (4.3785?) already gives time to 10 ms resolution:

% ps u
USER       PID %CPU %MEM   VSZ  RSS TT  STAT STARTED       TIME COMMAND
torek     9409 13.3  0.7   128   55 q1  R+    2:28AM    0:00.27 ps u
torek     9339  0.2  0.7   150   37 q1  Ss    1:10AM    0:01.85 -csh (csh)

>Seriously, grokking around in the kernel proc structures tends to be
>fraught with peril unless you're really pre-disposed to that sort of
>thing.

Actually, there is a more basic problem, at least under 4BSD.  It is
not difficult to write a program that uses 80 to 90% of the total
available CPU time yet shows 0% CPU usage.  This is due to a phase
interaction between the scheduler and the process accounting code.
They both run off the same clock, and you can rig things so that your
process is not running when the next clock interrupt fires.

I will not give out the details here, since defeating the accounting
code causes the scheduler to think that your process is being bullied
and thus it gets higher priority than all the others, allowing it to
continue in its evil ways.  In other words, this goofs up the usual
resource sharing, and someone could use it to hog all the cycles.

(Fortunately, if someone *does*, and two people do it, the two processes
wind up exposing each other's tricks.  The clever bad guy will find a
way around this as well [such a way does exist].)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

sears@cello.hpl.hp.com (Bart Sears) (02/22/91)

Anyone who is interested in the various problems one can run into
using statistical timing and ways to get around some of the problems of
statistical timing might want to take a look at the proceedings of
last year's USENIX Mach workshop.  David Black presented a paper on
"The Mach Timing Facility: An Implementation of Accurate Low-Overhead
Usage Timing" where he described a timing facility using timestamps
instead of statistical timing.  While it was implemented in Mach, it
was not really tied to any Mach features and would probably be fairly
easy to implement in any flavor of Unix.  This facility would take
care of the problem Chris Torek pointed out where under many current
systems it is possible to write a program which uses 80+% of the
system yet is charged for 0% CPU usage.
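
The difference between statistical (tick-sampled) timing and timestamp
timing can be illustrated with a toy simulation in Python.  This is not
the Mach facility and not a recipe for any real system: the 10 ms tick,
the 8 ms bursts, and all function names are assumptions chosen to make
the arithmetic visible.

```python
TICK_US = 10_000  # assumed clock period: 10 ms, in microseconds


def sampled_ticks(run_intervals, total_us, tick_us=TICK_US):
    """Statistical accounting: charge one tick whenever the process is
    found running at the instant a clock tick fires."""
    ticks = 0
    for t in range(tick_us, total_us + 1, tick_us):
        if any(start <= t < end for start, end in run_intervals):
            ticks += 1
    return ticks


def timestamped_us(run_intervals):
    """Timestamp accounting: add up the real length of every run interval."""
    return sum(end - start for start, end in run_intervals)


def phase_locked_intervals(periods, tick_us=TICK_US, busy_us=8_000):
    """Run intervals that begin 1 ms after each tick and end before the next."""
    return [(k * tick_us + 1_000, k * tick_us + 1_000 + busy_us)
            for k in range(periods)]


if __name__ == "__main__":
    total_us = 1_000_000  # one simulated second
    intervals = phase_locked_intervals(total_us // TICK_US)
    print("ticks charged by sampling:", sampled_ticks(intervals, total_us))
    print("microseconds by timestamps:", timestamped_us(intervals))
```

In this toy model the process is busy for 800,000 of 1,000,000
microseconds, yet sampling charges it nothing because it is never running
when a tick fires; summing timestamps sees the full 80%.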

					Bart Sears
					sears@hplabs.hpl.hp.com