[comp.unix.questions] Summary - How to tell if a process is active

mhoffman@infocenter.UUCP (Mike Hoffman) (06/20/89)

In an earlier message, I wrote:
>I have an application in which I need to check to see if a process
>is currently active.

Thanks to all who answered my request for a better approach. The
correct way to tell if a process is active is by using kill(2),
and sending a signal of 0. This will perform error checking, but
does not actually send a signal to the process.

kill(2) returns 0 if the process exists and you are allowed to
signal it (it is yours, or you are the superuser), or -1
otherwise. If -1 is returned, errno is set to one of several
values, the most relevant of which are EPERM (the process exists
but is not yours) and ESRCH (the process does not exist).
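
A minimal sketch of the signal-0 probe described above, written in
Python, whose os.kill wraps kill(2) on Unix. The function name
process_status and its three return strings are made up for
illustration and do not come from any of the replies.

```python
import errno
import os


def process_status(pid):
    """Probe pid with signal 0; return 'mine', 'not-mine', or 'gone'."""
    if pid <= 0:
        # Zero and negative values address process groups, not one process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)          # signal 0: error checking only, nothing is sent
    except OSError as err:
        if err.errno == errno.ESRCH:
            return "gone"        # the process does not exist
        if err.errno == errno.EPERM:
            return "not-mine"    # the process exists but is not ours
        raise                    # any other errno is unexpected here
    return "mine"
```

For a lockfile check, both "mine" and "not-mine" would normally count
as "the process is active"; only "gone" means the pid is free.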

In one response, Casper H.S. Dik (casper@fwi.uva.nl) points out
that "As often is the case with Unix manuals, if you know where
to look you get perfect answers." For me, kill wasn't a very
obvious place to look for process information!

Thanks to the following people for responding to my request (and
for providing consistent answers - this turned out to be much
simpler than finding the meaning of "grep" :-)

    casper@fwi.uva.nl
    uunet!prcrs!paul
    uunet!atexnet!jackal 
    ram@cuxlm.att.com   
    uunet!mcnc!unc!poirier
    uunet!noifcrf.gov!kml
    jeff@quark.wv.tek.com
    peter@ficc.uu.net
    uunet!arizona!sham

---
Michael J. Hoffman                   "My opinions are my own and are
Manufacturing Engineering             not to be employed with those
Encore Computer Corporation           of my confuser."
                            
UUCP: {uunet,codas!novavax,sun,pur-ee}!gould!mhoffman

pim@ctisbv.UUCP (Pim Zandbergen) (06/21/89)

In article <2848@infocenter.UUCP> mhoffman@infocenter.UUCP (Mike Hoffman) writes:
>
>
>Thanks to all who answered my request for a better approach. The
>correct way to tell if a process is active is by using kill(2),
>and sending a signal of 0. This will perform error checking, but
>does not actually send a signal to the process.
>

I have a question that is related to this one. Our applications
use the same style of lock-files as are used in uucp:
when a resource is claimed, a lockfile is created with a name
that reflects the claimed resource and a content that holds
the pid of the resource claiming process, so other processes
can check the validity of the claim by examining whether the process
is still alive.

But as our application is mainly turnkey based, I have seen more
than once that checking the pid only is not enough. Our customers
turn on the machine, and go right away into the application.
At that time a resource is being claimed. Then there is a system crash,
the system is rebooted, and the application is restarted,
AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
the lockfile, it checks for its pid and finds out it exists,
and fails to claim the resource. The second time the application
is started it will continue without failure.

So I am looking for some way to put some extra information into
the lockfile to find out if the machine has been rebooted
since the resource claim. What is the most obvious and portable
way to do this?

Thanks for any responses.
Pim.
-- 
--------------------+----------------------+-----------------------------------
Pim Zandbergen      | phone: +31 70 542302 | CTI Software BV
pim@ctisbv.UUCP     | fax  : +31 70 512837 | Laan Copes van Cattenburch 70
...!uunet!mcvax!hp4nl!ctisbv!pim           | 2585 GD The Hague, The Netherlands

davidsen@sungod.crd.ge.com (William Davidsen) (06/22/89)

In article <763@ctisbv.UUCP> pim@ctisbv.UUCP (Pim Zandbergen) writes:

| But as our application is mainly turnkey based, I have seen more
| than once that checking the pid only is not enough. Our customers
| turn on the machine, and go right away into the application.
| At that time a resource is being claimed. Then there is a system crash,
| the system is rebooted, and the application is restarted,
| AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
| the lockfile, it checks for its pid and finds out it exists,
| and fails to claim the resource. The second time the application
| is started it will continue without failure.
| 
| So I am looking for some way to put some extra information into
| the lockfile to find out if the machine has been rebooted
| since the resource claim. What is the most obvious and portable
| way to do this?

If I understand what you're trying to do, you can't solve the problem in
the application. My first thought was:
	while NOT got_resource
	  if open_file == OKAY
	    read PID from file
	    if PID == my_PID got_resource
	    else
	      signal zero to PID
	      if no_process got_resource
	      else
	        { your favorite wait logic here, or terminate }
	      fi
	    fi
	  else
	    got_resource
	  fi
	wend
	create lockfile
	write my_PID

In addition to the possible race conditions present with lockfile use in
general, this doesn't catch the case where the system is restarted and
the stale PID is that of a valid process which doesn't have the
resource. In that case you won't detect the problem in the process
trying to get the resource.
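
One pass of the loop above might be sketched in Python as follows.
This is only an illustration under some assumptions: the names
pid_alive, read_holder and try_claim are invented here, the lockfile
is assumed to hold the pid as ASCII text, and an unreadable lockfile
is treated as stale. It deliberately keeps both weaknesses just
mentioned (the non-atomic create, and the stale pid that now belongs
to some unrelated process).

```python
import os


def pid_alive(pid):
    """True if pid names an existing process (ours or someone else's)."""
    try:
        os.kill(pid, 0)               # signal 0: existence check only
    except ProcessLookupError:        # ESRCH
        return False
    except PermissionError:           # EPERM: exists, but not ours
        return True
    return True


def read_holder(lock_path):
    """Return the pid stored in the lockfile, or None if there is none."""
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None                   # no lockfile, or garbage in it
    return pid if pid > 0 else None


def try_claim(lock_path):
    """Return True if the resource is now ours, False if it is held."""
    holder = read_holder(lock_path)
    if holder is not None and holder != os.getpid() and pid_alive(holder):
        return False                  # caller waits and retries, or gives up
    # Not atomic: another process can slip in between the check and here.
    with open(lock_path, "w") as f:
        f.write("%d\n" % os.getpid())
    return True
```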

My suggestion is to fix your startup logic to eliminate the lockfiles in
the first place. Then the whole problem falls out.

Sorry I don't have a better idea. My startup has a list of things to
"rm -f" before going multiuser.
	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

frank@rsoft.bc.ca (Frank I. Reiter) (06/22/89)

In article <763@ctisbv.UUCP> pim@ctisbv.UUCP (Pim Zandbergen) writes:
>Then there is a system crash,
>the system is rebooted, and the application is restarted,
>AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
>the lockfile, it checks for its pid and finds out it exists,

>So I am looking for some way to put some extra information into
>the lockfile to find out if the machine has been rebooted
>since the resource claim.

Have your startup code do something like "touch /etc/startup-file".
Now your applications can compare the modification date on this
file to the modification date on your lock files.
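
The run-time comparison could look something like this in Python. It
is a sketch only: the function name lock_predates_boot is made up,
and it assumes the startup file is touched before any application
can create a lock.

```python
import os


def lock_predates_boot(lock_path, boot_marker):
    """True if lock_path was last modified before boot_marker was touched."""
    lock_mtime = os.stat(lock_path).st_mtime
    boot_mtime = os.stat(boot_marker).st_mtime
    # A lock older than the marker was left over from before the reboot.
    return lock_mtime < boot_mtime
```

An application would call it as lock_predates_boot(lockfile,
"/etc/startup-file") and treat a True result as a stale claim,
whatever pid the lockfile names.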

A better alternative (IMHO) is to have a cleanup script in your startup
code which deletes any extraneous lock files.  This eliminates the need
to check dates at run time.


-- 
_____________________________________________________________________________
Frank I. Reiter              UUCP:  {uunet,ubc-cs}!van-bc!rsoft!frank
Reiter Software Inc.                frank@rsoft.bc.ca,  a2@mindlink.UUCP
Langley, British Columbia     BBS:  Mind Link @ (604)533-2312, login as Guest

dg@lakart.UUCP (David Goodenough) (06/23/89)

From article <763@ctisbv.UUCP>, by pim@ctisbv.UUCP (Pim Zandbergen):
] But as our application is mainly turnkey based, I have seen more
] than once that checking the pid only is not enough. Our customers
] turn on the machine, and go right away into the application.
] At that time a resource is being claimed. Then there is a system crash,
] the system is rebooted, and the application is restarted,
] AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
] the lockfile, it checks for its pid and finds out it exists,
] and fails to claim the resource. The second time the application
] is started it will continue without failure.
] 
] So I am looking for some way to put some extra information into
] the lockfile to find out if the machine has been rebooted
] since the resource claim. What is the most obvious and portable
] way to do this?

Why not just give the lock files a generic name -

/usr/spool/lock/XXresource

or somesuch. Now do a:

rm -f /usr/spool/lock/XX*

in your /etc/rc (or /etc/rc.local if you have a civilized system)
and you're all set: the lockfiles all vanish every time the system
comes up.
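
If the cleanup has to live in a program rather than in /etc/rc, the
same "rm -f" can be sketched in Python. The function name
remove_lockfiles is invented for illustration; the pattern would be
"/usr/spool/lock/XX*" as above.

```python
import glob
import os


def remove_lockfiles(pattern):
    """Delete every file matching pattern; return the paths removed."""
    removed = []
    for path in glob.glob(pattern):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass                      # already gone, like "rm -f"
    return sorted(removed)
```

This is only safe when run before anything can claim a resource,
which is why it belongs in the startup sequence.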
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com		  	  +---+

mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/29/89)

In article <763@ctisbv.UUCP>, pim@ctisbv.UUCP (Pim Zandbergen) writes:
> the lockfile to find out if the machine has been rebooted
> since the resource claim. What is the most obvious and portable
> way to do this?
> 
> Thanks for any responses.
> Pim.

The most obvious way I can think of is to execute "who -b" which prints the
last boot time of the machine.  If this has changed, the machine has been
rebooted.
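
One way to use this, sketched in Python: store the "who -b" output in
the lockfile next to the pid and compare it on the next check. The
function names and the two-line lockfile layout are assumptions made
for this example. The output of "who -b" differs between systems (and
can be empty), so it is treated here as an opaque string and never
parsed.

```python
import os
import subprocess


def boot_stamp():
    """Return the raw output of `who -b` as an opaque reboot marker."""
    result = subprocess.run(["who", "-b"], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def write_claim(lock_path, stamp):
    """Write our pid on line 1 and the boot stamp on line 2."""
    with open(lock_path, "w") as f:
        f.write("%d\n%s\n" % (os.getpid(), stamp))


def read_claim(lock_path):
    """Return (pid, stamp) as recorded in the lockfile."""
    with open(lock_path) as f:
        lines = f.read().splitlines()
    stamp = lines[1] if len(lines) > 1 else ""
    return int(lines[0]), stamp


def claim_predates_reboot(lock_path, current_stamp):
    """True if the lockfile was written under a different boot."""
    _pid, recorded = read_claim(lock_path)
    return recorded != current_stamp
```

A claim for which claim_predates_reboot(lockfile, boot_stamp()) is
True can be discarded without looking at the pid at all.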

Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901

mhoffman@infocenter.UUCP (Mike Hoffman) (07/01/89)

in article <763@ctisbv.UUCP>, pim@ctisbv.UUCP (Pim Zandbergen) says:
> 
> But as our application is mainly turnkey based, I have seen more
> than once that checking the pid only is not enough. Our customers
> turn on the machine, and go right away into the application.
> At that time a resource is being claimed. Then there is a system crash,
> the system is rebooted, and the application is restarted,
> AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
> the lockfile, it checks for its pid and finds out it exists,
> and fails to claim the resource. The second time the application
> is started it will continue without failure.

This is essentially the same as my application, which provoked my
original question. The lockfiles I use, however, are monitored
by a daemon process, started by /etc/rc.local at boot time. The first
thing my daemon process does is "cleandir()" - remove all lockfiles
in the given directory.

After that, any processes that start up do so with a clean slate. 

If a daemon won't suffice, how about a simple shell script run from
/etc/rc.local that cleans up the directories before going
multi-user?