mhoffman@infocenter.UUCP (Mike Hoffman) (06/20/89)
In an earlier message, I wrote:

>I have an application in which I need to check to see if a process
>is currently active.

Thanks to all who answered my request for a better approach. The
correct way to tell if a process is active is by using kill(2),
and sending a signal of 0. This will perform error checking, but
does not actually send a signal to the process.

kill(2) returns 0 if the process exists and is yours, or -1 otherwise.
If -1 is returned, errno is set to any of several values, the most
relevant of which are EPERM (the process exists but is not yours) and
ESRCH (the process does not exist).

In one response, Casper H.S. Dik (casper@fwi.uva.nl) points out that
"As often is the case with Unix manuals, if you know where to look you
get perfect answers." For me, kill wasn't a very obvious place to look
for process information!

Thanks to the following people for responding to my request (and for
providing consistent answers - this turned out to be much simpler than
finding the meaning of "grep" :-)

	casper@fwi.uva.nl
	uunet!prcrs!paul
	uunet!atexnet!jackal
	ram@cuxlm.att.com
	uunet!mcnc!unc!poirier
	uunet!noifcrf.gov!kml
	jeff@quark.wv.tek.com
	peter@ficc.uu.net
	uunet!arizona!sham

---
Michael J. Hoffman              "My opinions are my own and are
Manufacturing Engineering        not to be employed with those
Encore Computer Corporation      of my confuser."
UUCP: {uunet,codas!novavax,sun,pur-ee}!gould!mhoffman
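The signal-0 probe described above can be sketched as follows. This is a minimal illustration, not code from the thread: it uses Python's os.kill, which wraps kill(2), and the function name process_status and its return strings are made up for the example.

```python
import errno
import os


def process_status(pid):
    """Probe `pid` with signal 0 and classify the result.

    Returns "alive" if the process exists and we may signal it,
    "alive-not-ours" on EPERM, and "gone" on ESRCH.
    """
    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        raise ValueError("pid must be positive")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except OSError as err:
        if err.errno == errno.EPERM:
            return "alive-not-ours"
        if err.errno == errno.ESRCH:
            return "gone"
        raise
    return "alive"
```

A caller that only wants to know whether the pid is in use would treat both "alive" and "alive-not-ours" as yes, since EPERM still means the process exists.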
pim@ctisbv.UUCP (Pim Zandbergen) (06/21/89)
In article <2848@infocenter.UUCP> mhoffman@infocenter.UUCP (Mike Hoffman) writes:
>
>Thanks to all who answered my request for a better approach. The
>correct way to tell if a process is active is by using kill(2),
>and sending a signal of 0. This will perform error checking, but
>does not actually send a signal to the process.
>

I have a question that is related to this one. Our applications
use the same style of lock-files as are used in uucp: when a resource
is claimed, a lockfile is created with a name that reflects the
claimed resource and a content that holds the pid of the resource
claiming process, so other processes can check the validity of the
claim by examining whether the process is still alive.

But as our application is mainly turnkey based, I have seen more
than once that checking the pid only is not enough. Our customers
turn on the machine, and go right away into the application.
At that time a resource is being claimed. Then there is a system crash,
the system is rebooted, and the application is restarted,
AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
the lockfile, it checks for its pid and finds out it exists,
and fails to claim the resource. The second time the application
is started it will continue without failure.

So I am looking for some way to put some extra information into
the lockfile to find out if the machine has been rebooted
since the resource claim. What is the most obvious and portable
way to do this?

Thanks for any responses.
Pim.
--
--------------------+----------------------+-----------------------------------
Pim Zandbergen      | phone: +31 70 542302 | CTI Software BV
pim@ctisbv.UUCP     | fax  : +31 70 512837 | Laan Copes van Cattenburch 70
...!uunet!mcvax!hp4nl!ctisbv!pim           | 2585 GD The Hague, The Netherlands
davidsen@sungod.crd.ge.com (William Davidsen) (06/22/89)
In article <763@ctisbv.UUCP> pim@ctisbv.UUCP (Pim Zandbergen) writes:

| But as our application is mainly turnkey based, I have seen more
| than once that checking the pid only is not enough. Our customers
| turn on the machine, and go right away into the application.
| At that time a resource is being claimed. Then there is a system crash,
| the system is rebooted, and the application is restarted,
| AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
| the lockfile, it checks for its pid and finds out it exists,
| and fails to claim the resource. The second time the application
| is started it will continue without failure.
|
| So I am looking for some way to put some extra information into
| the lockfile to find out if the machine has been rebooted
| since the resource claim. What is the most obvious and portable
| way to do this?

If I understand what you're trying to do, you can't solve the
problem in the application. My first thought was:

	while NOT got_resource
	  if open_file == OKAY
	    read PID from file
	    if PID == my_PID
	      got_resource
	    else
	      signal zero to PID
	      if no_process
	        got_resource
	      else
	        { your favorite wait logic here, or terminate }
	      fi
	    fi
	  else
	    got_resource
	  fi
	wend
	create lockfile
	write my_PID

In addition to the possible race conditions present with lockfile
use in general, this doesn't catch the case where the system is
restarted and the stale PID is that of a valid process which doesn't
have the resource. In that case you won't detect the problem in the
process trying to get the resource.

My suggestion is to fix your startup logic to eliminate the
lockfiles in the first place. Then the whole problem falls out. Sorry I
don't have a better idea. My startup has a list of things to "rm -f"
before going multiuser.

	bill davidsen		(davidsen@crdos1.crd.GE.COM)
  {uunet | philabs}!crdgw1!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
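For illustration, one pass of the loop above might look like this in Python. It is only a sketch under assumptions: the names try_lock and pid_alive are hypothetical, the lockfile is assumed to hold a decimal pid, and the exclusive create (O_EXCL) is an addition that narrows the race without removing it. It shares the weakness noted above: a stale pid that now belongs to an unrelated live process still blocks the claim.

```python
import errno
import os


def pid_alive(pid):
    """True if signal 0 can reach pid (EPERM still means it exists)."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            return False
        if err.errno != errno.EPERM:
            raise
    return True


def try_lock(path):
    """Try once to claim the lockfile at `path`; return True on success."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        text = None                       # no lockfile: resource is free
    if text is not None:
        # Anything that is not a positive decimal pid is treated as stale.
        owner = int(text) if text.isdigit() and int(text) > 0 else None
        if owner not in (None, os.getpid()) and pid_alive(owner):
            return False                  # held by a live process; caller waits
        try:
            os.unlink(path)               # stale lock, or one carrying our own pid
        except FileNotFoundError:
            pass
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False                      # another process created it first
    with os.fdopen(fd, "w") as f:
        f.write("%d\n" % os.getpid())
    return True
```

The caller supplies the "favorite wait logic": retry try_lock after a sleep, or give up.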
frank@rsoft.bc.ca (Frank I. Reiter) (06/22/89)
In article <763@ctisbv.UUCP> pim@ctisbv.UUCP (Pim Zandbergen) writes:
>Then there is a system crash,
>the system is rebooted, and the application is restarted,
>AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
>the lockfile, it checks for its pid and finds out it exists,

>So I am looking for some way to put some extra information into
>the lockfile to find out if the machine has been rebooted
>since the resource claim.

Have your startup code do something like "touch /etc/startup-file".
Now your applications can compare the modification date on this file
to the modification date on your lock files. A lock file older than
the startup file was left over from before the reboot.

A better alternative (IMHO) is to have a cleanup script in your startup
code which deletes any extraneous lock files. This eliminates the need to
check dates at run time.
--
_____________________________________________________________________________
Frank I. Reiter              UUCP:  {uunet,ubc-cs}!van-bc!rsoft!frank
Reiter Software Inc.                frank@rsoft.bc.ca,  a2@mindlink.UUCP
Langley, British Columbia     BBS:  Mind Link @ (604)533-2312, login as Guest
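The date comparison could be sketched like this. It assumes the startup code really does touch a marker file on every boot; the function name lock_predates_boot is hypothetical, and the default path is simply the example name from the post.

```python
import os


def lock_predates_boot(lock_path, marker_path="/etc/startup-file"):
    """True if the lockfile was last modified before the boot marker was touched."""
    lock_mtime = os.stat(lock_path).st_mtime
    boot_mtime = os.stat(marker_path).st_mtime
    return lock_mtime < boot_mtime
```

A lock for which this returns True can be discarded without looking at the pid inside it. The comparison is only as good as the system clock: if the clock is wrong at boot, the two timestamps cannot be trusted against each other.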
dg@lakart.UUCP (David Goodenough) (06/23/89)
From article <763@ctisbv.UUCP>, by pim@ctisbv.UUCP (Pim Zandbergen):
] But as our application is mainly turnkey based, I have seen more
] than once that checking the pid only is not enough. Our customers
] turn on the machine, and go right away into the application.
] At that time a resource is being claimed. Then there is a system crash,
] the system is rebooted, and the application is restarted,
] AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
] the lockfile, it checks for its pid and finds out it exists,
] and fails to claim the resource. The second time the application
] is started it will continue without failure.
]
] So I am looking for some way to put some extra information into
] the lockfile to find out if the machine has been rebooted
] since the resource claim. What is the most obvious and portable
] way to do this?

Why not just give the lock files a generic name - /usr/spool/lock/XXresource
or somesuch. Now do a:

	rm -f /usr/spool/lock/XX*

in your /etc/rc (or /etc/rc.local if you have a civilized system) and you're
all set: the lockfiles all vanish every time the system comes up.
--
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+
mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/29/89)
In article <763@ctisbv.UUCP>, pim@ctisbv.UUCP (Pim Zandbergen) writes:
> the lockfile to find out if the machine has been rebooted
> since the resource claim. What is the most obvious and portable
> way to do this?
>
> Thanks for any responses.
> Pim.

The most obvious way I can think of is to execute "who -b" which prints
the last boot time of the machine. If this has changed, the machine has
been rebooted.

Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901
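One way to use this is to record the "who -b" output in the lockfile next to the pid and compare it later. The sketch below assumes a two-line lockfile layout (pid, then boot stamp) that is invented for the example, as are all three function names. The stamp is treated as an opaque string because the output format of "who -b" differs between systems, and on some systems it may be empty, in which case this check detects nothing.

```python
import os
import subprocess


def boot_stamp():
    """Return the output of `who -b` as an opaque string identifying this boot."""
    result = subprocess.run(["who", "-b"], capture_output=True, text=True)
    return result.stdout.strip()


def write_lock(path, stamp):
    """Write our pid and the boot stamp, one per line (assumed layout)."""
    with open(path, "w") as f:
        f.write("%d\n%s\n" % (os.getpid(), stamp))


def lock_is_from_this_boot(path, stamp):
    """True if the stamp stored in the lockfile matches the current one."""
    with open(path) as f:
        lines = f.read().splitlines()
    return len(lines) >= 2 and lines[1] == stamp
```

A process would call write_lock(path, boot_stamp()) when claiming the resource, and a later claimant would treat the lock as stale whenever lock_is_from_this_boot(path, boot_stamp()) is false.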
mhoffman@infocenter.UUCP (Mike Hoffman) (07/01/89)
in article <763@ctisbv.UUCP>, pim@ctisbv.UUCP (Pim Zandbergen) says:
>
> But as our application is mainly turnkey based, I have seen more
> than once that checking the pid only is not enough. Our customers
> turn on the machine, and go right away into the application.
> At that time a resource is being claimed. Then there is a system crash,
> the system is rebooted, and the application is restarted,
> AND IS RUNNING WITH THE EXACT SAME PID! Hence, when it finds
> the lockfile, it checks for its pid and finds out it exists,
> and fails to claim the resource. The second time the application
> is started it will continue without failure.

This is essentially the same as my application, which provoked my
original question. The lockfiles I use, however, are monitored by
a daemon process, started by /etc/rc.local at boot time. The first
thing my daemon process does is "cleandir()" - remove all lockfiles
in the given directory. After that, any processes that start up do
so with a clean slate.

If a daemon won't suffice, how about a simple shell script run from
/etc/rc.local that cleans up the directories before going multi-user?
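The post does not show cleandir() itself. A rough guess at what such a routine does, written in Python, is below. The pattern argument is an assumption added so the same sketch covers a name-prefix scheme such as XX*; it is not something the post describes.

```python
import glob
import os


def cleandir(directory, pattern="*"):
    """Remove regular files in `directory` matching `pattern`; return their paths."""
    removed = []
    for path in glob.glob(os.path.join(directory, pattern)):
        if os.path.isfile(path):          # leave subdirectories alone
            os.unlink(path)
            removed.append(path)
    return removed
```

This is only safe while nothing else can hold a lock, which is why it belongs in the boot sequence before the applications are started.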