jfh@rpp386.UUCP (The Beach Bum) (08/22/88)
Someone was screaming about how unreliable IPC (such as shared memory)
on SCO Xenix was. I whipped this program up originally back during
the great volatile debate and only discovered it again tonight while
cleaning out my home directory.

When run it prints out TICK ... ... TOCK forever as each process gets
a chance to execute. The code is short enough that you should be able
to understand what is going on.

If you can run this without any trouble then your shared memory is
working just fine. Otherwise, you have troubles ...

- John.
---------------------- clip out and save as volatile.c ----------------
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <signal.h>

int     zero = 0;
int     *loc = &zero;
int     key = ('v' << 8) | 'o';

catch (sig)
int     sig;
{
    signal (sig, catch);
}

parent ()
{
    while (1) {
        while (*loc)
            ;

        write (1, "TICK ....\n", 10);
        *loc = 1;
        kill (loc[2], SIGUSR1);
        pause ();
    }
}

child ()
{
    while (1) {
        while (! *loc)
            ;

        write (1, ".... TOCK\n", 10);
        *loc = 0;
        kill (loc[1], SIGUSR1);
        pause ();
    }
}

main ()
{
    int     id;

    if ((id = shmget (key, 3 * sizeof (int), IPC_CREAT|0666)) == -1) {
        perror ("shmget");
        exit (1);
    }
    if ((loc = (int *) shmat (id, (char *) 0, 0)) == (int *) 0) {
        perror ("shmat");
        exit (1);
    }
    loc[0] = 0;

    switch (fork ()) {
        default:
            loc[1] = getpid ();
            signal (SIGUSR1, catch);
            parent ();
        case 0:
            loc[2] = getpid ();
            signal (SIGUSR1, catch);
            child ();
        case -1:
            perror ("fork");
            exit (1);
    }
    exit (1);
}
--
John F. Haugh II            +--------- Cute Chocolate Quote ---------
HASA, "S" Division          | "USENET should not be confused with
UUCP:   killer!rpp386!jfh   |  something that matters, like CHOCOLATE"
DOMAIN: jfh@rpp386.uucp     |         -- apologizes to Dennis O'Connor

richard@neabbs.UUCP (RICHARD RONTELTAP) (08/23/88)
[ Tested the ticktock.c program ]

Firstly: 286 Xenix'ers should compile the test program of J.F. Haugh
(II?, come on!) to large model with the -Ml switch.

Welllll, I ran the test program on XENIX /386 2.2.1 and 2.2.3 with the
same results.

When the program is started the first time only one TICK/TOCK is
printed. When it is started the second time, TICK/TOCK is infinitely
printed.

I think what happens is:
When the shared memory is created, and the parent process has printed
TICK, the context is switched to the child process right after the
'signal' command and just before the 'pause' command. When the child
now signals the parent, the signal is caught and the parent goes to
the next command: pause(), and waits forever!

The second time scheduling is different because the shared memory
doesn't have to be created.

All this is rather far-fetched, but the only explanation I can think
of. At least no panics or core dumps.

Can anyone else post experiences? Maybe Mr Chapman from SCO Kernel
development can comment on this?

Richard
(...!mcvax!neabbs!richard)

jfh@rpp386.UUCP (The Beach Bum) (08/25/88)
In article <22012@neabbs.UUCP> richard@neabbs.UUCP (RICHARD RONTELTAP) writes:
>[ Tested the ticktock.c program ]
>
>Firstly: 286 Xenix'ers should compile the test program of J.F. Haugh
>(II?, come on!) to large model with the -Ml switch.

[ yes, my uncle was john f. haugh. he wasn't married when i was born
  so it was assumed he would remain childless. my legal name is jfh2. ]

>When the program is started the first time only one TICK/TOCK is
>printed. When it is started the second time, TICK/TOCK is infinitely
>printed.
[ ... ]
>The second time scheduling is different because the shared memory
>doesn't have to be created.

this program should work regardless of scheduling. on the first entry
into child() the busy loop will be executed because loc[0] was set to
zero prior to the fork. the signal handler was set prior to entry to
child (but should have been set before the fork() - stupid me).

if parent() executes the kill() call before the child() executes the
signal() call, then you should have seen TICK ... with a hang forever.
the fix is to move the signal() call to before the fork(). if ... TOCK
is printed then signal() has been called.

>All this is rather far-fetched, but the only explanation I can think
>of. At least no panics or core dumps.

no, it is very plausible if only TICK ... was printed. this is why
concurrent programming is such a joy and volatile variables have to be
treated specially. because this ain't easy sh*t.
--
John F. Haugh II (jfh@rpp386.UUCP)                  HASA, "S" Division

"If the code and the comments disagree, then both are probably wrong."
                -- Norm Schryer

jbayer@ispi.UUCP (id for use with uunet/usenet) (08/25/88)
In article <22012@neabbs.UUCP>, richard@neabbs.UUCP (RICHARD RONTELTAP) writes:
> [ Tested the ticktock.c program ]
>
> Welllll, I ran the test program on XENIX /386 2.2.1 and 2.2.3 with the
> same results.
>
> When the program is started the first time only one TICK/TOCK is
> printed. When it is started the second time, TICK/TOCK is infinitely
> printed.
>
> I think what happens is:
> When the shared memory is created, and the parent process has printed
> TICK, the context is switched to the child process right after the
> 'signal' command and just before the 'pause' command. When the child
> now signals the parent, the signal is caught and the parent goes to
> the next command: pause(), and waits forever!
>
> The second time scheduling is different because the shared memory
> doesn't have to be created.
>

I think Richard is right. I added two sleep(1) calls to the program,
one in child() and one in parent(). With these additions the program
starts up and prints TICK/TOCK even when creating the shared memory
segment for the first time. I enclosed the new program below:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <signal.h>

int     zero = 0;
int     *loc = &zero;
int     key = ('v' << 8) | 'o';

catch (sig)
int     sig;
{
    signal (sig, catch);
}

parent ()
{
    while (1) {
        while (*loc)
            ;

        write (1, "TICK ....\n", 10);
        *loc = 1;
        sleep(1);       /* added by JB 8/25/88 */
        kill (loc[2], SIGUSR1);
        pause ();
    }
}

child ()
{
    while (1) {
        while (! *loc)
            ;

        write (1, ".... TOCK\n", 10);
        *loc = 0;
        sleep(1);       /* added by JB 8/25/88 */
        kill (loc[1], SIGUSR1);
        pause ();
    }
}

main ()
{
    int     id;

    if ((id = shmget (key, 3 * sizeof (int), IPC_CREAT|0666)) == -1) {
        perror ("shmget");
        exit (1);
    }
    if ((loc = (int *) shmat (id, (char *) 0, 0)) == (int *) 0) {
        perror ("shmat");
        exit (1);
    }
    loc[0] = 0;

    switch (fork ()) {
        default:
            loc[1] = getpid ();
            signal (SIGUSR1, catch);
            parent ();
        case 0:
            loc[2] = getpid ();
            signal (SIGUSR1, catch);
            child ();
        case -1:
            perror ("fork");
            exit (1);
    }
    exit (1);
}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

It does work fine now on 386 and 286 Xenix.

Jonathan Bayer

root@telmail.UUCP (Super user) (08/26/88)
In article <5786@rpp386.UUCP> jfh@rpp386.UUCP (The Beach Bum) writes:
>this program should work regardless of scheduling. on the first entry
>into child() the busy loop will be executed because loc[0] was set to
>zero prior to the fork. the signal handler was set prior to entry to
>child (but should have been set before the fork() - stupid me).
>
>if parent() executes the kill() call before the child() executes the
>signal() call, then you should have seen TICK ... with a hang forever.
>the fix is to move the signal() call to before the fork(). if ... TOCK
>is printed then signal() has been called.

That's not what I said in my article. Just to be sure, I've tried to
move the signal() before the fork, but got exactly the same results.
I'll try to explain again with a little code:

When I start the program the first time, I get 1 TICK/TOCK. The second
time I get infinite TICK/TOCK's.

The result of the first time is caused by unfortunate scheduling, I
think, and here's why (first the fragment):

>parent ()
>{
>    while (1) {
>        while (*loc)
>            ;
>
>        write (1, "TICK ....\n", 10);
>        *loc = 1;
>        kill (loc[2], SIGUSR1);
>        pause ();
>    }
>}
>
>child ()
>{
>    while (1) {
>        while (! *loc)
>            ;
>
>        write (1, ".... TOCK\n", 10);
>        *loc = 0;
>        kill (loc[1], SIGUSR1);
>        pause ();
>    }
>}

Because loc[0] was initialised to 0, the child process waits if it
happens to get to the 'while' loop first. The parent process passes the
loop, prints TICK, changes *loc to 1 and signals the child process.
AT THIS INSTANT, i.e. BEFORE the parent reaches pause(), the scheduler
transfers control to the child process. (btw is this possible?)

The child process prints TOCK, sets *loc to 0, signals the parent, and
pauses. The parent catches the signal, and continues with the next
instruction: THE PAUSE() INSTRUCTION, and waits for a signal from the
child forever. Get it?

I don't know if the signals were a relevant part of the testing
procedure, but I've 'rewritten' the program without them, and it works
just fine. Of course it doesn't run as fast because of massive waiting
in the 'while' loops, waiting for the scheduler to transfer control to
the child or vice versa.

Here is the new program:
-------------------------------------------------------------------
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int     *loc;
int     key = ('v' << 8) | 'o';

parent ()
{
    while (1) {
        while (*loc)
            ;

        write (1, "TICK ....\n", 10);
        *loc = 1;
    }
}

child ()
{
    while (1) {
        while (! *loc)
            ;

        write (1, ".... TOCK\n", 10);
        *loc = 0;
    }
}

main ()
{
    int     id;

    if ((id = shmget (key, sizeof (int), IPC_CREAT|0666)) == -1) {
        perror ("shmget");
        exit (1);
    }
    if ((loc = (int *) shmat (id, (char *) 0, 0)) == (int *) 0) {
        perror ("shmat");
        exit (1);
    }
    *loc = 0;

    switch (fork ()) {
        case -1:
            perror ("fork");
            exit (1);
        case 0:
            child ();
        default:
            parent ();
    }
    exit (1);
}
----------------------------------------------------------

Richard
(...!mcvax!neabbs!richard)

john@jetson.UPMA.MD.US (John Owens) (08/26/88)
In article <5786@rpp386.UUCP>, jfh@rpp386.UUCP (The Beach Bum) writes:
> this program should work regardless of scheduling.

> if ... TOCK
> is printed then signal() has been called.

Not the first time. If ... TOCK is printed once (after TICK ... is
printed), then parent() set loc[0] and child()'s while loop ended.
Yes, parent and child both have called signal(), but the signal
apparently doesn't go through.

I think that parent() executes
    kill (loc[2], SIGUSR1);
before the child process executes
    loc[2] = getpid();
and the child process never receives a signal.
--
John Owens              john@jetson.UPMA.MD.US
SMART HOUSE L.P.        uunet!jetson!john               (old uucp)
+1 301 249 6000         john%jetson.uucp@uunet.uu.net   (old internet)

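The ordering problem described above (the parent reading loc[2] before
the child has stored its PID there) can be avoided without putting PIDs
in shared memory at all, because fork() already hands the child's PID to
the parent and the child can ask for its parent's PID. A minimal sketch,
assuming a POSIX system; the function name peer_pids_agree is
hypothetical and is not part of any program posted in this thread:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 when parent and child each know the
 * other's PID without reading anything the other side had to write
 * first, 0 when they disagree, -1 on a system call failure.
 */
int peer_pids_agree(void)
{
    pid_t parent_pid = getpid();    /* recorded before fork(), inherited by the child */
    pid_t child_pid;
    int status;

    child_pid = fork();
    if (child_pid == -1)
        return -1;

    if (child_pid == 0) {
        /* Child: the peer is the parent, available from getppid(). */
        _exit(getppid() == parent_pid ? 0 : 1);
    }

    /* Parent: the peer is the child, and fork() already returned its PID. */
    if (waitpid(child_pid, &status, 0) != child_pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In the test program this would amount to keeping the return value of
fork() for the parent's kill() and using getppid() in the child, so
neither side depends on the other having run yet.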
jfh@rpp386.UUCP (The Beach Bum) (08/26/88)
In article <128@jetson.UPMA.MD.US> john@jetson.UPMA.MD.US (John Owens) writes:
>I think that parent() executes
>    kill (loc[2], SIGUSR1);
>before the child process executes
>    loc[2] = getpid();
>and the child process never receives a signal.

the original version busy waited and didn't use signals. the version
i posted used signals to increase the number of iterations per second,
but wasn't tested very well ...

john has found Yet Another Bug(TM) in the code, which is still further
proof as to how difficult concurrent programming can get. without
some form of p/v operations, that program is very difficult to write.

the new version uses message queues and screams like a banshee. that
should be final proof as to how bullet proof the message queues are
under xenix.
--
John F. Haugh II (jfh@rpp386.UUCP)                  HASA, "S" Division

"If the code and the comments disagree, then both are probably wrong."
                -- Norm Schryer

jfh@rpp386.UUCP (The Beach Bum) (08/27/88)
In article <5867@rpp386.UUCP> jfh@rpp386.UUCP (The Beach Bum) writes:
>the new version uses message queues and screams like a banshee. that
>should be final proof as to how bullet proof the message queues are
>under xenix.

and here it is. i actually developed this on pigs, a 68020 vme bus
machine. the code compiled first time out on rpp386. portable, no?

just a brief overview - the parent and child swap "TICK ...." and
".... TOCK" messages back and forth using a message queue. two
different message types are used. type 1 is from the parent and is
expected by the child. type 2 is from the child and is expected by
the parent. this ensures the two processes remain synchronized.

for a really good work out, run this on the console. if you want
to prove there are NO bugs in the message passing code (despite
what certain SCO bashers will say) run this in the background with
a real high nice for a few days. a bug fixed version of the shared
memory tester could also be run to further debunk the sco nay-sayers.
what the heck, run them both in the background with a nice of say,
plus 20, for a couple of days. that should find any kinks.
------------------------ cut and save as msgque.c ----------------------
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <signal.h>

key_t   msgkey = ('m' << 8) | 's';
int     msgqid;

struct mymsgbuf {
    long    mytype;         /* msgsnd/msgrcv expect a long message type first */
    char    mytext[11];
};

struct  mymsgbuf pmsg = { 1, "TICK ....\n" };   /* type 1: parent to child */
struct  mymsgbuf cmsg = { 2, ".... TOCK\n" };   /* type 2: child to parent */

int     childpid;

parent ()
{
    struct  mymsgbuf buf;

    while (1) {
        memset (&buf, 0, sizeof buf);   /* clear the receive buffer */
        if (msgrcv (msgqid, &buf, sizeof buf.mytext, 2L, 0) < 0)
            perror ("parent: msgrcv");

        write (1, buf.mytext, sizeof buf.mytext);

        if (msgsnd (msgqid, &pmsg, sizeof pmsg.mytext, 0) < 0)
            perror ("parent: msgsnd");
    }
}

child ()
{
    struct  mymsgbuf buf;

    while (1) {
        memset (&buf, 0, sizeof buf);   /* clear the receive buffer */
        if (msgrcv (msgqid, &buf, sizeof buf.mytext, 1L, 0) < 0)
            perror ("child: msgrcv");

        write (1, buf.mytext, sizeof buf.mytext);

        if (msgsnd (msgqid, &cmsg, sizeof cmsg.mytext, 0) < 0)
            perror ("child: msgsnd");
    }
}

main ()
{
    if ((msgqid = msgget (msgkey, IPC_CREAT|0666)) == -1) {
        perror ("msgget");
        exit (1);
    }
    switch (childpid = fork ()) {
        default:        /* prime the pump ... */
            if (msgsnd (msgqid, &pmsg, sizeof pmsg.mytext, 0)) {
                perror ("msgsnd");
                kill (childpid, 9);
                exit (1);
            }
            parent ();
        case 0:
            child ();
        case -1:
            perror ("fork");
            exit (1);
    }
    exit (1);
}
--
John F. Haugh II (jfh@rpp386.UUCP)                  HASA, "S" Division

"If the code and the comments disagree, then both are probably wrong."
                -- Norm Schryer

lab@sdgsunsdgsun.com (Larry Baird) (08/27/88)
in article <166@ispi.UUCP>, jbayer@ispi.UUCP (id for use with uunet/usenet) says:
>
> I think Richard is right. I added two sleep(1) to the program, one in
> the child() and one in the parent(). With these additions the program
> starts up and prints TICK/TOCK even when creating the shared memory
> segment for the first time. I enclosed the new program below:

A better fix is to move the setting of loc[0]
    (*loc = 1 and *loc = 0)
to after their respective kills.
The first kill from parent to child will be ignored, but the
kill from child to parent will sync up the whole process.
--
Larry A. Baird                   Software Design Group, Inc.
Manager, Software Development    800 Trafalgar Ct. Suite 340
UUCP:ucf-cs!sdgsun!lab           Maitland, FL 32751
CIS: 72355,171                   (407) 660-0006

woods@gpu.utcs.toronto.edu (Greg Woods) (08/27/88)
In article <5872@rpp386.UUCP> jfh@rpp386.UUCP (The Beach Bum) writes:
> In article <5867@rpp386.UUCP> jfh@rpp386.UUCP (The Beach Bum) writes:
> >the new version uses message queues and screams like a banshee. that
> >should be final proof as to how bullet proof the message queues are
> >under xenix.
>
> and here it is. i actually developed this on pigs, a 68020 vme bus
> machine. the code compiled first time out on rpp386. portable, no?

I'll ignore that remark...

> for a really good work out, run this on the console. if you want
> to prove there are NO bugs in the message passing code (despite
                     ?????
> what certain SCO bashers will say) run this in the background with
               Like ME for instance????
> a real high nice for a few days. a bug fixed version of the shared
> memory tester could also be run to further bebunk the sco nay-sayers.
> what the heck, run them both in the background with a nice of say,
> plus 20, for a couple of days. that should find any kinks.

How about running it for a couple of weeks, with no nice factor, along
with a shm and a sem tester, in multiple incarnations. Meanwhile, do
a WHOLE lot of disk and tty I/O. In other words, push it to the limit.
Make the machine so slow as to be un-usable for anything else.

Come on guys. Even the support people at SCO came up with a better
test programme, and still had no luck finding any bugs. It works, but
if you work it too hard, it'll drop.

Now I know better: don't try to do something with the wrong tools. I
have no doubt Xenix is a nice little implementation of Unix for those
who can't justify non-PC hardware (all too many in these days of
< $1000 clones), and who can't decide if they like SysIII, SysV, V7,
or BSD. A nice little hack that gives you a little of each, but the
best of none. Mind you, I would rather have it than MS-DOS or OS/2.
[ and you'll note I don't put a smiley after this sentence ]

I should also say that the SCO support people do try, and care about
the quality of their product. It's just that they had a lot to do to
make up for a poor start, and they are working on the most unforgiving
hardware in common use.
--
                                                Greg Woods.

UUCP: utgpu!woods, utgpu!{ontmoh, ontmoh!ixpierre}!woods
VOICE: (416) 242-7572 [h]               LOCATION: Toronto, Ontario, Canada

haugj@pigs.UUCP (Joe Bob Willie) (08/28/88)
In article <114@telmail.UUCP> root@telmail.UUCP (Richard Ronteltap) writes:
>Because loc[0] was initialised to 0, the child process waits if it happens
>to get to the 'while' loop first. The parent process passes the loop, prints
>TICK, changes *loc to 1 and signals the child process. AT THIS INSTANT, i.e.
>BEFORE the parent reaches pause(), the scheduler transfers control to the
>child process. (btw is this possible?)

what appears to have been happening is that if the parent ran all the
way to the kill() call before the child called signal(), the child dies
from the signal and the parent waits in pause() for the dead child to
kill the parent(). this can only happen with the parent because of
setting *loc = 0.

if the CHILD beat the PARENT through the loop after being kill()'d to
kill() the parent BEFORE the parent executes the pause(), the parent
waits in pause() forever for a signal which has already been delivered,
along with the child who is waiting for the parent.

it is possible for the scheduler to pick any runnable process at just
about any time (well, only certain times, but it appears suitably
random to the process) to run, and may suspend any running process to
do so. the only restriction on putting processes to sleep is that a
process running in system space can't be involuntarily put to sleep.
it must call sleep() itself.
--
=-=-=-=-=-=-=-The Beach Bum at The Big "D" Home for Wayward Hackers-=-=-=-=-=-=
Very Long Address: John.F.Haugh@rpp386.dallas.tx.us
Very Short Address: jfh@rpp386
                        "ANSI C: Just say no" -- Me

haugj@pigs.UUCP (Joe Bob Willie) (08/28/88)
In article <105@sdgsunsdgsun.com> lab@sdgsunsdgsun.com (Larry Baird) writes:
>in article <166@ispi.UUCP>, jbayer@ispi.UUCP (id for use with uunet/usenet) says:
>A better fix is to move the setting of loc[0]
>    (*loc = 1 and *loc = 0)
>to after their respective kills.
>The first kill from parent to child will be ignored, but the
>kill from child to parent will sync up the whole process.

the original code didn't use signals. unix signals can result in race
conditions since there is no atomic method to send a signal and wait
for the receipt of a signal with one system call. so long as control
returns to the user between the kill() and the pause(), a race exists.

in this case, should the scheduler choose to execute the child
immediately after the parent (works either way, by the way) sets
*loc = 1, the child can go all the way around its loop and kill the
parent before it gets a chance to enter pause().
--
=-=-=-=-=-=-=-The Beach Bum at The Big "D" Home for Wayward Hackers-=-=-=-=-=-=
Very Long Address: John.F.Haugh@rpp386.dallas.tx.us
Very Short Address: jfh@rpp386
                        "ANSI C: Just say no" -- Me

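The kill()/pause() window described above can be closed on systems that
provide the POSIX signal interfaces (sigaction, sigprocmask and
sigsuspend), which none of the programs in this thread use. The idea is
to keep SIGUSR1 blocked except while actually waiting, so a signal that
arrives early stays pending instead of being consumed before the wait
begins. What follows is only a sketch under that assumption; the names
on_usr1, wait_for_turn and ping_pong are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal;

/* Handler only records that the peer has signalled us. */
static void on_usr1(int sig)
{
    (void)sig;
    got_signal = 1;
}

/*
 * Must be called with SIGUSR1 blocked.  sigsuspend() installs
 * wait_mask (which leaves SIGUSR1 unblocked) and sleeps in one atomic
 * step, so a signal sent before we got here is still pending and is
 * delivered now rather than lost.
 */
static void wait_for_turn(const sigset_t *wait_mask)
{
    while (!got_signal)
        sigsuspend(wait_mask);
    got_signal = 0;
}

/* Exchange 'rounds' signal pairs; returns rounds completed, or -1. */
int ping_pong(int rounds)
{
    struct sigaction sa;
    sigset_t block, wait_mask;
    pid_t parent = getpid(), child;
    int i, status;

    /* Install the handler and block SIGUSR1 BEFORE fork(), so the
       child inherits both and can never be killed by an early signal. */
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &wait_mask) == -1)
        return -1;
    sigdelset(&wait_mask, SIGUSR1);

    child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (i = 0; i < rounds; i++) {
            wait_for_turn(&wait_mask);  /* wait for the parent's signal */
            kill(parent, SIGUSR1);      /* then answer it */
        }
        _exit(0);
    }
    for (i = 0; i < rounds; i++) {
        kill(child, SIGUSR1);           /* signal the child */
        wait_for_turn(&wait_mask);      /* wait for its answer */
    }
    waitpid(child, &status, 0);
    return i;
}
```

Each side has at most one signal outstanding at a time, so nothing is
lost to signals coalescing, and it no longer matters where the scheduler
switches between the kill() and the wait.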
fr@icdi10.uucp (Fred Rump from home) (08/30/88)
Yes, Greg. That's exactly the point. And you said it.

Give us a reason to use xyz machine with abc software and we'll do it.
In the meantime Xenix on fast 386's runs just fine for the rest of us.
--
{allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!cdin-1!icdi10!fr
26 Warren St.       or ...{bellcore,rutgers,cbmvax}!bpa!cdin-1!icdi10!fr
Beverly, NJ 08010   or...!bikini.cis.ufl.edu!ki4pv!cdis-1!cdin-1!icdi10!fr
609-386-6846        "Freude... Alle Menschen werden Brueder..." - Schiller

chip@vector.UUCP (Chip Rosenthal) (09/01/88)
A couple of comments and questions about the IPC test program -- the
shared memory version, not the message queue one.

>int zero = 0;
>int *loc = &zero;

Why is loc being set here? One of the first actions in main() is:

>    if ((loc = (int *) shmat (id, (char *) 0, 0)) == (int *) 0) {

I don't understand the purpose of "zero". Can anybody help out?

Second, wouldn't it be more realistic to drop the pause() and just do
a polling loop? I would change:

>        while (*loc)
>            ;

to something like:

>        while (*loc)
>            sleep(1);

In a multi-processing package, it is reasonable to fix the IPC service
id number to a known value. (Grrrr...I've heard the performance arguments.
I *still* wish IPC mapped to a filesystem name rather than using a stupid,
magic ID number.) But, is it realistic for the service requestor to know
the PID of the service server?

Furthermore, this would get rid of the bugs which have been pointed out.
All of which are with signals and not SysV IPC. And we all know how
reliable signals are :-(
--
Chip Rosenthal     chip@vector.UUCP | I've been a wizard since my childhood.
Dallas Semiconductor   214-450-0486 | And I've earned some respect for my art.

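On the wish for IPC keys tied to a filesystem name: System V IPC does
include ftok(), which derives a key_t from an existing pathname plus a
one-byte project id, so cooperating programs can agree on a file rather
than on a hard-coded number such as ('v' << 8) | 'o'. It is not a true
filesystem namespace, and different files can in principle yield the
same key. A minimal sketch; the helper name key_for_path and the
project id 'v' are assumptions chosen for illustration:

```c
#include <sys/types.h>
#include <sys/ipc.h>

/*
 * Hypothetical helper: derive a System V IPC key from a pathname that
 * every cooperating process can name.  Returns (key_t) -1 if the path
 * cannot be examined (for instance, it does not exist).
 */
key_t key_for_path(const char *path)
{
    /* Only the low 8 bits of the project id are used by ftok(). */
    return ftok(path, 'v');
}
```

The result would be passed to shmget() or msgget() in place of the
fixed key; every process that calls key_for_path() on the same existing
file gets the same key for as long as that file is not replaced.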
jfh@rpp386.Dallas.TX.US (The Beach Bum) (09/02/88)
In article <530@vector.UUCP> chip@vector.UUCP (Chip Rosenthal) writes:
>A couple of comments and questions about the IPC test program -- the
>shared memory version, not the message queue one.
>
>Why is loc being set here? One of the first actions in main() is:

the code was old and moldy from other uses and i didn't clean it up.

>I don't understand the purpose of "zero". Can anybody help out?

originally it was there for paranoia.

>Second, wouldn't it be more realistic to drop the pause() and just do
>a polling loop? I would change:

yes, the original did do polling, but the tick ... ... tock's came
at one second intervals as each process's quantum expired. [ this is
only true on an idle system where no pre-emption is occurring. more or
less ;-) ] putting in the signals sped things up, so i posted it.
i didn't ever expect it to fall under close scrutiny.

>In a multi-processing package, it is reasonable to fix the IPC service
>id number to a known value. (Grrrr...I've heard the performance arguments.
>I *still* wish IPC mapped to a filesystem name rather than using a stupid,
>magic ID number.) But, is it realistic for the service requestor to know
>the PID of the service server?

i suppose it would depend on the implementation. if you are using
semaphores, then i doubt it. for shared memory, why not?
--
John F. Haugh II (jfh@rpp386.Dallas.TX.US)          HASA, "S" Division

"If the code and the comments disagree, then both are probably wrong."
                -- Norm Schryer

chip@vector.UUCP (Chip Rosenthal) (09/04/88)
In article <6141@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (The Beach Bum) writes:
>In article <530@vector.UUCP> chip@vector.UUCP (Chip Rosenthal) writes:
>>is it realistic for the service requestor to know
>>the PID of the service server?
>if you are using semaphores, then i doubt it. for shared memory, why not?

I guess you are right. Probably do something like have the service
requestor attach to the segment, read out the PID of the service
provider, leave the request, and then signal the provider that a
request is awaiting. You wouldn't be beating on it as hard as the test
case did, so the chance of signal races is reduced. The only limitation
I see is that the requestor needs to be the same UID as the provider to
send the signal.
--
Chip Rosenthal     chip@vector.UUCP | I've been a wizard since my childhood.
Dallas Semiconductor   214-450-0486 | And I've earned some respect for my art.