story@can503.UUCP (Robert Story) (09/09/89)
We are having a problem with message queues under Xenix.

Software         - Large financial application written in C
Computer Systems - IBM PS2 Model 70-E21 and IBM PS2 Model 70-121, both with
                   IBM 120MB hard drive, IBM 60MB internal tape drive,
                   eight port Stallion serial board with 3 to 6 terminals,
                   serial printer and parallel printer.
Operating System - SCO Xenix System V 2.2.3
Other Software   - Btrieve Record Manager Version 4.10 (80286 version)
                 - Panel Plus Version 1.00c

The problem seems to arise under heavy load, with 3 to 6 users all running the financial application and printing documents. A process will send a message to Btrieve, then set an alarm for 60 seconds and sit on the msgrcv() call. Under a large load, one or two of the processes will get the alarm signal. Examination of the message queues with ipcs shows messages from/to Btrieve in the queues, but attempts to read these messages with msgrcv() and message type set to zero show an empty queue. A call to msgctl() with IPC_STAT reports messages in the queue, but the pointers to the first and last messages are 0. Subsequent messaging to Btrieve carries on as normal.

We do not believe that the problem is with the '286 version of Btrieve. When we went to '386 mode for our code, I changed the appropriate ints to shorts in the code that interfaces with Btrieve. It has worked this way since last November. However, the Btrieve people in Austin have just sent us the '386 version over the wire and we will be trying that on Monday.

We had a problem in this area last week but tracked it down to a queue sizing problem, and have now configured the message queues to be more than adequate.

Of course, this problem is really messy because one cannot buy the source code for Btrieve, so it remains a large, unknown black box.

Your thoughts would be appreciated. Please e-mail me. Thanks.
--
[ Robert Story ..{!utzoo!censor,!uunet!zardoz!avcoint}!avcocan!story ]
[ SnailMail : AFS 201 Queens Avenue London Ontario Canada N6A 1J1 ]
[ or : AFS 3349 Michelson Drive Irvine California USA 92715-1606 ]
[ Voice : +1 519 672-4220 xtn 233 ]
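
For readers unfamiliar with the pattern described in the post, here is a minimal sketch in C of a receive guarded by alarm(), plus the two checks used to examine the queue (msgctl() with IPC_STAT, and a non-blocking msgrcv() with message type zero). It is not the poster's code. The names request_msg, receive_with_timeout, stat_count and try_read_any, and the 256-byte message size, are assumptions made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <unistd.h>

#define MAX_TEXT 256            /* assumed message size, not from the post */

struct request_msg {            /* hypothetical message layout */
    long mtype;
    char mtext[MAX_TEXT];
};

static void on_alarm(int signo) { (void)signo; /* only needs to interrupt msgrcv */ }

/* Block in msgrcv() for at most `secs` seconds.
 * Returns the number of bytes received, -1 on error, -2 on timeout. */
long receive_with_timeout(int qid, struct request_msg *reply, long type, unsigned secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(secs);                               /* the post used 60 seconds */
    n = msgrcv(qid, reply, sizeof reply->mtext, type, 0);
    saved = errno;
    alarm(0);                                  /* cancel any pending alarm */
    sigaction(SIGALRM, &old, NULL);

    if (n == -1 && saved == EINTR)
        return -2;                             /* the alarm fired first */
    errno = saved;
    return (long)n;
}

/* Message count as reported by msgctl(IPC_STAT), or -1 on error. */
long stat_count(int qid)
{
    struct msqid_ds ds;
    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    return (long)ds.msg_qnum;
}

/* Non-blocking read of the first message of any type (type 0).
 * Returns bytes read, or -1 with errno == ENOMSG if the queue is empty.
 * Note that a successful read removes the message from the queue. */
long try_read_any(int qid, struct request_msg *out)
{
    return (long)msgrcv(qid, out, sizeof out->mtext, 0, IPC_NOWAIT);
}
```

With these helpers, the symptom in the post would show up as stat_count() returning a non-zero count while try_read_any() fails with ENOMSG on the same queue.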
story@can503.UUCP (Robert Story) (09/20/89)
In article <295@can503.UUCP> I wrote the following:
>The problem seems to arise under heavy load, with 3 to 6 users all running
>our financial application and printing documents. A process will msg to
> btrieve and then set an alarm for 60 seconds and sit on the msgrcv call.
>With a large load one or two of the processes will get the alarm signal.
>Examination of the message queues with ipcs shows messages from/to Btrieve in
>the queues but attempts to read these messages with msgrcv() and message type
>set to zero show an empty queue. A call to msgctl() with IPC_STAT reports
>messages in the queue but the pointers to the first and last messages are 0.
>Subsequent messaging to btrieve carries on as normal.

We had a person from SCO on site for a week, and last Saturday we found the problem. IT IS a kernel bug.

If the kernel is copying to/from the user's data area and suffers a page fault, the kernel puts that process to sleep. In the meantime, another process that is also using the message queues can steam through and do its thing. When the original process wakes up, the queue pointers it was working with will have been changed underneath it and, of course, weird things begin to happen. Sometimes the free list turned up on queue 1, or queue 0 turned up on the free list, which explains why ipcs thought that there were messages when there weren't.

This problem has been fixed in the AT&T 3.1 code and the SCO 3.2 code by using semaphores in the critical areas.

I hope this helps others. It cost our company a lot of money to discover this one. The bug surfaced just before a major release, so things were pretty tense here. I had a good time, though. It's not every day I get to assist in debugging kernel code.

If anyone wants more details, please e-mail me.
--
[ Robert Story ..{!utzoo!censor,!uunet!zardoz!avcoint}!avcocan!story ]
[ SnailMail : AFS 201 Queens Avenue London Ontario Canada N6A 1J1 ]
[ or : AFS 3349 Michelson Drive Irvine California USA 92715-1606 ]
[ Voice : +1 519 672-4220 xtn 233 ]
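
The kind of fix described above (guarding the critical areas so that a sleeping process cannot have the queue changed underneath it) can be illustrated with a user-space analogy. The sketch below is not the Xenix or AT&T kernel code, which is not shown in this thread. It uses a pthread mutex where the post mentions semaphores, and the names msg_node, msg_queue, queue_init, queue_put and queue_get are all hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct msg_node {                 /* hypothetical queued message */
    struct msg_node *next;
    size_t len;
    char text[64];
};

struct msg_queue {                /* hypothetical queue header */
    pthread_mutex_t lock;         /* stands in for the kernel's semaphore */
    struct msg_node *first;
    struct msg_node *last;
    unsigned count;
};

void queue_init(struct msg_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->first = q->last = NULL;
    q->count = 0;
}

/* Append a node to the tail of the queue. */
void queue_put(struct msg_queue *q, struct msg_node *n)
{
    pthread_mutex_lock(&q->lock);
    n->next = NULL;
    if (q->last)
        q->last->next = n;
    else
        q->first = n;
    q->last = n;
    q->count++;
    pthread_mutex_unlock(&q->lock);
}

/* Copy the head message into dst and unlink it.
 * Returns the unlinked node, or NULL if the queue is empty.
 * The lock is held across BOTH the copy and the pointer updates, so
 * even if the copy stalls (the analogue of the page fault in the post),
 * no other thread can relink the list in the meantime. */
struct msg_node *queue_get(struct msg_queue *q, char *dst, size_t dstlen)
{
    struct msg_node *n;

    pthread_mutex_lock(&q->lock);
    n = q->first;
    if (n) {
        size_t len = n->len < dstlen ? n->len : dstlen;
        memcpy(dst, n->text, len);        /* the step that may stall */
        q->first = n->next;
        if (q->first == NULL)
            q->last = NULL;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

Without the lock, a second thread could run queue_put() or queue_get() while the first is stalled in the copy, leaving count, first and last disagreeing with each other, which is the same flavor of inconsistency that ipcs and IPC_STAT showed.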