noren@dinl.uucp (Charles Noren) (05/24/89)
We have been developing an application that uses System V message queues (perhaps that's the first mistake :-)) for interprocess communication. Everything worked fine until we really wanted to stress test the application by sending it hundreds of messages at once. The application chugs away nicely until it hangs. Long boring description of the problem follows...

First, a model of the application. It consists of three processes (call them A, B, and C) and two message queues (call them 1 and 2). The processes and queues are organized as:

    +--------+                +--------+                +--------+
    |        |      +-----+   |        |      +-----+   |        |
    | Proc A |---->>| Q 1 |-->| Proc B |---->>| Q 2 |-->| Proc C |
    |        |      +-----+   |        |      +-----+   |        |
    +--------+                +--------+                +--------+

Process A generates 200 messages in bursts of about 50 as fast as it can go (CPU bound) and puts them into Queue 1. Process B reads the messages from Queue 1 and processes them while looking things up in an Ingres database (we are using Ingres 5.0). Process B sends even more messages to Queue 2, which is read by Process C.

After Process A sends 150 messages (and Process B delivers more messages), Process B tries to write to Queue 2 and hangs (using IPC_WAIT on the msgsnd call). Queue 2 looks empty because Process C is blocking on it (using msgrcv with IPC_WAIT). Queue 1 appears full because when I try to write to it with a no-wait (using diagnostic software), it returns with an errno of 11 (the Sun 3 manual indicates this is caused by a fork with process limit exceeded or insufficient resources). Trying to write to Queue 2 produces the same error. Using diagnostic software to read Queue 1 pulls messages off of it, and reading Queue 2 several times breaks the log jam and the system runs for a while (trying to read Queue 2 with a no-wait fails, returning an errno of 22 -- Invalid argument, and the arguments have been checked).

A simplified application with the same processes and queues, but without the database, flows nicely and NEVER hangs, even with waits in Process B (and Process C).

We are running on a Sun 3/260 with 24 MB of RAM and SunOS 4.01.

Any suggestions as to what could be happening? Is Ingres using resources common to message queues? Have I shown a misunderstanding of how message queues are to be used?

Thanks.
-- Chuck Noren NET: ncar!dinl!noren US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260, Denver, CO 80201-1260 Phone: (303) 971-7930
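
The no-wait write probe described in the post above can be sketched roughly as follows. This is only an illustration, not the poster's diagnostic software: the message layout `probe_msg` and the helpers `try_send` and `fill_until_full` are hypothetical names, and the sketch assumes a system where errno 11 is EAGAIN, which is what msgsnd reports under IPC_NOWAIT when the message cannot be queued right away.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout; the post does not show the real one. */
struct probe_msg {
    long mtype;        /* message type, must be greater than 0 */
    char mtext[64];    /* payload */
};

/*
 * Try to queue one message without blocking.
 * Returns 0 if the message was queued, 1 if the send would have
 * blocked (errno == EAGAIN), and -1 on any other error.
 */
int try_send(int qid, const char *text)
{
    struct probe_msg m;

    assert(text != NULL);
    memset(&m, 0, sizeof m);
    m.mtype = 1;
    strncpy(m.mtext, text, sizeof m.mtext - 1);

    if (msgsnd(qid, &m, sizeof m.mtext, IPC_NOWAIT) == 0)
        return 0;
    return (errno == EAGAIN) ? 1 : -1;
}

/*
 * Keep sending until the kernel refuses with EAGAIN.
 * Returns the number of messages accepted, or -1 if a send
 * failed for some other reason.
 */
int fill_until_full(int qid)
{
    int sent = 0;
    int rc;

    while ((rc = try_send(qid, "probe")) == 0)
        sent++;
    return (rc == 1) ? sent : -1;
}
```

A return of 1 from `try_send` only says that the message could not be queued at that moment. It does not say which limit was reached (bytes on that queue, or a system-wide resource shared by all queues), which would be consistent with a queue that looks empty still refusing writes.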
noren@dinl.uucp (Charles Noren) (05/25/89)
Investigating my problem some more, I found some interesting things and I want to bounce some ideas off the net. In examining the <sys/msg.h> file I found some interesting comments and definitions. These seem to imply:

1. Each message queue is limited to a default size set at system configuration time. On our system, this is currently set to 2048 bytes (MSGMNB in msg.h).

2. While each message queue has a limit, all the message queues together are limited to a certain amount of memory. On our system, this is currently set to 8k bytes (MSGPOOL in msg.h).

3. There is also a fixed limit to the number of message packets in the message queue system. This is defined in our system as 50 packets (MSGMNI -- number of message queue identifiers; MSGTQL -- number of system message headers).

In modifying my debug utilities that access message queue statistics, I found that when I was "stuck", none of the message queues held more than the 2048-byte limit, and the total did not exceed 4k, well within the 8k limit. However, I found that I had 50 messages queued across all the queues -- the limit in msg.h for the count of message queue identifiers.

I know how to turn on message queues on the Sun (thanks to answers from a previous posting), but how do I tune those parameters? Do I edit msg.h and reconfigure?

Another question: there is a parameter in msg.h named MSGMAP, right below the comment "The following parameters are assumed not to require tuning", that is the number of entries in the map. It is set to 100. If I change the number of packets in the system to 650, will I need to change this as well?

Finally, am I wrong in my guesses? Will someone knowledgeable comment on them? Also, are there any references on the inside implementation details of message queues, semaphores, shared memory, sockets, and other kernel stuff that you would recommend?

Thanks to those who have replied and started pointing me in the right direction.
-- Chuck Noren NET: ncar!dinl!noren US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260, Denver, CO 80201-1260 Phone: (303) 971-7930
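
The comparison made in the post above (per-queue bytes, total bytes, and total message count against the three limits) can be written out as a small check. This is a sketch under assumptions: the struct and function names are hypothetical, the limit values are simply the ones quoted in the post (2048, 8k, 50), and `read_count` assumes the standard `msg_qnum` field of `struct msqid_ds` as returned by msgctl with IPC_STAT. Byte counts are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Limits as quoted in the post; real values come from the kernel config. */
struct msg_limits {
    unsigned long per_queue_bytes;  /* per-queue byte limit (2048 in the post) */
    unsigned long pool_bytes;       /* bytes shared by all queues (8k in the post) */
    unsigned long max_headers;      /* system-wide message count (50 in the post) */
};

/* Snapshot of one queue, filled in by the caller. */
struct queue_usage {
    unsigned long bytes;  /* bytes currently queued */
    unsigned long count;  /* messages currently queued */
};

enum limit_hit { LIMIT_NONE, LIMIT_QUEUE_BYTES, LIMIT_POOL_BYTES, LIMIT_HEADERS };

/* Read the number of messages currently on one queue; 0 on success, -1 on error. */
int read_count(int qid, unsigned long *count)
{
    struct msqid_ds ds;

    assert(count != NULL);
    if (msgctl(qid, IPC_STAT, &ds) != 0)
        return -1;
    *count = (unsigned long)ds.msg_qnum;
    return 0;
}

/* Report the first limit that the given snapshots have reached, if any. */
enum limit_hit classify(const struct queue_usage *q, size_t n,
                        const struct msg_limits *lim)
{
    unsigned long total_bytes = 0, total_count = 0;
    size_t i;

    assert(lim != NULL && (n == 0 || q != NULL));
    for (i = 0; i < n; i++) {
        if (q[i].bytes >= lim->per_queue_bytes)
            return LIMIT_QUEUE_BYTES;
        total_bytes += q[i].bytes;
        total_count += q[i].count;
    }
    if (total_bytes >= lim->pool_bytes)
        return LIMIT_POOL_BYTES;
    if (total_count >= lim->max_headers)
        return LIMIT_HEADERS;
    return LIMIT_NONE;
}
```

With limits of 2048, 8192 and 50, two queues holding (purely illustrative) 1500 and 1800 bytes in 20 and 30 messages stay under both byte limits but reach the count limit, which matches the situation the post describes.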