noren@dinl.uucp (Charles Noren) (05/24/89)
We have been developing an application that uses System V message queues (perhaps that's the first mistake :-)) for interprocess communication. Everything worked fine until we stress tested the application by sending it hundreds of messages at once. The application chugs away nicely until it hangs. Long boring description of the problem follows...

First, a model of the application. It consists of three processes (call them A, B, and C) and two message queues (call them 1 and 2). The processes and queues are organized as:

+--------+                +--------+                +--------+
|        |      +-----+   |        |      +-----+   |        |
| Proc A |---->>| Q 1 |-->| Proc B |---->>| Q 2 |-->| Proc C |
|        |      +-----+   |        |      +-----+   |        |
+--------+                +--------+                +--------+

Process A generates 200 messages in bursts of about 50 as fast as it can go (CPU bound) and puts them into Queue 1. Process B reads the messages from Queue 1 and processes them while looking things up in an Ingres database (we are using Ingres 5.0). Process B sends even more messages to Queue 2, which is read by Process C.

After Process A has sent 150 messages (and Process B has delivered more), Process B tries to write to Queue 2 and hangs (in a blocking msgsnd call, i.e. without IPC_NOWAIT). Queue 2 looks empty because Process C is blocked on it (in a blocking msgrcv call). Queue 1 appears full: when I try to write to it with a no-wait call (using diagnostic software), it returns with an errno of 11 (EAGAIN; the Sun 3 manual indicates this is caused by a fork with process limit exceeded or insufficient resources). Trying to write to Queue 2 produces the same error. Using the diagnostic software to read Queue 1 pulls messages off of it, and reading Queue 2 several times breaks the log jam and the system runs for a while (trying to read Queue 2 with a no-wait fails, returning an errno of 22 -- Invalid argument, and the arguments have been checked).
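For reference, the no-wait write probe described above can be sketched roughly as follows. This is only a minimal sketch, not the actual diagnostic software: the names probe_send, probe_msg and PROBE_MAX are hypothetical, and the message layout (a long type followed by a small text buffer) is an assumption made for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define PROBE_MAX 64            /* hypothetical payload cap for this sketch */

struct probe_msg {              /* hypothetical message layout */
    long mtype;                 /* message type, must be greater than 0 */
    char mtext[PROBE_MAX];      /* payload bytes */
};

/*
 * Try to queue one message without blocking.
 * Returns 0 if the message was queued, 1 if the call would have
 * blocked (errno == EAGAIN: the queue or a system-wide message
 * limit is exhausted), and -1 on any other error.
 */
int probe_send(int qid, const char *text)
{
    struct probe_msg m;
    size_t len = strlen(text);

    if (len > PROBE_MAX)
        len = PROBE_MAX;        /* truncate rather than overflow mtext */
    m.mtype = 1;
    memcpy(m.mtext, text, len);

    /* IPC_NOWAIT makes msgsnd fail immediately instead of sleeping. */
    if (msgsnd(qid, &m, len, IPC_NOWAIT) == 0)
        return 0;
    return (errno == EAGAIN) ? 1 : -1;
}
```

A return value of 1 corresponds to the errno 11 case reported above. Note that it does not say which limit was hit, only that the message could not be queued right now.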
A simplified application with the same processes and queues, but without the database, flows nicely and NEVER hangs, even with waits in Process B (and Process C). We are running on a Sun 3/260 with 24 MB of RAM under SunOS 4.01.

Any suggestions as to what could be happening? Is Ingres using resources common to message queues? Have I shown a misunderstanding of how message queues are to be used?

Thanks.
--
Chuck Noren
NET: ncar!dinl!noren
US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260,
Denver, CO 80201-1260
Phone: (303) 971-7930
noren@dinl.uucp (Charles Noren) (05/25/89)
Investigating my problem some more, I found some interesting things and
I want to bounce some ideas off the net. In examining the <sys/msg.h>
file I found some interesting comments and definitions. These seem
to imply:
1. Each message queue is limited to a default size set at
system configuration time. On our system, this is
currently set to 2048 bytes (MSGMNB in msg.h).
2. While each message queue has its own limit, all the message
queues together are limited to a certain amount of memory.
On our system, this is currently set to 8k bytes
(MSGPOOL in msg.h).
3. There is also a fixed limit on the number of message
packets in the message queue system. This is defined
on our system as 50 packets (MSGMNI -- number of message
queue identifiers, MSGTQL -- number of system message
headers).
After modifying my debug utilities to report message queue
statistics, I found that when I was "stuck", none of the
message queues exceeded the 2048 byte
limit, and the total did not exceed 4k, well within the
8k limit. However, I found that I had 50 messages queued
across all the queues -- the limit in msg.h for the count of message
queue identifiers.
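The kind of statistics check described above can be sketched with msgctl and IPC_STAT. This is a rough sketch and not my actual debug utility: the names queue_stats, get_queue_stats and total_queued are hypothetical. It reads only msg_qnum and msg_qbytes; the msg_cbytes field (current bytes on the queue) is commonly available as well, but is left out here because it is not among the fields POSIX requires.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct queue_stats {            /* hypothetical summary type */
    unsigned long messages;     /* messages currently queued (msg_qnum) */
    unsigned long max_bytes;    /* per-queue byte limit (msg_qbytes) */
};

/* Fill *out from the kernel's view of queue qid; returns 0 or -1. */
int get_queue_stats(int qid, struct queue_stats *out)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;
    out->messages = (unsigned long)ds.msg_qnum;
    out->max_bytes = (unsigned long)ds.msg_qbytes;
    return 0;
}

/*
 * Sum the queued message counts over n queues, for comparison
 * against a system-wide message limit.
 * Returns the total, or -1 if any queue cannot be examined.
 */
long total_queued(const int *qids, int n)
{
    struct queue_stats s;
    long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        if (get_queue_stats(qids[i], &s) == -1)
            return -1;
        total += (long)s.messages;
    }
    return total;
}
```

Calling total_queued on the ids of Queue 1 and Queue 2 while the application is stuck gives the figure to compare against the 50-packet limit.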
I know how to turn on message queues on the Sun (thanks to answers
from a previous posting), but how do I tune those parameters?
Do I edit msg.h and reconfigure?
Another question: There is a parameter in msg.h, right below the comment,
"The following parameters are assumed not to require tuning", named
MSGMAP that is the number of entries in the map. It is set to 100.
If I change the number of packets in the system to 650, will I need to
set this as well?
Finally, am I wrong in my guesses? Will someone knowledgeable comment on
my guesses? Also, are there any references on the internal implementation
details of message queues, semaphores, shared memory, sockets, and other
kernel stuff that you would recommend?
Thanks to those who have replied and started pointing me in the right direction.
--
Chuck Noren
NET: ncar!dinl!noren
US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260,
Denver, CO 80201-1260
Phone: (303) 971-7930