mars@duteca (mars@duteca.tudelft.nl) (02/26/90)
We have four RT PCs running AIX 2.2.1 and have had a large number of crashes lately. Whenever I ran a really large executable (>1M), one of the RTs always crashed, whereas on the other RTs it ran without problems. The same problem occurred when two or more applications, such as large compilations, were executed simultaneously.

To locate the cause I wrote a program that:

1. allocated 20M
2. wrote data (!= 0) into the 20M buffer
3. read the data back

When this program was executed, the RT always crashed. It seems to me that the problem lies with the swap space, so I tried the program on the other RTs, and the same problem occurred there (although at least one of them didn't crash until at least one other application was running). I also tried the program on a SUN, but couldn't crash it.

If anybody has noticed the same problem and found a solution to it, please email me. Maybe a larger swap space is the solution, but even if the swap space (+/- 22M) was chosen too small, at least an error message could be expected when too much memory has been allocated.

I'm not really an RT expert, but if I had to guess at the cause, I would say that the swap space reserved for the vital AIX applications has been chosen too small.

Please email suggestions to mars@duteca.tudelft.nl. Thanks in advance.

--
***************************************************************************
* Gert-Jan Tromp                                                          *
* Delft University of Technology                                          *
* Dept. of Elect. Engineering, room 10.04                                 *
* P.O. Box 5031, 2600 GA Delft, The Netherlands                           *
* Email: mars@duteca.tudelft.nl                                           *
***************************************************************************
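The three steps of the test program described above can be sketched in C roughly as follows. This is not the poster's original program, which was not included in the post; the function name fill_and_verify, its return codes, and the byte pattern are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): allocate nbytes,
 * write a non-zero pattern into every byte, then read it back.
 * Returns 0 on success, 1 if malloc failed, 2 on a readback mismatch. */
int fill_and_verify(size_t nbytes)
{
    unsigned char *buf;
    size_t i;
    int status = 0;

    assert(nbytes > 0);

    buf = malloc(nbytes);                 /* step 1: allocate */
    if (buf == NULL)
        return 1;

    for (i = 0; i < nbytes; i++)          /* step 2: write data != 0 */
        buf[i] = (unsigned char)(i % 255 + 1);

    for (i = 0; i < nbytes; i++) {        /* step 3: read the data back */
        if (buf[i] != (unsigned char)(i % 255 + 1)) {
            status = 2;
            break;
        }
    }

    free(buf);
    return status;
}
```

Calling fill_and_verify((size_t)20 * 1024 * 1024) from main would correspond to the 20M case in the post. On a system that allocates backing store lazily, malloc may well succeed and any trouble would only show up during the write loop in step 2.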
moody@moody.austin.ibm.com (02/28/90)
In article <728@duteca4.UUCP> mars@duteca.tudelft.nl () writes:

[much of article deleted]

>It seems to me that the problem is caused by a problem with the swap space,
>so I tried the program on the other RTs and it appeared that the same
>problem occured (although at least one of them didn't crash until at least
>one other extra application was running). I also tried the program on a
>SUN, but couldn't crash it.
>
>Maybe a larger swap space is the solution, but

A larger swap space is the ultimate solution.

>error message could be expected when too much memory has been allocated.
>***************************************************************************
>* Gert-Jan Tromp                                                          *
>* Delft University of Technology                                          *
>* Dept. of Elect. Engineering, room 10.04                                 *
>* P.O. Box 5031, 2600 GA Delft, The Netherlands                           *
>* Email: mars@duteca.tudelft.nl                                           *
>***************************************************************************

AIX version 2 on the RT uses a delayed allocation scheme for allocating paging slots to an application. On a system with a small paging space, this can overcommit the page space for applications that use a lot of memory.

One of our vendors was the first we knew of to have the problem, and I developed a solution for them. It is an easy way for developers to port these large, memory-consuming apps. Applications originally written for BSD systems seem to have this problem more often, since BSD does things differently. The following program fragment is my solution.
/* ----------------------- cut here --------------------------------*/
#define PAGESIZE 2048
#define NULL 0

#include <sys/signal.h>

volatile int dangerflag;
void free();

int main()
{
    int size, *ptr;
    void handler();
    char *MALLOC();

    /* ensure your favorite program catches sigdanger */
    signal(SIGDANGER,handler);

    /*************************************************************
      your favorite memory hogging program which uses MALLOC
      to allocate storage (not malloc)
    *************************************************************/
    size = ????;
    ptr = (int *)MALLOC(size);   /* MALLOC returns char *, so cast */
    ....
    ....
}

void handler()
{
    /* sigdanger is sent when the number of paging slots drops below
       the pswarn threshold (see /etc/master to tailor this).  Setting
       this threshold to a higher level may help even if you don't use
       the rest of this solution. */
    dangerflag = 1;
}

/* MAXMEMSIZE is chosen to be the largest real memory configuration
   supported on the RT */
#define MAXMEMSIZE 0x1000000

/* MALLOC is used to overcome the possibility of overcommitting the
   page space */
char *MALLOC(size)
unsigned int size;
{
    char *malloc();
    char *p,*q;
    volatile int *numps;  /* number of paging slots from low memory */

    /* Make sure dangerflag is initialized */
    dangerflag = 0;

    /* point to number of paging slots in AIX low memory */
    numps = (int *)0xb8;

    /* ensure there is enough backing storage to back all of memory
       (note MAXMEMSIZE is as good as I can get here: that is, a lesser
       value wouldn't work on machines with less memory) */
    if ((*numps * PAGESIZE) < (size + MAXMEMSIZE))
        return((char *)NULL);

    /* call the real malloc to get the storage */
    q = p = malloc(size);
    if (p == (char *)NULL)
        return(p);

    /* Loop on the address rather than on a byte count, so that the
       last page is touched even when p does not start on a page
       boundary. */
    while (q < p + size) {
        /* touch and dirty the next page */
        *q = 0;

        /* Get out if we went below the pswarn threshold */
        if (dangerflag)
            goto getout;

        /* bump to the next page (note: must be careful to touch on
           the next page boundary and not in the middle of the page) */
        q = (char *)(((int)q & (~(PAGESIZE-1))) + PAGESIZE);
    }

    /* ensure there is still enough backing storage */
    if ((*numps * PAGESIZE) < (size + MAXMEMSIZE))
        goto getout;

    return(p);

getout:
    /* There isn't enough space available */
    free(p);
    return((char *)NULL);
}
/* ------------------------- cut here -----------------------------*/

Disclaimer: The above program is already in the public domain and is posted here without warranty (even though I know it works).

James Moody
Adv Workstations Div ; IBM Austin, 2502
aesnet: moody@moody.austin.ibm.com
vnet: MOODY at AUSVM6
outside -> ..!cs.utexas.edu!ibmchs!auschs!moody.austin.ibm.com!moody
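For readers without an RT at hand, the page-touching idea in MALLOC above can be sketched in portable C. This is an illustrative sketch only, not Moody's code: it does not read the paging-slot count from low memory, the page size is passed in by the caller, and malloc_touched, on_low_space, and low_space_flag are hypothetical names. SIGDANGER exists only on AIX, so installing the handler with signal() is left to the caller.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag: raised by the handler below when the system
 * signals that paging space is running low. */
static volatile sig_atomic_t low_space_flag = 0;

/* Hypothetical handler: on AIX it would be installed for SIGDANGER. */
void on_low_space(int signo)
{
    (void)signo;
    low_space_flag = 1;
}

/* Hypothetical wrapper: allocate nbytes and dirty one byte in every
 * page of the block, so backing store has to be committed now rather
 * than on first use. page_size must be a power of two.
 * Returns NULL if malloc fails or the flag is raised while touching. */
void *malloc_touched(size_t nbytes, size_t page_size)
{
    char *p;
    size_t off, step;

    assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

    low_space_flag = 0;
    p = malloc(nbytes);
    if (p == NULL)
        return NULL;

    /* Distance from p to the next page boundary; a full page if p is
     * already aligned. After the first touch, step by whole pages. */
    step = page_size - ((uintptr_t)p & (page_size - 1));
    off = 0;
    while (off < nbytes) {
        p[off] = 0;                  /* touch and dirty this page */
        if (low_space_flag) {        /* warning arrived: give up */
            free(p);
            return NULL;
        }
        off += step;
        step = page_size;
    }
    return p;
}
```

A caller on AIX might install the handler with signal(SIGDANGER, on_low_space) before the first allocation and then treat a NULL return exactly like a failed malloc. Stepping by offset rather than by pointer keeps the arithmetic inside the allocated block.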