[comp.sys.ibm.pc.rt] Problems with swap space??

mars@duteca (mars@duteca.tudelft.nl) (02/26/90)

We have 4 RT PCs running under AIX 2.2.1 and have had a large number of
crashes lately. It appeared that whenever I executed a really large
executable (>1M), one of the RTs always crashed, whereas on the other RTs
it could be executed without problems. The same problem occurred when
two or more applications, such as large compilations, were executed
simultaneously. To locate the cause of this I wrote a program that:
1. allocated 20M
2. wrote data (!= 0) in the 20 M buffer
3. read the data
It appeared that when this program was executed, the RT always crashed.
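
For readers who want to repeat the experiment, the three steps can be sketched roughly as below. This is a reconstruction and not the original test program: the function name fill_and_check and the fill byte 0xA5 are assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): allocate nbytes,
   fill every byte with a nonzero pattern, then read everything back.
   Returns 0 if all bytes match, 1 if a byte differs, -1 if malloc fails. */
int fill_and_check(size_t nbytes)
{
	unsigned char *buf;
	size_t i;
	int bad = 0;

	buf = (unsigned char *)malloc(nbytes);	/* step 1: allocate */
	if (buf == NULL)
		return -1;

	memset(buf, 0xA5, nbytes);		/* step 2: write data != 0 */

	for (i = 0; i < nbytes; i++) {		/* step 3: read the data */
		if (buf[i] != 0xA5) {
			bad = 1;
			break;
		}
	}

	free(buf);
	return bad;
}
```

Calling fill_and_check(20UL * 1024 * 1024) corresponds to the 20M case described above.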

It seems to me that the problem is caused by a problem with the swap space,
so I tried the program on the other RTs and it appeared that the same
problem occurred (although at least one of them didn't crash until at least
one other extra application was running). I also tried the program on a
SUN, but couldn't crash it.

If anybody has noticed the same problem and found a solution to it,
please email me. Maybe a larger swap space is the solution, but
even if the swap space (+/- 22M) was chosen too small, at least an
error message could be expected when too much memory has been allocated.
I'm not really an RT expert, but if I had to make a guess about the
cause of this, I would say that the swap space that is reserved for the
vital AIX applications has been chosen too small. Please email suggestions
to mars@duteca.tudelft.nl, thanx in advance.

-- 
***************************************************************************
* Gert-Jan Tromp                                                         *
* Delft University of Technology                                        *
* Dept. of Elect. Engineering, room 10.04                              *
* P.O. Box 5031, 2600 GA Delft, The Netherlands                         *
* Email: mars@duteca.tudelft.nl                                          *
***************************************************************************

moody@moody.austin.ibm.com (02/28/90)

In article <728@duteca4.UUCP> mars@duteca.tudelft.nl () writes:
[much of article deleted]
>It seems to me that the problem is caused by a problem with the swap space,
>so I tried the program on the other RTs and it appeared that the same
>problem occurred (although at least one of them didn't crash until at least
>one other extra application was running). I also tried the program on a
>SUN, but couldn't crash it.
>
>Maybe a larger swap space is the solution, but

A larger swap space is the ultimate solution.

>error message could be expected when too much memory has been allocated.

AIX version 2 on the RT uses a delayed allocation scheme for
allocating paging slots to an application.  On a system with a small
paging space, this can overcommit the page space when applications
use a lot of memory.  One of our vendors was the first we knew of to
have the problem, and I developed a solution for them.  The solution
is an easy way for developers to port these large, memory-consuming
apps.  Applications originally written for BSD systems seem to have
this problem more often, since BSD does things differently.
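
To make the overcommit idea concrete, here is a toy model of delayed slot allocation. It is only an illustration under simplifying assumptions and is not a description of the AIX pager: the names toy_pager, toy_alloc and toy_touch are hypothetical, and the model tracks nothing except a count of free paging slots.

```c
#define TOY_PAGESIZE 2048UL	/* same page size as the fragment in this post */

/* Hypothetical toy pager: tracks only how many paging slots remain free. */
struct toy_pager {
	unsigned long free_slots;
};

/* Delayed allocation: a request reserves no slots, so it always
   "succeeds", however large it is. */
int toy_alloc(struct toy_pager *pg, unsigned long nbytes)
{
	(void)pg;
	(void)nbytes;
	return 1;
}

/* A slot is consumed only when a page is first touched.  Returns the
   number of pages that could be backed before the slots ran out. */
unsigned long toy_touch(struct toy_pager *pg, unsigned long nbytes)
{
	unsigned long pages = (nbytes + TOY_PAGESIZE - 1) / TOY_PAGESIZE;
	unsigned long backed = 0;

	while (backed < pages && pg->free_slots > 0) {
		pg->free_slots--;
		backed++;
	}
	return backed;
}
```

In this model a request for 10 pages against a pager with 4 free slots is accepted by toy_alloc, and the shortfall only shows up when toy_touch backs 4 of the 10 pages. That is the situation the fragment below tries to detect early.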

The following program fragment is my solution.

/* ----------------------- cut here --------------------------------*/

#define PAGESIZE 2048
#define NULL 0
#include <sys/signal.h>
volatile int dangerflag;
void free();

int main()
{	int size;
	char *ptr;	/* MALLOC returns char *, so ptr must match */
	
	void handler();
	char *MALLOC();
	
	/* ensure your favorite program catches sigdanger */
	signal(SIGDANGER,handler);
	
	/*************************************************************
	 your favorite memory hogging program which uses MALLOC to
	 allocate storage (not malloc)
	*************************************************************/
	size = ????;

	ptr = MALLOC(size);
	....
	....

}

void handler()
{
	/* sigdanger is sent when the number of paging slots drops
	below the pswarn threshold (see /etc/master to tailor this).
	Setting this threshold to a higher level may help
	even if you don't use the rest of this solution. */

	dangerflag = 1;
}

/* MAXMEMSIZE is chosen to be the largest real memory configuration
   supported on the RT */

#define MAXMEMSIZE	0x1000000

/* MALLOC is used to overcome the possibility of overcommitting the
   page space */

char *MALLOC(size)
unsigned int size;

{
	char	*malloc();
	char	*p,*q;
	int	i;
	volatile int	*numps; /* number of paging slots from low memory */

	/* Make sure dangerflag is initialized */
	dangerflag = 0;

	/* point to number of paging slots in AIX low memory */
	numps = (int *)0xb8;

	/* ensure there is enough backing storage to back all of
	   memory (note MAXMEMSIZE is as good as I can get here: that
	   is, a lesser value wouldn't work on machines with less
	   memory) */
	if ((*numps * PAGESIZE) < (size + MAXMEMSIZE)) return((char*)NULL);


	/* call the real malloc to get the storage */
	q = p = malloc(size);
	if (p == (char *)NULL) return(p);

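	/* walk every page the buffer overlaps; malloc rarely returns a
	   page-aligned pointer, so counting size/PAGESIZE iterations
	   could leave the final partial page untouched */
	while (q < p + size)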
	{
		/* touch and dirty the next page */
		*q = 0;

		/* Get out if we went below the pswarn threshold */
		if (dangerflag) goto getout;

		/* bump to the next page (note: must be careful to
		   touch on the next page boundary and not in the
		   middle of the page) */
		q = (char *)(((int)q & (~(PAGESIZE-1))) + PAGESIZE);
	}

	/* ensure there is still enough backing storage */
	if ((*numps * PAGESIZE) < (size + MAXMEMSIZE)) goto getout;
	return(p);


	getout: /* There isn't enough space available */
	free(p);
	return((char *)NULL);
}

/* ------------------------- cut here -----------------------------*/
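
As a worked example of the headroom test that MALLOC applies, the sketch below repeats the same comparison with the slot count passed in as an argument instead of being read from low memory. The helper name enough_backing and the EX_ constants are hypothetical. The figures assume that the whole of the roughly 22M paging space mentioned in the original article is free (11264 slots of 2048 bytes), which will not be true on a running system.

```c
#define EX_PAGESIZE	2048UL		/* same value as PAGESIZE above */
#define EX_MAXMEMSIZE	0x1000000UL	/* 16M, same value as MAXMEMSIZE above */

/* Hypothetical helper: the comparison MALLOC performs, with the slot
   count supplied by the caller.  Returns 1 if the request would be
   allowed to proceed, 0 if it would be refused. */
int enough_backing(unsigned long free_slots, unsigned long size)
{
	return (free_slots * EX_PAGESIZE) >= (size + EX_MAXMEMSIZE);
}
```

Under those assumptions a 20M request needs 20M + 16M = 36M of free paging space and is refused, while a 4M request needs 20M and is allowed. In other words, with this check a 20M program fails cleanly with a NULL return on a 22M paging space.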

Disclaimer:  The above program is already in the public domain and is posted
here without warranty (even though I know it works).
James Moody	Adv Workstations Div ; IBM Austin, 2502	  
		aesnet: moody@moody.austin.ibm.com
		vnet: MOODY at AUSVM6
outside ->	..!cs.utexas.edu!ibmchs!auschs!moody.austin.ibm.com!moody