[comp.os.vms] Virtually nuked...

DHASKIN@CLARKU.BITNET (Denis W. Haskin, Manager, Technical Services) (06/19/87)

Help!

We have a research group working with a simulation of the effects of a reactor
accident (a la TMI).  They are having difficulty linking a FORTRAN program
that uses extremely large arrays for its work, and thus far I have not been
able to determine what might be done to allow it to link.  The last resort
is of course to re-write the routine, but they are pretty adamant that they
need these large arrays.

I am not positive that these data elements are actually the problem, but
they are the best candidates  -- note ACIXY, GCIXY, TACIXY, TGCIXY:

      COMMON /PLUME/ [...omitted...], ACIXY(60,80,0:80), GCIXY(60,80,0:80),
                     FRACIX(60,80)
      COMMON /PTIME/ [...omitted...], TACIXY(0:50,60,80,0:30),
                     TGCIXY(0:50,60,80,0:30), IHOUR, IMIN, ISEC
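
Some quick arithmetic on those four, assuming they are default REAL*4
(four bytes per element) and 512-byte pages:

      ACIXY, GCIXY:    60 x 80 x 81      =   388,800 elements each
      TACIXY, TGCIXY:  51 x 60 x 80 x 31 = 7,588,800 elements each

Together that is 2*(388,800 + 7,588,800)*4 = 63,820,800 bytes, or about
124,650 pages of virtual address space for these arrays alone -- nearly
three times our VIRTUALPAGECNT (more on that below).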

The original symptom was a "LINK-F-EXPAGQUO, exceeded page file quota" error at
link time.  Upping the user's PGFLQUO took care of that nicely.  Then, a
"LINK-F-MEMFUL, insufficient virtual address space to complete this link"
was the culprit.  The System Messages and Recovery Procedures manual suggests
either

   a) increasing virtual address space or
   b) decreasing the size of the image

and suggests that (b) might be accomplished by

   1) using shareable images or
   2) encouraging demand-zero compression.

(b)(2) seemed most reasonable, so I added an options file with the single
line: DZRO_MIN=1 (I also tried 0).  No effect.

What are our options (other than rewriting the original source)?  As I see
it, we can:

   - encourage demand-zero compression, but at this point it doesn't seem
     to be having any effect; and won't the problem just return at
     execution time, when that image 'demands' zero-filled sections of
     virtual memory the size of the Louisiana Purchase?

   - increase virtual address space, i.e. raise VIRTUALPAGECNT, which is
     currently 45960.  What ramifications will this have on system behavior
     and other system parameters?
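
     (The mechanics seem straightforward enough -- something like this in
     SYSGEN, followed by a reboot since VIRTUALPAGECNT is not a dynamic
     parameter; the value here is just a for-instance:

        $ RUN SYS$SYSTEM:SYSGEN
        SYSGEN> USE CURRENT
        SYSGEN> SET VIRTUALPAGECNT 150000
        SYSGEN> WRITE CURRENT
        SYSGEN> EXIT

     -- it's the side effects I'm unsure of.)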

Or are there other options?  I'm not a FORTRAN whiz; is there something we can
do at compile time?  For background info, this is on an 8530 with 20 Mb.
If more information is needed please let me know!


aTdHvAaNnKcSe,

% Denis W. Haskin                             Manager, Technical Services %
% ----------------------------------------------------------------------- %
% DHASKIN@CLARKU.BITNET   Office of Information Systems     (617)793-7193 %
% Clark University               950 Main Street      Worcester MA  01610 %
%                                                                         %
%                       "Revenge is best served cold."                    %
%                                -- Anonymous                             %

LEICHTER-JERRY@YALE.ARPA (06/23/87)

[The message I'm responding to asked for help with FORTRAN programs containing
very large arrays, which caused the Linker to run out of virtual memory.]

There is only one ultimate cure for not having enough virtual memory:
increase VIRTUALPAGECNT!  You report that you are using a value of around
46,000.  On the 8600 I'm writing this from, VIRTUALPAGECNT is 250,000.  The
system has 16 Meg of real memory - not very large by modern standards,
actually.  It works just fine.

The value was set that high because we, too, have run some large scientific
programs - flame simulations, that kind of thing.  Actually, I suspect the
largest virtual size belonged to a program that did neural net simulations.

The main thing VIRTUALPAGECNT will cost you is REAL memory - though not very
much:  Every 128 pages requires 4 bytes of permanently-reserved system memory
per balance set slot; the number of slots is set by BALSETCNT.  So this
system, which has BALSETCNT=100, is reserving (250000/128)*4*100 bytes
(781,600 after rounding), or 1527 pages out of 32768.  Not really
significant.

Don't be misled by the above calculation into thinking that VIRTUALPAGECNT
is free.  There are hidden costs:

	a)  While you can have a huge virtual image running in a very
		small working set, the result isn't going to make anyone
		happy - thrashing is a good way to exercise disks, but
		that's about all that can be said for it.  This system
		has WSMAX set to 16000.  Most people's UAF entries specify
		much lower values - 1024 or 2048 is typical.  Those people
		who are running large images are, of course, given larger
		limits.  (Avoid giving out VERY large WSQUOTA values; instead,
		give people a large WSQUOTA - say, 4000 - and a very large
		WSEXTENT.  Then encourage them to run overnight.  You could
		even set up batch queues with larger working set quotas, so
		that the real biggies can run ONLY when they won't screw
		everyone else up.)
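
		The queue setup itself is one-time work -- something along
		these lines (queue name and numbers made up for the
		example):

		    $ INITIALIZE /QUEUE /BATCH /JOB_LIMIT=1 -
		          /WSQUOTA=4000 /WSEXTENT=60000 BIG_BATCH
		    $ START /QUEUE BIG_BATCH

		The big jobs then go in via SUBMIT /QUEUE=BIG_BATCH and
		contend only with each other.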

		Implicit in this recommendation, of course, is the
		requirement that you actually have the physical memory.
		There's no getting away from this.  As a friend (Martin
		Minow) has commented:  Virtual memory is fine if you want
		to do virtual work.

		Enlarging on this just a bit:  With the array sizes you
		discussed, it's not going to be possible to fit everything
		into the working set at once anyway.  Dealing with very large
		arrays in a paging environment takes some care, as fairly
		trivial changes - interchanging the order of a pair of loops
		driving the subscripts of a 2D array, for example - may make
		a difference of two orders of magnitude in performance.  These
		problems have been well studied in the scientific programming
		community; if your users are not familiar with this kind of
		work, make sure they talk to someone who is.  (There are a
		number of people here who I'm sure would be willing to
		consult....)
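
		To make the loop-ordering point concrete:  FORTRAN stores
		arrays in column-major order, so the inner loop should step
		the FIRST subscript.  A made-up fragment (A stands for any
		large 2D REAL array):

		C     Page-unfriendly: the inner loop steps the SECOND
		C     subscript, so successive references are 60 REALs
		C     (240 bytes) apart and soon touch a new page.
		      REAL A(60,80)
		      DO 10 I = 1, 60
		      DO 10 J = 1, 80
		   10 A(I,J) = 0.0

		C     Page-friendly: same work, but the inner loop now
		C     walks consecutive memory locations.
		      DO 20 J = 1, 80
		      DO 20 I = 1, 60
		   20 A(I,J) = 0.0

		Nothing else changes, yet on arrays that don't fit in the
		working set the second form can page orders of magnitude
		less.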

	b)  You'll also need large page files to map those huge FORTRAN arrays
		your people need - they have to live SOMEWHERE when they
		aren't in main memory.  This 8600 has a little over 100,000
		blocks of pagefile.  That's NOT enough for the real monsters.
		The neural net simulation I mentioned, for example, needed
		more space than that.  There is a way around the problem,
		which we took, but it does require some additional work:
		Rather than building the arrays into the program, create them
		as disk file sections using SYS$CRMPSC.  The advantage of this
		is that you can create the disk file on any disk where you
		have free space, even on a temporary basis; you don't have to
		tie space up on a long-term basis.  Also, you can charge the
		particular users who need the large disk resources directly,
		rather than having tens of thousands of blocks of space go
		into a page file that you can't charge anyone for, and which
		no one uses most of the time.

		For the user, a disk file section has another advantage:  You
		can save the file, thus getting an instant checkpoint of the
		array at some point in the computation.  Much faster and
		simpler than writing the whole array out with I/O statements,
		then reading it back later.

		The details are too elaborate to go into here; see the System
		Services book for details on disk file sections.
		Unfortunately, I don't know of any references on using them
		for the kind of thing you need.
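
		To give the flavor, though, the heart of it is one service
		call.  A bare-bones, untested sketch (it assumes the section
		file already exists and is big enough, that ICHAN holds the
		channel from an RMS open done with the user-file-open bit
		set, and it omits all status checking; CRUNCH is just a
		made-up name):

		      INCLUDE '($SECDEF)'
		      INTEGER*4 STATUS, ICHAN, INADR(2), RETADR(2)
		      INTEGER*4 SYS$CRMPSC
		      INTEGER*4 NPAGES
		C     Pages for TACIXY alone: 7,588,800 REAL*4s at 128
		C     per page, rounded up.
		      PARAMETER (NPAGES = 59288)
		C     Both addresses 0: map in P0 space, at the end of
		C     the region (SEC$M_EXPREG).
		      INADR(1) = 0
		      INADR(2) = 0
		      STATUS = SYS$CRMPSC (INADR, RETADR, ,
		     1     %VAL(SEC$M_EXPREG .OR. SEC$M_WRT), , , ,
		     2     %VAL(ICHAN), %VAL(NPAGES))
		C     RETADR brackets the pages actually mapped.  Pass the
		C     base address by value and the callee sees an ordinary
		C     FORTRAN array.
		      CALL CRUNCH (%VAL(RETADR(1)))
		      ...
		      SUBROUTINE CRUNCH (TACIXY)
		      REAL TACIXY(0:50,60,80,0:30)

		The "instant checkpoint" mentioned above is then a call to
		SYS$UPDSEC, which writes the modified pages back to the
		file.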
							-- Jerry
-------