[comp.arch] Scientific Virtual Memory

pardo@june.cs.washington.edu (David Keppel) (07/28/88)

[ Debate: to VM or not VM on scientific computing ]

The issue seems to be that when you have paging, etc., you spend a lot
of your time waiting for page faults to complete.  With async. I/O,
you can "predict" the next region to "fault" and preload it.  A
crystal ball would be very useful here.

Thought: One scheme that appears to help cache hit/miss ratios
(according to a paper by A.J.Smith, in any event) is a predictive
scheme as follows:

  + On a fault, fetch both the thing you need and the next one after.
  + Mark the next one after as "fetched on prediction".
  + First time you touch a thing marked "fetched on prediction", fetch
    the next one after that, and mark it as "fetched on prediction".

Given that array access patterns are *sometimes* predictable, I can
imagine this scheme working very well.  Given that they are not
*always* predictable, I can imagine this scheme oozing mildew.  In
general, though, I think it would be a much larger win for scientific
computing than for (say) text editing.  Any insights?
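For concreteness, here is a toy simulation of that policy in C.  All
the names and the no-replacement memory model are mine, invented just
to show the tagging; it is a sketch of the policy, not of an OS:

    #include <stdio.h>

    #define NPAGES 64             /* pages in the (tiny) address space */

    static int resident[NPAGES];  /* 1 if the page is in memory */
    static int predicted[NPAGES]; /* 1 if fetched on prediction, untouched */
    static int faults, prefetches;

    /* Speculatively bring in page p, tagged "fetched on prediction". */
    static void prefetch(int p)
    {
        if (p < NPAGES && !resident[p]) {
            resident[p] = 1;
            predicted[p] = 1;
            prefetches++;
        }
    }

    /* Reference page p under the scheme described above. */
    static void touch(int p)
    {
        if (!resident[p]) {            /* demand fault ...            */
            resident[p] = 1;
            faults++;
            prefetch(p + 1);           /* ... fetch the next one too  */
        } else if (predicted[p]) {     /* first touch of a prediction */
            predicted[p] = 0;
            prefetch(p + 1);           /* stay one page ahead         */
        }
    }

    int main(void)
    {
        int i;

        for (i = 0; i < 32; i++)       /* a purely sequential sweep   */
            touch(i);
        printf("%d faults, %d prefetches\n", faults, prefetches);
        return 0;
    }

The sequential sweep costs exactly one demand fault; every later page
arrives already tagged, and each first touch triggers the next
prefetch.  A random pattern degenerates to one fault per touch plus a
useless prefetch each time (the mildew case).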

	;-D on  ( Is this revile-lant? )  Pardo

jlg@beta.lanl.gov (Jim Giles) (07/28/88)

In article <5382@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> Thought: One scheme that appears to help cache hit/miss ratios
> (according to a paper by A.J.Smith, in any event) is a predictive
> scheme as follows:
> 
>   + On a fault, fetch both the thing you need and the next one after.
>   + Mark the next one after as "fetched on prediction".
>   + First time you touch a thing marked "fetched on prediction", fetch
>     the next one after that, and mark it as "fetched on prediction".

This will work well only under two conditions.

1)  Pages or segments are LARGE.  Otherwise the program finishes
    processing one page long before the next page is available, and
    your 'prediction' will have overlapped very little I/O with
    computing.  (A made-up numeric example follows this list.)

2)  You must have a large amount of central memory.  Otherwise, a
    wrong 'prediction' will frequently swap out something you still
    need.
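For (1), a made-up numeric example (every number here is invented for
illustration): with a 4 Kbyte page, 8-byte words, and 10 flops of work
per word on a 1 Mflop machine, one page is worth 512 * 10 = 5120
flops, i.e. about 5 ms of computing.  If a page fetch costs about
30 ms, one-page lookahead still leaves the CPU idle for roughly 25 of
every 30 ms.  At 64 Kbytes per page the same sweep does about 80 ms of
computing per page, and the 30 ms fetch is completely hidden.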

Even then, this prediction scheme will still have at least one solid
memory fault for each trip through the outer loops.  This occurs when
you circle back around to the beginning of your arrays.

Unfortunately, the two conditions above are (to some extent) mutually
contradictory.  The larger the page size, the worse the penalty for a
wrong prediction.  On the other hand, large-memory machines are
usually fast, so they catch up with the unfinished VM sooner.  This
scheme might be useful for novice users, but it is still no substitute
for the user who KNOWS his data usage patterns in advance.

J. Giles

dfk@dukvlsi2.cs.duke.edu (David F. Kotz) (07/29/88)

In article <5382@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> [ Debate: to VM or not VM on scientific computing ]
> 
> The issue seems to be that when you have paging, etc., you spend a lot
> of your time waiting for page faults to complete.  With async. I/O,
> you can "predict" the next region to "fault" and preload it.  A
> crystal ball would be very useful here.
> 
> Thought: One scheme that appears to help cache hit/miss ratios
> (according to a paper by A.J.Smith, in any event) is a predictive
> scheme ...

Most of Smith's work is in the context of prefetching cache lines from
main memory and of prefetching blocks in disk files, with less on
prepaging in memory hierarchies.  Kishor Trivedi did a lot of work on
prepaging in VM for scientific computing (array processing programs) a
while back.  Here are some relevant references (in BibTeX format),
including the one I believe you refer to above.  The comments are
strictly my own.

@article{trivedi:prepage,
	author =	"K. S. Trivedi",
	title =		"Prepaging and Applications to Array Algorithms",
	journal =	ieeetc,
	year =		"1976",
	volume =	"C-25",
	number =	"9",
	pages =		"915--921",
	month =		sep,
	comments =	"Virtual memory environment.  Data space only.  Gives a
prepaging algorithm that is optimal {\em wrt} page faults (not
practical), then gives practical modifications, for use with any
replacement algorithm, that should be good.  Essentially, the
algorithm is told when a page is dead and which pages to prefetch;
dead pages' frames are refilled with demanded or prefetched pages; on
a page fault, all dead frames are filled with prefetched pages until
there is nothing left to prefetch.  So some demand fetches take
longer, but there are fewer faults."
}

@article{trivedi:prepage-auto,
	author =	"Kishor S. Trivedi",
	title =		"On the Paging Performance of Array Algorithms",
	journal =	ieeetc,
	year =		"1977",
	month =		oct,
	volume =	"C-26",
	number =	"10",
	pages =		"938--947",
	comments =	"See trivedi:prepage.  Can rewrite programs to
increase the locality of data reference.  Can add FREE and PRE calls
to request prepaging.  Maybe automated by the compiler."
}
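To give a flavor of those FREE and PRE clues: the call names come from
the papers, but the interface and stub bodies below are my own
guesses, purely for illustration.  A compiler (or a careful user)
might annotate a sequential sweep like this:

    #include <stdio.h>

    #define NPAGES 8
    #define PGSZ   512                    /* doubles per page (assumed) */

    static double a[NPAGES * PGSZ];

    /* Stub hint calls.  In the papers the VM system acts on these:
     * PRE starts an asynchronous fetch, and FREE declares a page dead
     * so its frame can be reclaimed.  Here they just trace.  */
    static void PRE(double *p)  { printf("PRE  page %d\n", (int)((p - a) / PGSZ)); }
    static void FREE(double *p) { printf("FREE page %d\n", (int)((p - a) / PGSZ)); }

    int main(void)
    {
        long i, n = (long)NPAGES * PGSZ;
        double sum = 0.0;

        for (i = 0; i < n; i++) {
            if (i % PGSZ == 0) {          /* crossing a page boundary */
                if (i + PGSZ < n)
                    PRE(&a[i + PGSZ]);    /* next page: needed soon   */
                if (i >= PGSZ)
                    FREE(&a[i - PGSZ]);   /* previous page: now dead  */
            }
            sum += a[i];
        }
        printf("sum = %g\n", sum);
        return 0;
    }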

@article{trivedi:analysis,
	author =	"Kishor S. Trivedi",
	title =		"An Analysis of Prepaging",
	journal =	"Computing",
	volume =	"22",
	number =	"3",
	year =		"1979",
	pages =		"191--210",
	comments =	"(Also Duke TR CS-1977-7.1, but that is missing
figures.)  Attempts to reduce page faults at the transitions between
phases of program execution, without dramatically increasing the
number of page fetches.  Starting from LRU, he compares OBL with his
FDPLRU, which uses FREE(x) and PRE(x) clues about array data pages
from the compiler as prefetching hints.  FDPLRU beats both demand LRU
and OBL over a variety of page sizes, and does best at small page
sizes.  It also shows a better space-time memory product and better
throughput."
}

@article{smith:mem-prefetch,
	author =	"Alan Jay Smith",
	title =		"Sequential Program Prefetching in Memory Hierarchies",
	journal =	ieeecomp,
	year =		"1978",
	month =		dec,
	pages =		"7--21",
	comments =	"Examines the general issue of prefetching at both the
cache and VM levels for a variety of page sizes.  Found it useful for
small page sizes (like 32 bytes); no surprise, since the scope of
sequentiality in programs (access to instructions and data) is only
about this size.  Best to prefetch always rather than just at faults,
and on all accesses, not just data or just instructions.  Treat a
prefetched page like a regular page for replacement.  Implementation
is important: avoid conflicts with normal cache use.  The problem of
referencing something that is still coming in was found not to be too
significant."
}
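Smith's "prefetch always" is a one-line change to the tagged scheme
sketched near the top of this thread: the lookahead fires on every
reference, not only on faults and on first touches of predicted pages.
Dropping this toy routine into that earlier simulation in place of its
touch() shows the difference (again, my code, not Smith's):

    /* Reference page p under "prefetch always". */
    static void touch(int p)
    {
        if (!resident[p]) {       /* demand miss */
            resident[p] = 1;
            faults++;
        }
        prefetch(p + 1);          /* lookahead on EVERY reference */
    }

The cost is a lookup on every reference instead of only on misses and
first touches, which is why the implementation point above about
avoiding conflicts with normal cache use matters.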

David Kotz
Department of Computer Science, Duke University, Durham, NC 27706
ARPA:	dfk@cs.duke.edu
CSNET:	dfk@duke        
UUCP:	decvax!duke!dfk