pardo@june.cs.washington.edu (David Keppel) (07/28/88)
[ Debate: to VM or not VM on scientific computing ]

The issues seem to be that when you have paging, etc., you spend a lot
of your time waiting for page faults to complete.  With async. I/O,
you can "predict" the next region to "fault" and preload it.  A
crystal ball would be very useful here.

Thought: One scheme that appears to help cache hit/miss ratios
(according to a paper by A. J. Smith, in any event) is a predictive
scheme as follows:

+ On a fault, fetch both the thing you need and the next one after.
+ Mark the next one after as "fetched on prediction".
+ First time you touch a thing marked "fetched on prediction", fetch
  the next one after that, and mark it as "fetched on prediction".

Given that array access patterns are *sometimes* predictable, I can
imagine this scheme working very well.  Given that they are not
*always* predictable, I can imagine this scheme oozing mildew.  In
general, tho', I think it would be a much larger win for scientific
computing than for (say) text editing.

Any insights?

	;-D on  ( Is this revile-lant? )  Pardo
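[ The tagged-prefetch scheme above can be put in code.  This is a toy
simulation of my own, not from Smith's paper; the sequential block
numbering, the fixed frame count, and LRU replacement are all my
assumptions.  On a purely sequential sweep it takes a single fault,
where pure demand fetch would fault on every block. ]

```python
def simulate(refs, capacity):
    """Count faults under one-block-lookahead with prediction tagging.

    refs     -- trace of block numbers referenced, in order
    capacity -- number of resident block frames (assume >= 2)
    """
    resident = {}   # block -> True iff fetched on prediction, untouched
    lru = []        # replacement order: least recently used at front
    faults = 0

    def bring_in(block, predicted):
        if block in resident:
            return
        if len(lru) >= capacity:        # evict the LRU victim
            victim = lru.pop(0)
            del resident[victim]
        resident[block] = predicted
        lru.append(block)

    for b in refs:
        if b not in resident:
            faults += 1                 # demand fault: fetch b and b+1
            bring_in(b, predicted=False)
            bring_in(b + 1, predicted=True)
        elif resident[b]:
            # First touch of a predicted block: prefetch the next one.
            resident[b] = False
            bring_in(b + 1, predicted=True)
        lru.remove(b)                   # mark b most recently used
        lru.append(b)
    return faults
```

For a sequential sweep like `simulate(list(range(8)), 4)` every touch of
a predicted block pulls in the next one, so only the very first
reference faults; a non-sequential trace gets no such benefit.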
jlg@beta.lanl.gov (Jim Giles) (07/28/88)
In article <5382@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> Thought: One scheme that appears to help cache hit/miss ratios
> (according to a paper by A.J.Smith, in any event) is a predictive
> scheme as follows:
>
> + On a fault, fetch both the thing you need and the next one after.
> + Mark the next one after as "fetched on prediction".
> + First time you touch a thing marked "fetched on prediction", fetch
>   the next one after that, and mark it as "fetched on prediction".

This will only work well under two conditions:

1) Pages or segments are LARGE.  Otherwise the program finishes
   processing one page long before the next page is available, and
   your 'prediction' will have overlapped very little I/O with
   computing.

2) You must have a large amount of central memory.  Otherwise, a wrong
   'prediction' will frequently swap out something you still need.

Even then, this prediction scheme will still take at least one solid
memory fault for each trip through the outer loops.  This occurs when
you circle back around to the beginning of your arrays.

Unfortunately, the two conditions above are mutually contradictory (to
some extent).  The larger the page size, the worse the penalty for
wrong predictions.  On the other hand, large-memory machines are
usually fast, so they catch up with the unfinished VM sooner.

This scheme might be useful for novice users, but it is still no
substitute for the user who KNOWS his data usage patterns in advance.

J. Giles
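[ Giles' first condition can be put in rough numbers.  A
back-of-the-envelope sketch -- the throughput and latency figures are
illustrative assumptions of mine, not from the post: one-block
lookahead only hides the fetch to the extent that the CPU takes as
long to consume a page as the disk takes to deliver the next one. ]

```python
def stall_per_page(page_bytes, bytes_per_flop, flops_per_sec, fetch_ms):
    """Milliseconds the CPU waits per page once one page is in flight.

    Assumes one arithmetic operation per bytes_per_flop bytes touched.
    """
    compute_ms = (page_bytes / bytes_per_flop) / flops_per_sec * 1000.0
    return max(0.0, fetch_ms - compute_ms)

# Small page: 512 B at 1 MFLOP/s (8 B per flop) is ~0.064 ms of work
# against a 20 ms fetch -- nearly the entire fetch is exposed as stall.
small = stall_per_page(512, 8, 1e6, 20.0)

# Large page: 64 KB gives ~8.2 ms of work, hiding that much of the fetch.
large = stall_per_page(64 * 1024, 8, 1e6, 20.0)
```

With these (assumed) numbers the large page hides roughly 8 ms of each
20 ms fetch while the small page hides almost none, which is Giles'
point about needing LARGE pages for the overlap to matter.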
dfk@dukvlsi2.cs.duke.edu (David F. Kotz) (07/29/88)
In article <5382@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> [ Debate: to VM or not VM on scientific computing ]
>
> The issues seem to be that when you have paging, etc., you spend a lot
> of your time waiting for page faults to complete.  With async. I/O,
> you can "predict" the next region to "fault" and preload it.  A
> crystal ball would be very useful here.
>
> Thought: One scheme that appears to help cache hit/miss ratios
> (according to a paper by A.J.Smith, in any event) is a predictive
> scheme ...

Most of Smith's work is in the context of prefetching cache lines from
main memory and prefetching blocks in disk files, and less on the
subject of prepaging in memory hierarchies.  Kishor Trivedi did a lot
of work on prepaging in VM for scientific computing (array processing
programs) a while back.  Here are some relevant references (in BibTeX
format), including the one I believe you refer to above.  Comments are
strictly from my view.

@article{trivedi:prepage,
  author   = "Kishor S. Trivedi",
  title    = "Prepaging and Applications to Array Algorithms",
  journal  = ieeetc,
  year     = 1976,
  volume   = "C-25",
  number   = 9,
  pages    = "915--921",
  month    = sep,
  comments = "Virtual memory environment.  Data space only.  Gives
    prepaging algorithm that is optimal {\em wrt} page faults (not
    practical).  Then gives practical modifications for any replacement
    algorithm that should be good.  Essentially, the algorithm is told
    when a page is dead and what pages to prefetch; dead pages are
    filled with demanded or prefetched pages; on a page fault all free
    spaces are filled with prefetched pages, or until no more to
    prefetch.  So some demand fetches take long, but there are fewer
    faults."
}

@article{trivedi:prepage-auto,
  author   = "Kishor S. Trivedi",
  title    = "On the Paging Performance of Array Algorithms",
  journal  = ieeetc,
  year     = 1977,
  volume   = "C-26",
  number   = 10,
  pages    = "938--947",
  month    = oct,
  comments = "See trivedi:prepage.  Can rewrite programs to increase
    locality of data reference.  Can add FREE and PRE calls to demand
    prepaging.  Maybe automated by compiler."
}

@article{trivedi:analysis,
  author   = "Kishor S. Trivedi",
  title    = "An Analysis of Prepaging",
  journal  = "Computing",
  year     = 1979,
  volume   = 22,
  number   = 3,
  pages    = "191--210",
  comments = "(Also Duke TR CS-1977-7.1, but that is missing figures.)
    Attempts to reduce page faults at times of transition between
    phases of program execution, without dramatically increasing the
    number of page fetches.  Based on LRU, he compares OBL with his
    FDPLRU, which uses FREE(x) and PRE(x) clues about array data pages
    from the compiler as prefetching hints.  FDPLRU is better than
    both over a variety of page sizes, and best for small page sizes.
    Also shows better space-time memory product and better
    throughput."
}

@article{smith:mem-prefetch,
  author   = "Alan Jay Smith",
  title    = "Sequential Program Prefetching in Memory Hierarchies",
  journal  = ieeecomp,
  year     = 1978,
  month    = dec,
  pages    = "7--21",
  comments = "Examines the general issue of prefetching at both the
    cache and VM levels for a variety of page sizes.  Found it to be
    useful for small page sizes (like 32 bytes).  No surprise, since
    the scope of sequentiality in programs (access to instructions and
    data) is only about this size.  Best to prefetch always rather
    than just at faults, and for all accesses, not just data or just
    instructions.  Treat a prefetched page like a regular page for
    replacement.  Implementation is important -- avoid conflicts with
    normal cache use.  The problem of referencing something that is
    still coming in was found not to be too significant."
}

David Kotz
Department of Computer Science, Duke University, Durham, NC 27706
ARPA: dfk@cs.duke.edu  CSNET: dfk@duke  UUCP: decvax!duke!dfk
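[ The FREE/PRE hint mechanism described in the annotations above can
be sketched in code.  This is my own reading of the annotation, not
Trivedi's actual algorithm; the class and method names are invented.
FREE(x) marks a resident page dead so its frame can be reused without
evicting a live page, and PRE(x) queues x to be prefetched into free
or dead frames at the next fault. ]

```python
from collections import OrderedDict, deque

class HintedLRU:
    """LRU paging with FREE/PRE compiler hints (illustrative sketch)."""

    def __init__(self, frames):
        self.frames = frames
        self.mem = OrderedDict()   # resident pages, oldest (LRU) first
        self.dead = set()          # resident pages declared dead via FREE
        self.pre = deque()         # pages hinted via PRE, not yet fetched
        self.faults = 0

    def free(self, page):          # hint: page will not be used again
        if page in self.mem:
            self.dead.add(page)

    def prefetch(self, page):      # hint: page will be needed soon
        self.pre.append(page)

    def _install(self, page):
        if len(self.mem) >= self.frames:
            # Prefer to reuse a dead frame; otherwise evict true LRU.
            victim = next(iter(self.dead)) if self.dead else next(iter(self.mem))
            self.dead.discard(victim)
            del self.mem[victim]
        self.mem[page] = None

    def touch(self, page):
        if page in self.mem:
            self.mem.move_to_end(page)     # hit: mark most recently used
            self.dead.discard(page)        # a touched page is live again
            return
        self.faults += 1
        self._install(page)
        # While paying for this fault, fill dead/free frames with hints.
        while self.pre and (self.dead or len(self.mem) < self.frames):
            hinted = self.pre.popleft()
            if hinted not in self.mem:
                self._install(hinted)
```

With three frames, touching pages 0 and 1, then issuing FREE(0) and
PRE(2), the next fault folds the prefetch of page 2 into its own fetch,
so the later touch of page 2 is a hit -- matching the annotation's
"some demand fetches take long but there are fewer faults."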