cet1@cl.cam.ac.uk (C.E. Thompson) (02/01/90)
In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)
>
>Has anyone got a more exact reference with the details of the "worst case"?
>What *is* the absolute minimum size of TLB a 370 or a 30xx needs? How big
>are the actual TLBs in real machines?

In the 370 architecture the TLB is loaded by "hardware", not software, so
not all the entries needed to complete a (non-interruptible) instruction
need be present in the TLB at one time (though this might be a requirement
in some models). However, it is certainly true that all the relevant pages
must be in store, and the page table entries marked "valid".

And yes, the critical number is eight: a 4-byte EXECUTE instruction might
cross a page boundary; its subject instruction might also do so; and it
might be an SS instruction (e.g. MVC) whose source and destination operands
both crossed page boundaries; and all these pages could be different
(indeed, they could all be in different segments). The page table entries
themselves are not accessed by virtual but by real addresses, so they are
not relevant in this context.
(There is a possible extra complication in SIE mode under 370/XA, when the
virtual machine's page zero - real to the guest, but virtual to the host -
may also need to be accessible.)

As regards interruptible instructions, it is sufficient, as you say, that
the instruction be able to "make progress" with eight valid page table
entries. In this context, it is significant that those of the 3090 vector
facility instructions that access non-contiguous vector operands have the
guarantee "access exceptions are not recognized more than seven element
locations beyond the current one" (SA22-7125-3 p.2-24).

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
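
The worst-case count of eight in the post above can be checked mechanically.
What follows is only a minimal Python sketch: it assumes 4K pages, and the
addresses, operand lengths, and function names are made up for illustration
(they do not come from any IBM document). Each access is placed so that it
straddles a page boundary, and no two accesses share a page.

```python
PAGE_SIZE = 4096  # assumption: 4K pages


def pages_touched(start, length, page_size=PAGE_SIZE):
    """Return the set of page numbers covered by bytes [start, start+length)."""
    first = start // page_size
    last = (start + length - 1) // page_size
    return set(range(first, last + 1))


def distinct_pages(accesses, page_size=PAGE_SIZE):
    """Union of the pages touched by a list of (start, length) accesses."""
    pages = set()
    for start, length in accesses:
        pages |= pages_touched(start, length, page_size)
    return pages


# Hypothetical layout: every access crosses a page boundary.
accesses = [
    (0x01FFE, 4),   # the 4-byte EXECUTE instruction
    (0x11FFC, 6),   # its subject instruction, a 6-byte SS instruction (MVC)
    (0x21FF0, 32),  # MVC source operand
    (0x31FF0, 32),  # MVC destination operand
]

count = len(distinct_pages(accesses))
print(count, "pages must be valid at once")
```

Four accesses, two pages each, all different: eight page table entries that
must be valid together before the instruction can complete.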
petolino@joe.Sun.COM (Joe Petolino) (02/02/90)
>+---------------
>| In article <35102@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>| > Consider the following instruction :
>| > load.word x,0(x)
>| > where x happens to point across a page boundary....
>| In the 370 analogy to the load.word example above, the instruction that
>| had an unaligned operand crossing a page boundary with both pages faulting
>| (let's make it hard :-) will get started 3 times. The first 2 times it
>| will not finish because the prefetching of the operands will fail, so
>| the operating system will have to restart it twice, but the third time
>| it will finally complete (in a couple of cycles)...
>+---------------
>
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)
>
>Has anyone got a more exact reference with the details of the "worst case"?
>What *is* the absolute minimum size of TLB a 370 or a 30xx needs? How big
>are the actual TLBs in real machines?

I don't know the exact answer to all of these questions, but I can give a
counterexample to the 4-way-set-associative requirement.
All of the Amdahl machines that I'm familiar with use 2-way-associative
TLBs, and have no trouble maintaining 370 binary compatibility. I think
those IBM guys were confusing the requirements of one particular
implementation with the requirements of the architecture.

As to the total TLB size, the number 256 sticks in my mind. In practice,
this is probably more a function of the available RAMs than anything else.

In an unrelated but interesting note, you'll notice that I said
'2-way-associative', not '2-way-set-associative'. In some of these designs,
the two entries being looked at are chosen by *different* functions of the
virtual address. The reasoning behind this is left as an exercise for the
reader :-) .

-Joe
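
One plausible reading of the exercise is that indexing the two entries with
different functions reduces conflict misses: pages that collide under one
function usually land in different slots under the other. The Python sketch
below illustrates that idea only. The two index functions, the replacement
policy, and the class name are assumptions made up for the example and are
not taken from any Amdahl design; the 2 x 128 geometry merely echoes the
"256" recalled above.

```python
class TwoWayTLB:
    """Two banks, each indexed by a different function of the page number."""

    def __init__(self, slots_per_bank=128):
        self.slots = slots_per_bank
        self.banks = [{}, {}]   # bank -> {slot index: (vpn, frame)}
        self.next_victim = 0    # alternate banks when both slots are taken

    def _index(self, bank, vpn):
        # Hypothetical index functions: plain modulo for bank 0,
        # an XOR-folded variant for bank 1.
        if bank == 0:
            return vpn % self.slots
        return (vpn ^ (vpn // self.slots)) % self.slots

    def lookup(self, vpn):
        """Return the frame for vpn, or None on a TLB miss."""
        for bank in (0, 1):
            entry = self.banks[bank].get(self._index(bank, vpn))
            if entry is not None and entry[0] == vpn:
                return entry[1]
        return None

    def insert(self, vpn, frame):
        candidates = [(b, self._index(b, vpn)) for b in (0, 1)]
        # 1. Refresh an entry that already holds this page.
        for bank, idx in candidates:
            entry = self.banks[bank].get(idx)
            if entry is not None and entry[0] == vpn:
                self.banks[bank][idx] = (vpn, frame)
                return
        # 2. Otherwise use an empty candidate slot if there is one.
        for bank, idx in candidates:
            if idx not in self.banks[bank]:
                self.banks[bank][idx] = (vpn, frame)
                return
        # 3. Otherwise evict from alternating banks.
        bank, idx = candidates[self.next_victim]
        self.next_victim ^= 1
        self.banks[bank][idx] = (vpn, frame)


tlb = TwoWayTLB()
colliding = [5, 5 + 128, 5 + 256]   # all three map to slot 5 in bank 0
for frame, vpn in enumerate(colliding):
    tlb.insert(vpn, frame)
resident = [tlb.lookup(vpn) is not None for vpn in colliding]
print(resident)
```

With a conventional 2-way set-associative layout these three pages would
share one set and only two could be resident at a time; with the two
different index functions used here, all three fit.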
pkr@maddog.sgi.com (Phil Ronzone) (02/04/90)
In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
>In article <9001270059.AA26776@ucbvax.Berkeley.EDU> JOSH@IBM.COM ("Josh
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)

The 370 could get by with zero TLB elements. The 370 does NOT fault on
any TLB operation. The segment and page tables are chased in memory and
must be in memory at all times (not paged out).

The original small 370's had 8 elements, later 16, and the bigger 370's
had 128. The 360/67 also had 8 in the "Blaauw box".

------Me and my dyslexic keyboard----------------------------------------------
Phil Ronzone   Manager Secure UNIX           pkr@sgi.COM   {decwrl,sun}!sgi!pkr
Silicon Graphics, Inc.               "I never vote, it only encourages 'em ..."
-----In honor of Minas, no spell checker was run on this posting---------------
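
The point that a TLB is only a cache over tables that are walked in memory
can be illustrated with a toy model. This is a rough Python sketch, not a
description of real 370 hardware: the 64K-segment / 4K-page split, the
dictionary-based tables, the eviction rule, and all names are assumptions
made for the example. Translation gives the same answer with a TLB of zero
entries; only the number of table walks changes.

```python
SEG_SHIFT = 16    # assumption: 64K segments
PAGE_SHIFT = 12   # assumption: 4K pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1
PAGES_PER_SEG = 1 << (SEG_SHIFT - PAGE_SHIFT)


class PageFault(Exception):
    """Raised only when a table entry is invalid, never on a mere TLB miss."""


class Translator:
    def __init__(self, segment_table, tlb_capacity):
        # segment_table: {segment index: {page index: frame number}}
        self.segment_table = segment_table
        self.tlb_capacity = tlb_capacity
        self.tlb = {}     # virtual page number -> frame number
        self.walks = 0    # how many times the tables were walked

    def _walk(self, vpn):
        """Chase the segment and page tables, as the hardware would."""
        self.walks += 1
        seg, page = divmod(vpn, PAGES_PER_SEG)
        page_table = self.segment_table.get(seg)
        frame = None if page_table is None else page_table.get(page)
        if frame is None:
            raise PageFault(hex(vpn << PAGE_SHIFT))
        return frame

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        frame = self.tlb.get(vpn)
        if frame is None:                      # TLB miss: walk, do not fault
            frame = self._walk(vpn)
            if self.tlb_capacity > 0:
                if len(self.tlb) >= self.tlb_capacity:
                    self.tlb.pop(next(iter(self.tlb)))  # evict oldest entry
                self.tlb[vpn] = frame
        return (frame << PAGE_SHIFT) | offset


tables = {0: {0: 7, 1: 3}}          # hypothetical mapping: two valid pages
no_tlb = Translator(tables, tlb_capacity=0)
small_tlb = Translator(tables, tlb_capacity=8)

for t in (no_tlb, small_tlb):
    t.translate(0x1234)
    t.translate(0x1238)

print(hex(no_tlb.translate(0x1234)), no_tlb.walks, small_tlb.walks)
```

Both translators return the same real address; the one without a TLB simply
walks the tables on every reference.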
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/05/90)
In article <1746@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:

   In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
   >On a related issue, my understanding was (from idle conversation with
   >some IBM guys several years ago) that the 370 architecture needs at
   >least 8 elements in its TLB, and that the TLB must be *at least* 4-way
   >set-associative. The reason is some instruction which copies a source to
   >a destination while referring to a table (some version of Translate-And-Test
   >maybe?). Anyway, the idea was that if the instruction, the source, the
   >destination, and the table entry being used all span pages, then you need
   >at least 8 valid entries in the TLB to make progress on the instruction.

   As regards interruptible instructions, it is sufficient, as you say, that
   the instruction be able to "make progress" with eight valid page table
   entries. In this context, it is significant that those of the 3090 vector
   facility instructions that access non-contiguous vector operands have the
   guarantee "access exceptions are not recognized more than seven element
   locations beyond the current one" (SA22-7125-3 p.2-24).

If somebody is *really* interested in the subject of restartability,
unaligned accesses, and the like, I think that a read of D. Morris &
R. Ibbett, "The MU5 Computer System", is fascinating. This was a large
supercomputer of the late 60s/early 70s, and had a deep pipeline, a simple
architecture, and virtual memory. All these conspired to create problems
that were far from simple, and which were solved with some elegance (and
where else can you find a discussion of the effect on architecture of the
probabilistic nature of a flip-flop? :->).

The discussion is terse, and I think highly relevant to many latter-day
sophisticated machines.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk