[comp.arch] 370 Operand Alignment and Page Faults

cet1@cl.cam.ac.uk (C.E. Thompson) (02/01/90)

In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)
>
>Has anyone got a more exact reference with the details of the "worst case"?
>What *is* the absolute minimum size of TLB a 370 or a 30xx needs? How big
>are the actual TLBs in real machines?

In the 370 architecture the TLB is loaded by "hardware", not software, so
not all the entries needed to complete a (non-interruptible) instruction
need be present in the TLB at one time (though this might be a requirement
in some models). However, it is certainly true that all the relevant pages
must be in store, and the page table entries marked "valid". And yes, the
critical number is eight: a 4-byte EXECUTE instruction might cross a page
boundary; its subject instruction might also do so; and it might be an SS
instruction (e.g. MVC) whose source and destination operands both crossed
page boundaries; and all these pages could be different (indeed, they could
all be in different segments). The page table entries themselves are not
accessed by virtual but by real addresses, so they are not relevant in this
context. (There is a possible extra complication in SIE mode under 370/XA,
when the virtual machine's page zero - real to the guest, but virtual to
the host - may also need to be accessible.)
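
The count of eight can be checked with a small sketch. It is only an
illustration: the 4K page size is one of the sizes the 370 supports, the
addresses are made up for the example, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4K pages (the 370 also allows 2K pages)


def pages_touched(start, length, page_size=PAGE_SIZE):
    """Return the set of virtual page numbers covered by [start, start+length)."""
    first = start // page_size
    last = (start + length - 1) // page_size
    return set(range(first, last + 1))


def worst_case_pages(execute_addr, subject_addr, src_addr, dst_addr, mvc_len):
    """Collect every page needed by EXECUTE of an MVC (addresses are virtual)."""
    pages = set()
    pages |= pages_touched(execute_addr, 4)    # EXECUTE is a 4-byte instruction
    pages |= pages_touched(subject_addr, 6)    # MVC is a 6-byte SS instruction
    pages |= pages_touched(src_addr, mvc_len)  # source operand
    pages |= pages_touched(dst_addr, mvc_len)  # destination operand
    return pages


# Made-up addresses, each chosen to straddle a page boundary.
pages = worst_case_pages(
    execute_addr=0x00FFE,  # pages 0x00 and 0x01
    subject_addr=0x10FFC,  # pages 0x10 and 0x11
    src_addr=0x20FF0,      # pages 0x20 and 0x21
    dst_addr=0x30FF0,      # pages 0x30 and 0x31
    mvc_len=256,           # the maximum MVC length
)
print(len(pages))  # 8
```

All eight pages must be valid at once for the non-interruptible MVC to
complete, whatever the TLB itself happens to hold.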

As regards interruptible instructions, it is sufficient, as you say, that
the instruction be able to "make progress" with eight valid page table
entries. In this context, it is significant that those of the 3090 vector
facility instructions that access non-contiguous vector operands have the
guarantee "access exceptions are not recognized more than seven element
locations beyond the current one" (SA22-7125-3 p.2-24).

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

petolino@joe.Sun.COM (Joe Petolino) (02/02/90)

>+---------------
>| In article <35102@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>|  > Consider the following instruction :
>|  > 	load.word	x,0(x)
>|  > where x happens to point across a page boundary....
>| In the 370 analogy to the load.word example above, the instruction that
>| had an unaligned operand crossing a page boundary with both pages faulting
>| (let's make it hard :-) will get started 3 times.  The first 2 times it
>| will not finish because the prefetching of the operands will fail, so
>| the operating system will have to restart it twice, but the third time
>| it will finally complete (in a couple of cycles)...
>+---------------
>
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)
>
>Has anyone got a more exact reference with the details of the "worst case"?
>What *is* the absolute minimum size of TLB a 370 or a 30xx needs? How big
>are the actual TLBs in real machines?

I don't know the exact answer to all of these questions, but I can give
a counterexample to the 4-way-set-associative requirement.  All of the
Amdahl machines that I'm familiar with use 2-way-associative TLBs, and have
no trouble maintaining 370 binary compatibility.  I think those IBM guys
were confusing the requirements of one particular implementation with
the requirements of the architecture.

As to the total TLB size, the number 256 sticks in my mind.  In practice,
this is probably more a function of the available RAMs than anything else.

In an unrelated but interesting note, you'll notice that I said
'2-way-associative', not '2-way-set-associative'.  In some of these designs,
the two entries being looked at are chosen by *different* functions of the
virtual address.  The reasoning behind this is left as an exercise for the
reader :-) .
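
For anyone who wants a hint, here is a rough sketch of a 2-way-associative
lookup in which each way is indexed by a different function of the virtual
page number. It is not the Amdahl design: the two index functions, the
128-entries-per-way size, and the replacement choice are all assumptions
picked for illustration.

```python
class TwoWayTLB:
    """Two ways, each indexed by a different hash of the virtual page number."""

    def __init__(self, entries_per_way=128):
        self.n = entries_per_way
        # Each slot holds None or a (vpn, frame) pair.
        self.ways = [[None] * self.n, [None] * self.n]

    def _index(self, way, vpn):
        if way == 0:
            return vpn % self.n                  # plain low-order bits
        return (vpn ^ (vpn // self.n)) % self.n  # low bits XOR next-higher bits

    def lookup(self, vpn):
        """Return the frame for vpn, or None on a miss."""
        for way in (0, 1):
            entry = self.ways[way][self._index(way, vpn)]
            if entry is not None and entry[0] == vpn:
                return entry[1]
        return None

    def insert(self, vpn, frame):
        slots = [(way, self._index(way, vpn)) for way in (0, 1)]
        # 1. Update in place if the page is already present.
        for way, i in slots:
            entry = self.ways[way][i]
            if entry is not None and entry[0] == vpn:
                self.ways[way][i] = (vpn, frame)
                return
        # 2. Otherwise use an empty slot if one of the two candidates is free.
        for way, i in slots:
            if self.ways[way][i] is None:
                self.ways[way][i] = (vpn, frame)
                return
        # 3. Otherwise evict from way 0 (arbitrary policy for this sketch).
        way, i = slots[0]
        self.ways[way][i] = (vpn, frame)


tlb = TwoWayTLB()
# Three pages whose low-order 7 bits are identical: a conventional 2-way
# set-associative TLB indexed on those bits could hold only two of them.
for vpn, frame in ((0, 10), (128, 11), (256, 12)):
    tlb.insert(vpn, frame)
resident = [tlb.lookup(vpn) for vpn in (0, 128, 256)]
print(resident)
```

With these particular functions, pages that collide in one way usually land
in different slots of the other, so all three stay resident here.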

-Joe

pkr@maddog.sgi.com (Phil Ronzone) (02/04/90)

In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
>In article <9001270059.AA26776@ucbvax.Berkeley.EDU> JOSH@IBM.COM ("Josh
>On a related issue, my understanding was (from idle conversation with
>some IBM guys several years ago) that the 370 architecture needs at
>least 8 elements in its TLB, and that the TLB must be *at least* 4-way
>set-associative. The reason is some instruction which copies a source to
>a destination while referring to a table (some version of Translate-And-Test
>maybe?). Anyway, the idea was that if the instruction, the source, the
>destination, and the table entry being used all span pages, then you need
>at least 8 valid entries in the TLB to make progress on the instruction.
>
>(Note: I didn't say to *finish* the instruction, I said "make progress".
>If the count were larger than a page size you could hit the same problems
>at the next page boundary. But after handling potential TLB faults [which
>could cause page faults] you would eventually begin to make progress again.)

The 370 could get by with zero TLB elements. The 370 does NOT fault on
any TLB operation. The segment and page tables are chased in memory and
must be in memory at all times (not paged out). The original small
370's had 8 elements, later 16, and the bigger 370's had 128.
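
Chasing the tables with no TLB at all looks roughly like the sketch below.
It assumes 24-bit virtual addresses with 64K segments and 4K pages, and it
uses Python dicts in place of the real segment and page table entry formats,
so treat it as a picture of the walk rather than of the hardware.

```python
class TranslationException(Exception):
    """Raised when a segment or page table entry is missing (invalid)."""


SEG_SHIFT = 16                                  # assumption: 64K segments
PAGE_SHIFT = 12                                 # assumption: 4K pages
PAGES_PER_SEG = 1 << (SEG_SHIFT - PAGE_SHIFT)   # 16 pages per segment


def translate(segment_table, vaddr):
    """Walk the in-memory tables for a 24-bit virtual address; no TLB used."""
    seg = (vaddr >> SEG_SHIFT) & 0xFF
    page = (vaddr >> PAGE_SHIFT) & (PAGES_PER_SEG - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)

    page_table = segment_table.get(seg)         # first memory reference
    if page_table is None:
        raise TranslationException("segment translation")
    frame = page_table.get(page)                # second memory reference
    if frame is None:
        raise TranslationException("page translation")
    return (frame << PAGE_SHIFT) | offset


# Made-up tables: segment 0x12, page 0x3 maps to real frame 0xAB.
segment_table = {0x12: {0x3: 0xAB}}
real = translate(segment_table, 0x123456)
print(hex(real))  # 0xab456
```

A TLB only saves the two table references on each access; the exceptions
come from invalid table entries, not from a TLB miss.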

The 360/67 also had 8 in the "Blaauw box".


------Me and my dyslexic keyboard----------------------------------------------
Phil Ronzone   Manager Secure UNIX           pkr@sgi.COM   {decwrl,sun}!sgi!pkr
Silicon Graphics, Inc.               "I never vote, it only encourages 'em ..."
-----In honor of Minas, no spell checker was run on this posting---------------

pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (02/05/90)

In article <1746@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:

   In article <49365@sgi.sgi.com> rpw3@rigden.UUCP (Robert P. Warnock) writes:
   >On a related issue, my understanding was (from idle conversation with
   >some IBM guys several years ago) that the 370 architecture needs at
   >least 8 elements in its TLB, and that the TLB must be *at least* 4-way
   >set-associative. The reason is some instruction which copies a source to
   >a destination while referring to a table (some version of Translate-And-Test
   >maybe?). Anyway, the idea was that if the instruction, the source, the
   >destination, and the table entry being used all span pages, then you need
   >at least 8 valid entries in the TLB to make progress on the instruction.

   As regards interruptible instructions, it is sufficient, as you say, that
   the instruction be able to "make progress" with eight valid page table
   entries. In this context, it is significant that those of the 3090 vector
   facility instructions that access non-contiguous vector operands have the
   guarantee "access exceptions are not recognized more than seven element
   locations beyond the current one" (SA22-7125-3 p.2-24).

If somebody is *really* interested in the subject of restartability,
unaligned accesses, and the like, I think that a read of

	D. Morris & R. N. Ibbett
	"The MU5 computer system"

is fascinating. This was a large supercomputer of the late
60s/early 70s, and had a deep pipeline, a simple architecture,
and virtual memory. All these conspired to create far from simple
problems, which were solved with some elegance (and where else can you find
a discussion of the effect on architecture of the probabilistic
nature of a flip-flop? :->).

The discussion is terse, and I think highly relevant to many
latter-day sophisticated machines.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk