roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (08/26/89)
I've noticed that motorola has moved from instruction continuation (68010-30) to instruction restart (68040). So they no longer support virtual machines. (Must be the processors got tired of puking their insides all over the stack. :-) Quoting the 68030 manual: Instruction continuation is used to support virtual I/O devices in memory-mapped input/output systems. Control and data registers for the virtual are simulated in the memory map. An access to a virtual register causes a fault and the function of the register is emulated by software. Anybody know why this instruction discontinuation? Is virtual machine emulation a lovely idea whose time has come and gone? Or does it use too many hardware resources? I think some (all?) of the risc processors use instruction restart (mips if I remember correctly) so are we looking at the end of instruction continuation? -- I don't know what the question means, but the answer is yes... KLM - Koninklijke Luchtvaart Maatschappij => coneenclicker lughtfart matscarpie Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
tim@cayman.amd.com (Tim Olson) (08/28/89)
In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes: | I've noticed that motorola has moved from instruction continuation | (68010-30) to instruction restart (68040). So they no longer support | virtual machines. (Must be the processors got tired of puking their | insides all over the stack. :-) | | I think some (all?) of the risc processors use instruction restart (mips if | I remember correctly) so are we looking at the end of instruction continuation? Well, when most (if not all) of the instructions execute in a single cycle, instruction continuation and instruction restart look pretty much the same. Especially in a load/store architecture where there are fewer instruction side-effects. The 29K uses instruction restart for all instructions except for loads and stores, which cannot be restarted (in the absolute sense) because they execute in parallel with subsequent instructions. Instead, these instructions are continued from on-chip state registers. Loadm (Load Multiple) and storem (Store Multiple) are continued from the last completed transfer if interrupted. I think you will find this mix of restart and continuation in many processors which have simple instructions and parallel functional units with out-of-order completion. -- Tim Olson Advanced Micro Devices (tim@amd.com)
shebanow@oakhill.UUCP (Mike Shebanow) (08/28/89)
In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes: >I've noticed that motorola has moved from instruction continuation >(68010-30) to instruction restart (68040). So they no longer support >virtual machines. (Must be the processors got tired of puking their >insides all over the stack. :-) > >Quoting the 68030 manual: > >Instruction continuation is used to support virtual I/O devices in >memory-mapped input/output systems. Control and data registers for >the virtual are simulated in the memory map. An access to a virtual >register causes a fault and the function of the register is emulated >by software. You can still emulate virtual machines using instruction restart. All you have to do is simply interpret the instruction which faulted :-\ That is, when the machine takes the exception, the stack frame will point to the offending instruction. At that point, software can interpret this instruction and perform the intended operation. The only change is that software has to do all the work, not just part of it. Mike Shebanow ------------------------ Disclaimer: The opinions I have presented here are my own, not Motorola's.
scott@bbxeng.UUCP (Engineering) (08/28/89)
In article <2345@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
  In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
  >I've noticed that motorola has moved from instruction continuation
  >(68010-30) to instruction restart (68040). So they no longer support
  >virtual machines. (Must be the processors got tired of puking their
  >insides all over the stack. :-)
  >
  >[...]

  You can still emulate virtual machines using instruction restart. All
  you have to do is simply interpret the instruction which faulted :-\
  That is, when the machine takes the exception, the stack frame will point
  to the offending instruction.

Forgive me for showing my ignorance, but doesn't instruction continuation enable features such as dynamic stack allocation? Are we doomed to return to the antiquated "stack probe"? Does this mean that 68030 (user mode) software will not always work correctly on the 68040? What about page faults? Is the operating system *really* expected to include an instruction set interpreter so it can simulate instruction continuation? The 386 is suddenly starting to look good to me.

--
---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (08/29/89)
In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes: >Forgive me for showing my ignorance, but, doesn't instruction continuation >enable features such as dynamic stack allocation? Are we doomed to >return to the antiquated "stack probe"? Does this mean that 68030 >(user mode) software will not always work correctly on the 68040? >What about page faults? Is the operating system *really* expected >to include an instruction set interpreter so it can simulate >instruction continuation? Well, no. Perhaps you are confusing "instruction continuation" with "program continuation". A normal interrupt can be ignored for a tiny amount of time. So, for convenience, the processor will ignore an interrupt request until the processor happens to be between instructions. A page fault interrupt isn't like that. The instruction in progress cannot go forward: it wants to write to a page that is out on disk (or whatever). The interrupt has to be honored at once, and the instruction is not completed. The operating system is invoked. The OS does good stuff (like disk I/O) and eventually decides to let the user program resume. But resume where in the program? And with what register contents, what processor state? If the hardware has been designed to do "instruction continuation", then the user program will resume somewhere in the middle of the offending instruction. If the hardware has been designed for "instruction restart", then the program will be resumed at the start of the offending instruction. The user-visible result is the same in both cases. The fun stuff comes in actually **implementing** either of these schemes. For example, suppose the following instruction: load two words from @ro into r0 and r1. What if the two words lie across a page boundary? Hmmm! -- Don D.C.Lindsay Carnegie Mellon School of Computer Science
scott@bbxeng.UUCP (Engineering) (08/29/89)
In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>
>If the hardware has been designed to do "instruction continuation",
>then the user program will resume somewhere in the middle of the
>offending instruction. If the hardware has been designed for
>"instruction restart", then the program will be resumed at the start
>of the offending instruction. The user-visible result is the same in
>both cases.
>

I guess this is where I'm having a problem. What if the instruction involved address increment/decrement modes? Restarting the instruction might not give the exact same result unless the results of auto-inc/dec were not placed into the affected registers until the instruction completes.

I remember reading some literature when the 68010 came out explaining the wonderful benefits of instruction continuation and why instruction restart did not always solve the problem. (I don't remember *where* I read this.) Now I'm hearing that it doesn't really matter. Instruction restart makes a lot more sense to me as long as the side effects of the instruction are not interruptible. Is this the case with the 68040?

--
---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232
mash@mips.COM (John Mashey) (08/29/89)
In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>>Forgive me for showing my ignorance, but, doesn't instruction continuation
>>enable features such as dynamic stack allocation?......
.....
>Well, no. Perhaps you are confusing "instruction continuation" with
>"program continuation".
.....
>If the hardware has been designed to do "instruction continuation",
>then the user program will resume somewhere in the middle of the
>offending instruction. If the hardware has been designed for
>"instruction restart", then the program will be resumed at the start
>of the offending instruction. The user-visible result is the same in
>both cases.
>
>The fun stuff comes in actually **implementing** either of these
>schemes. For example, suppose the following instruction:
>
>    load two words from @ro into r0 and r1.

I think this last must have meant @r0 into r0 and r1.

All of this is why most RISC machines:
    a) Use load/store architectures, with zero (or very few) side-effects.
    b) Generally require loads/stores to access aligned data objects, or (more generally), at least forbid any kind of load/store from crossing a boundary.
    c) Usually do instruction-restart, or something close.

Note that restart-vs-continue is not a binary decision. Some CPUs that mostly do restart may have some flavor of continuation in certain cases, i.e., with imprecise exceptions, and sometimes with branch-delay-slot things, or with emulation of missing FPUs, etc, etc.

The more fundamental issue is how much state it takes to figure out where you were and get back there. At the minimum, this is just a PC. At worst, the processor dumps a lot of mysterious stuff somewhere.

--
-john mashey   DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD:   408-991-0253 or 408-720-1700, x253
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
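
The two-word load discussed above can be tried out on a toy machine. The following is only a sketch under stated assumptions: the 16-word page size, the `Memory` class and both `load_pair_*` functions are invented for illustration and do not model any real processor. It reads the example as "load two words from @r0 into r0 and r1" and shows why a restarted instruction must not have overwritten its own base register before the second word arrives.

```python
PAGE_SIZE = 16  # toy page size in words (assumption)


class PageFault(Exception):
    def __init__(self, page):
        super().__init__(page)
        self.page = page


class Memory:
    def __init__(self, words, resident_pages):
        self.words = words
        self.resident = set(resident_pages)

    def read(self, addr):
        page = addr // PAGE_SIZE
        if page not in self.resident:
            raise PageFault(page)
        return self.words[addr]


def load_pair_naive(regs, mem):
    """r0, r1 <- mem[r0], mem[r0+1], writing r0 as soon as the first word arrives."""
    base = regs["r0"]
    regs["r0"] = mem.read(base)       # base register is clobbered here
    regs["r1"] = mem.read(base + 1)   # may fault after r0 was already overwritten


def load_pair_deferred(regs, mem):
    """Same instruction, but registers are only written once both reads succeeded."""
    first = mem.read(regs["r0"])
    second = mem.read(regs["r0"] + 1)  # a fault here leaves regs untouched
    regs["r0"], regs["r1"] = first, second


def run_with_restart(instr, regs, mem):
    """Restart model: on a fault, page in and re-execute from the beginning."""
    while True:
        try:
            instr(regs, mem)
            return regs
        except PageFault as fault:
            mem.resident.add(fault.page)


def demo():
    def fresh():
        words = [0] * 64
        words[15], words[16] = 40, 99   # the pair straddles pages 0 and 1
        words[40], words[41] = 7, 8     # what a clobbered r0 would point at
        return {"r0": 15, "r1": 0}, Memory(words, resident_pages={0, 2, 3})

    naive = run_with_restart(load_pair_naive, *fresh())
    deferred = run_with_restart(load_pair_deferred, *fresh())
    return naive, deferred
```

In this toy run the naive version restarts with r0 already replaced, so it loads from the wrong address, while the deferred version produces the intended pair. Forbidding loads from crossing a page boundary, as in point (b) above, removes the case altogether.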
dswartz@bbn.com (Dan Swartzendruber) (08/29/89)
Clearly it has to make some difference, given that the 680x0 processors support auto-increment/decrement of address registers! The PDP-11 had the same problem. I seem to recall they solved it by having a diagnostic register in which the CPU wrote which registers had been incremented or decremented and by how much. That wasn't as bad as it might first seem. There are only two registers which can change as a result of any given instruction and they could only change by 1, 2 or 4. It's been a while since I hacked on a PDP-11, so I might be off a little here, but that was the basic gist.... Maybe the 68040 does something similar? It certainly can't be any uglier than the sh*t the current processors have with eight gazillion different types of fault frames which can change incompatibly as the microcode is updated....
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (08/29/89)
In article <205@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes: >I guess this is where I'm having a problem. What if the instruction >involved address increment/decrement modes? Restarting the instruction >might not give the exact same result unless the results of auto-inc/dec >were not placed into the affected registers until the instruction completes. Sorry, I should have stated explicitly the strong requirement that is placed on the OS: The user program must not be able to notice that anything happened (except in second-order ways, such as the time of day jumping ahead). Notice I said the OS. That's because the hardware doesn't necessarily do it all. It is fine if the processor merely leaves enough information so that the OS can sort things out. For instance, on the old PDP-11/45, there were special registers, which recorded whether the interrupted instruction had autoincremented any registers, and if so, which ones. First, the OS would copy the user's register set to some convenient place in memory. Then, using this special information, the OS would undo any incrementation. Later, the values in memory would be reloaded, and the offending instruction would be restarted. The instruction would "see" the same values that it saw the previous time. Alternatively, the hardware can have "shadow registers". At the beginning of each instruction, they are made equal to the normal registers. If the instruction faults, then simply store the shadow to memory, instead of storing the "foreground" register set to memory. This nicely avoids the undo problem, and replaces it with lots and lots of silicon. Or, as you suggested, the register updates can be postponed until the end of the instruction. On most machines this would be simpler but slower. -- Don D.C.Lindsay Carnegie Mellon School of Computer Science
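
The undo scheme described above can be sketched as a small simulation. Everything here is hypothetical: the `log` list merely stands in for the kind of special register described for the PDP-11/45 and does not reproduce its real format, and the 4-word page size and register names are invented.

```python
PAGE = 4  # toy page size in words (assumption)


class PageFault(Exception):
    def __init__(self, addr):
        super().__init__(addr)
        self.addr = addr


def move_postinc(regs, mem, resident, log):
    """Toy MOVE (a0)+,(a1)+ that records each register change in `log`."""
    log.clear()                      # the record is reset at the start of each instruction
    value = mem[regs["a0"]]          # source page is assumed resident in this sketch
    regs["a0"] += 1
    log.append(("a0", 1))
    dst = regs["a1"]
    if dst // PAGE not in resident:
        raise PageFault(dst)         # fault after a0 has already been incremented
    mem[dst] = value
    regs["a1"] += 1
    log.append(("a1", 1))


def run(regs, mem, resident, undo):
    """Restart model; with undo=True the OS rolls registers back before restarting."""
    log = []
    while True:
        try:
            move_postinc(regs, mem, resident, log)
            return regs
        except PageFault as fault:
            if undo:
                for name, delta in reversed(log):
                    regs[name] -= delta
            resident.add(fault.addr // PAGE)


def demo():
    results = []
    for undo in (False, True):
        mem = list(range(10, 26))    # mem[0] == 10, mem[1] == 11
        regs = run({"a0": 0, "a1": 8}, mem, {0}, undo)
        results.append((regs, mem[8]))
    return results
```

Without the undo step the restarted instruction sees a0 already advanced and copies the wrong word. With it, the instruction "sees" the same register values as the first time, which is the requirement stated above.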
shebanow@oakhill.UUCP (Mike Shebanow) (08/29/89)
In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes: > > In article <2345@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes: > In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl > (R. Vuurboom) writes: > > >I've noticed that motorola has moved from instruction continuation > >(68010-30) to instruction restart (68040). So they no longer support > >virtual machines. (Must be the processors got tired of puking their > >insides all over the stack. :-) > > > >[...] > > You can still emulate virtual machines using instruction restart. All > you have to do is simply interpret the instruction which faulted :-\ > That is, when the machine takes the exception, the stack frame will point > to the offending instruction. > >Forgive me for showing my ignorance, but, doesn't instruction continuation >enable features such as dynamic stack allocation? Are we doomed to >return to the antiquated "stack probe"? Does this mean that 68030 >(user mode) software will not always work correctly on the 68040? >What about page faults? Is the operating system *really* expected >to include an instruction set interpreter so it can simulate >instruction continuation? The 386 is suddenly starting to look good >me. > >--------------------------------------- >Scott Amspoker >Basis International, Albuquerque, NM >505-345-5232 Sorry about the long reply. I believe (but I am willing to be proved wrong) that anything that can be done using instruction continuation can also be done using restart. This includes dynamic stack allocation. Using that as an example, when a stack overrun occurs (decrements below allocated memory), a page fault will occur. In the restart model, the offending instruction will be undone. In general, most machines using restart will store exception cause information in supervisor visible registers. 
This will indicate why the exception occurred (MMU fault), what happened (translation not valid - page fault), where it happened (some virtual address - in Unix for example, a stack fault would be obvious by inspection of the address) and other pertinent information. Once the operating system has determined the cause, it can allocate new memory and simply restart the instruction. Assuming that there are no other problems with the offending instruction, all will proceed as normal.

In response to your question about interpreters in operating systems, no, I don't think an operating system using a restart machine needs to have a built-in interpreter. Page faults, for example, would be handled in a manner similar to the way stack faults are handled: the fault gets logged in hardware registers and restart is used to reexecute the faulting instruction once the page has been swapped in. The OS doesn't need any more detailed information than that.

So when is an interpreter necessary? If a complete virtual machine is to be emulated, and such a machine includes such things as virtual memory mapped I/O devices, then interpretation may be necessary. For example, assume that some type of I/O device is mapped into user memory. The OS wants the user to be able to read the device normally, but if a write is attempted, some other action should happen. BUT, the OS wants the user program to think that it HAS written the device. In certain machines (which support memory-memory operations), a read of the device may already have happened before the write part of the instruction attempts to write the device. If reading the device is destructive (meaning you can only read it once), you cannot use restart on the instruction. If you did, you would read the device twice. In this particular case, it might be necessary for the OS to complete the instruction on behalf of the hardware.

The only real difference between a restart machine and a continuation machine is (a) how much work needs to be done by hardware to save enough state so that instruction restart or continuation is possible and (b) how much work needs to be redone once the problem is corrected. Mike Shebanow -------------- Disclaimer: the opinions presented here are my own, not Motorola's.
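
The stack-growth case described in this post can be sketched as follows. This is a minimal illustration, not any real kernel's policy: the `Process` class, the `STACK_LIMIT_PAGES` cap and the rule "any address between the growth limit and the current stack bottom is a stack fault" are all assumptions made for the example.

```python
PAGE = 4096
STACK_LIMIT_PAGES = 8   # hypothetical cap on how far the stack may grow


class PageFault(Exception):
    def __init__(self, addr):
        super().__init__(addr)
        self.addr = addr


class Process:
    def __init__(self, stack_top):
        self.stack_top = stack_top
        self.stack_low = stack_top - PAGE          # one stack page mapped initially
        self.mapped = {self.stack_low // PAGE}
        self.memory = {}

    def store(self, addr, value):
        if addr // PAGE not in self.mapped:
            raise PageFault(addr)
        self.memory[addr] = value


def handle_fault(proc, addr):
    """Return True if the fault looks like a stack overrun and was repaired."""
    lowest_allowed = proc.stack_top - STACK_LIMIT_PAGES * PAGE
    if lowest_allowed <= addr < proc.stack_low:
        while proc.stack_low > addr:               # map pages down to the faulting address
            proc.stack_low -= PAGE
            proc.mapped.add(proc.stack_low // PAGE)
        return True
    return False


def push(proc, sp, value):
    """Toy push under the restart model; returns (new sp, faults taken)."""
    faults = 0
    while True:
        try:
            proc.store(sp - 4, value)              # sp only changes once the store succeeded
            return sp - 4, faults
        except PageFault as fault:
            if not handle_fault(proc, fault.addr):
                raise                              # not a stack fault: report the error
            faults += 1
```

A push just below the mapped stack page takes one fault, the handler maps a new page, and the restarted push succeeds with no interpreter involved. A store to an unrelated address is rejected by the address check.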
paul@taniwha.UUCP (Paul Campbell) (08/29/89)
In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes: > >Forgive me for showing my ignorance, but, doesn't instruction continuation >enable features such as dynamic stack allocation? Are we doomed to >return to the antiquated "stack probe"? Does this mean that 68030 >(user mode) software will not always work correctly on the 68040? >What about page faults? Is the operating system *really* expected >to include an instruction set interpreter so it can simulate >instruction continuation? The 386 is suddenly starting to look good >me. No - the need for 'stack probe' etc was caused by the 68000 (fixed in the 68010) which couldn't restart (either restart or continue) from a bus error (page fault) [it didn't keep enough information around in its bus error stack frame to tell what it had done, or undo it itself while delivering the bus error]. Some early vendors actually had 2 68000s, one which executed the user mode code and was halted in mid instruction while the other was started to fix the problem ..... thus preserving the internal state .... I'm sure Motorola won't make this mistake again ..... Paul -- Paul Campbell UUCP: ..!mtxinu!taniwha!paul AppleLink: D3213 "Free Market": n. (colloq.) a primitive fertility goddess worshipped by an obscure cult in the late 20th C. It's chief priest 'Dow Jones' was eventually lynched by an enraged populace during an economic downturn (early 21st C).
scott@bbxeng.UUCP (Engineering) (08/29/89)
In article <5995@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
> Sorry, I should have stated explicitly the strong requirement that is
> placed on the OS:
>     The user program must not be able to notice that anything
>     happened (except in second-order ways, such as the time of
>     day jumping ahead).

Just as long as we don't have to go back to stack probes.
--
---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232
rajivp@sunshade.Sun.COM (Rajiv Patel) (08/30/89)
In article <26418@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
> The more fundamental issue is how much state does it take
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> to figure out where you were and get back there. At the
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> minimum, this is just a PC. At worst, the processor dumps a
> lot of mysterious stuff somewhere.
>--
>-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>

To further the discussion on this issue ....

There has been lots of discussion about superscalar and VLIW architectures. How about instruction continuation and/or restart for these architectures? It seems that an N-way superscalar machine would need N PCs and a lot of state information. For VLIW machines it seems that instruction continuation might be better, since restart might first need to roll back partially completed operations.

Rajiv Patel.
srg@quick.COM (Spencer Garrett) (08/30/89)
Instruction restart on a CISC can cause grievous problems when there are side effects to reading or writing certain addresses. Consider, for instance, the following instruction which reads a device register and stores it into memory. (This is 68xxx code.)

    movb a0@(2),_memloc

This instruction could fault after the read of the device register if either _memloc or the tail end of the instruction isn't resident. Many devices will clear interrupt bits when you read their status register or shift in the next byte when you read their data register, so repeating the entire instruction doesn't give equivalent results. One could code this (in C) as

    register char temp;
    temp = dev->reg;
    memloc = temp;

and hope the compiler doesn't optimize too aggressively, or maybe (someday) use "volatile" to indicate to the compiler that dev->reg is special, but I wouldn't bet any current compilers take instruction restart into account when generating code for volatile data fetches.

For the moment this isn't a major problem, since Unix device drivers usually run in physical memory (so any fault that occurs is fatal, and won't be rerun), but I long for the day when this isn't necessarily so.
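
The double-read hazard described above can be reproduced with a toy model. This is a sketch under assumptions: `StatusRegister` is an invented read-to-clear device and `PagedCell` an invented one-word memory whose page starts out non-resident. Neither corresponds to real hardware.

```python
class PageFault(Exception):
    pass


class StatusRegister:
    """Toy device register whose pending bits clear when read (assumption)."""

    def __init__(self, pending):
        self.pending = pending
        self.reads = 0

    def read(self):
        self.reads += 1
        value, self.pending = self.pending, 0
        return value


class PagedCell:
    """One memory location whose page starts out non-resident."""

    def __init__(self):
        self.resident = False
        self.value = None

    def write(self, value):
        if not self.resident:
            raise PageFault()
        self.value = value


def move_restart(dev, cell):
    """Memory-to-memory move, re-executed from the top after a fault."""
    while True:
        try:
            cell.write(dev.read())   # the device is read again on every attempt
            return
        except PageFault:
            cell.resident = True     # "page in" the destination, then restart


def move_split(dev, cell):
    """Load into a temporary first; only the store is ever restarted."""
    temp = dev.read()
    while True:
        try:
            cell.write(temp)
            return
        except PageFault:
            cell.resident = True


def demo():
    results = []
    for move in (move_restart, move_split):
        dev, cell = StatusRegister(0x80), PagedCell()
        move(dev, cell)
        results.append((cell.value, dev.reads))
    return results
```

In this model the restarted move stores the already-cleared value after reading the device twice, while the split sequence reads it once and stores the original status, which is what the C workaround above is hoping the compiler preserves.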
shebanow@oakhill.UUCP (Mike Shebanow) (08/30/89)
In article <123909@sun.Eng.Sun.COM> rajivp@sun.UUCP (Rajiv Patel) writes:
> state information. For VLIW machines it seems that instruction continuation
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> might be better for restart might first need to roll back partially completed
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> operations.
  ^^^^^^^^^^^
>

This depends on how often you expect roll back to be necessary. From the perspective of a single instruction, all the data I have ever seen indicates that exceptions occur very infrequently. (On most machines, the most frequent exception is a page fault.) So, is this really an issue? In addition, what fraction of time does the lost work account for in comparison to the amount of time spent by the OS just trying to figure out what the exception was and how to deal with it?

Mike Shebanow

PS: I do not consider OS calls (traps) or interrupts to be an exception -- the hardware can plan for those and hence not require any roll back.

-------------
Disclaimer: the opinions expressed here are my own, not Motorola's.
GPWRDCS@gp.govt.nz (Don Stokes, GPO) (09/02/89)
In article <205@bbxeng.UUCP>, scott@bbxeng.UUCP (Engineering) writes: > In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes: >> >>If the hardware has been designed to do "instruction continuation", >>then the user program will resume somewhere in the middle of the >>offending instruction. If the hardware has been designed for >>"instruction restart", then the program will be resumed at the start >>of the offending instruction. The user-visible result is the same in >>both cases. >> > I guess this is where I'm having a problem. What if the instruction > involved address increment/decrement modes? Restarting the instruction > might not give the exact same result unless the results of auto-inc/dec > were not placed into the affected registers until the instruction completes. > > I remember reading some literature when the 68010 came out explaining the > wonderful benefits of instruction continuation and why instruction restart > did not always solve the problem. (I don't remember *where* I read this.) > Now I'm hearing that it doesn't really matter. Instruction restart makes > a lot more sense to me as long as the side effects of the instruction are > not not interruptable. Is this the case with the 68040? > I think I saw something similar in the depths of my MC68020 User's Guide (I love the way Motorola call the technical docs for a processor a "User's Guide"). Might dig it out sometime, but I think the gist of it was as follows: If you execute an instruction: MOVE $300, $1000 and location $1000 was not in physical memory, the OS would have to bring the page into physical memory. If the instruction continues after the pagefault completes, all is fine. If the instruction restarts, it has to access location $300 again. However, it is possible that in a tight memory situation, the act of paging in $1000 could page $300 out, and vice-versa. 
While an instruction continuation wouldn't mind, an instruction restart would result in an infinite loop: paging $1000 in and $300 out, restarting, paging $300 in and $1000 out, and restarting again.

While that case could probably be coded around for a simple two or three operand case (by ensuring that the last two or three pages accessed by a process remain in memory, or perhaps something cleverer?), things get somewhat messier when a block move instruction is executed, such as the VAX instruction MOVC3, which can move up to 64KB in one go, which could cross a lot of pages (of course MOVC3 is continuable on the VAX...).

This wouldn't be a problem on the 680x0, as these processors (unless they've added block moves to the 68030 and/or 68040) do the grunt work of block moves in two instructions, eg:

            MOVE.L  #srcaddr, A0
            MOVE.L  #dstaddr, A1
            MOVE.W  #len-1, D0      ; DBRA counts down to -1
    loop:   MOVE.B  (A0)+, (A1)+
            DBRA    D0, loop

(which of course explains why the 68010 (and '012) has a two instruction cache).

Don Stokes, Systems Programmer        /  /   Domain: don@gp.govt.nz
Government Printing Office,          /GP/    PSImail: PSI%0530147000028::DON
Wellington, New Zealand             /  /     Bang: ...!uunet!vuwcomp!windy!gpwd!don
--------------------------------------------------------------------------------
When the going gets tough, upgrade.
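
The block-move point can be sketched with a toy simulation. Assumptions are labeled loudly: the `state` dictionary stands in for whatever registers a continuable block move keeps its progress in, the 8-byte page size is arbitrary, pages are never evicted, and nothing here models the actual MOVC3 implementation.

```python
PAGE = 8  # toy page size in bytes (assumption)


class PageFault(Exception):
    def __init__(self, page):
        super().__init__(page)
        self.page = page


def block_move(state, mem, resident):
    """Copy bytes one at a time, keeping progress in `state` (the 'registers')."""
    while state["len"] > 0:
        for addr in (state["src"], state["dst"]):
            if addr // PAGE not in resident:
                raise PageFault(addr // PAGE)
        mem[state["dst"]] = mem[state["src"]]
        state["src"] += 1
        state["dst"] += 1
        state["len"] -= 1
        state["moved"] += 1          # bookkeeping only: total byte transfers performed


def run(mem, resident, src, dst, length, continuable):
    """Return (page faults taken, byte transfers performed) for one block move."""
    initial = {"src": src, "dst": dst, "len": length, "moved": 0}
    state = dict(initial)
    faults = 0
    while True:
        try:
            block_move(state, mem, resident)
            return faults, state["moved"]
        except PageFault as fault:
            resident.add(fault.page)
            faults += 1
            if not continuable:
                # restart: discard progress, keep only the running transfer count
                state = dict(initial, moved=state["moved"])
```

With a 24-byte move over initially non-resident pages, both models take the same six faults in this toy setup, but the restart model repeats every byte it had already copied each time a fault hits. With page eviction added, the restart model could also fail to finish at all, which is the concern raised above for long block moves.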
bruce@blender.UUCP (Bruce Thompson) (09/02/89)
In article <5990@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
/* discussion about interrupt requests deleted */
> A page fault interrupt isn't like that. The instruction in progress
> cannot go forward: it wants to write to a page that is out on disk
> (or whatever). The interrupt has to be honored at once, and the
> instruction is not completed. The operating system is invoked. The OS
> does good stuff (like disk I/O) and eventually decides to let the
> user program resume. But resume where in the program? And with what
> register contents, what processor state?
>
> If the hardware has been designed to do "instruction continuation",
> then the user program will resume somewhere in the middle of the
> offending instruction. If the hardware has been designed for
> "instruction restart", then the program will be resumed at the start
> of the offending instruction. The user-visible result is the same in
> both cases.
>

Forgive me if I am re-hashing stuff which has been covered before.

Instruction continuation helps to prevent a condition known as `thrashing' due to page-faults. An example: a machine executes an instruction like the following:

    mov.l label1, label2

Assume that the instruction, the source and the destination reside on different pages in VM. The worst case occurs when both data pages have been swapped out to disk. The sequence of operations for an `instruction re-start' machine can occur like this:

1. fetch the instruction
2. attempt to fetch the source operand
3. page fault occurs. Block the process waiting for the page. In most cases, another process is continued until the page has been swapped in.
4. Re-start the instruction. While waiting for the new page, the page with the instruction has been swapped out. This causes another page fault.
5. Block the process while the text page is swapped in. Other processes run while waiting.
6. Re-start the instruction.
7. goto step 2.

This sequence can occur for an extremely long period, possibly locking up the entire machine, certainly severely degrading performance, particularly where physical memory is in short supply.

Where a processor can continue instructions rather than re-start them, the worst-case sequence is:

1. fetch the instruction
2. attempt to fetch the source operand
3. page-fault occurs. Block the process waiting for the new page. Allow other processes to run.
4. continue instruction
5. fetch the source operand
6. attempt to store operand
7. page-fault occurs. Block the process until the new page is fetched. Again, other processes are allowed to run.
8. continue instruction
9. store the operand

This demonstrates that on a continuable instruction machine (MC68020 etc.), the paging overhead can be noticeably reduced as compared with a re-start machine. From the user's point of view, there will be less paging activity and better performance on a `continuable' machine.
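
The two sequences above can be compared with a small paging simulation. It is a sketch under simplifying assumptions: a single process, FIFO page replacement, and a hypothetical `run_instruction` helper that gives up after `max_faults` faults. None of that comes from the post, which has other processes stealing the pages.

```python
from collections import deque


def run_instruction(pages, frames, continuable, max_faults=50):
    """Return the fault count, or None if the instruction never completes.

    `pages` lists the pages the instruction touches in order; `frames` is the
    number of physical frames available, replaced FIFO (an assumption).
    """
    resident = deque()
    faults = 0
    step = 0
    while step < len(pages):
        page = pages[step]
        if page in resident:
            step += 1                # access succeeds, move to the next one
            continue
        if faults == max_faults:
            return None              # treat as livelock: no forward progress
        faults += 1
        if len(resident) == frames:
            resident.popleft()       # evict the oldest page
        resident.append(page)
        if not continuable:
            step = 0                 # restart: every access is repeated
    return faults
```

With the three pages of the example (text, source, destination) and only two frames, the restart model keeps evicting a page it still needs and never finishes in this toy, while the continuation model completes after three faults. Given three frames, the restart model also completes after three faults, so the difference only appears when memory is tighter than the instruction's own footprint.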
dswartz@bbn.com (Dan Swartzendruber) (09/03/89)
I think this is somewhat of a misleading point. Ignoring the block move case, a generic two-operand instruction on most any machine could take up to six page faults (the instruction spans a page boundary, and each of the operands does as well.) Although it is possible to construct a theoretical scenario where an instruction restart CPU could get into an infinite page-fault loop, I will respectfully suggest that if your system only has 6 free pages of memory, the effective result (as far as getting any useful work done is concerned) will be pretty much the same! Not to mention that this scenario assumes almost complete brain-damage on the parts of: the compiler, the linker, the user and the sys admin...
blarson@basil.usc.edu (bob larson) (09/04/89)
In article <2353@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
>I believe (but I am willing to be proved wrong) that anything that can
>be done using instruction continuation can also be done using restart.

Since instruction continuation requires the ability to save and restore internal state information, this could be (ab)used in various ways. The only reasonable one I can think of is for chip testing. (For both design and manufacturing defects.) See the periodically repeating discussion of testability on sci.electronics.

--
Bob Larson
Arpa: blarson@basil.usc.edu
Uucp: {uunet,cit-vax}!usc!basil!blarson
Prime mailing list: info-prime-request%ais1@usc.edu usc!ais1!info-prime-request
seanf@sco.COM (Sean Fagan) (09/04/89)
In article <45180@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes: >Although it is possible to construct a >theoretical scenario where an instruction restart CPU could get into an >infinite page-fault loop, I will respectfully suggest that if your system >only has 6 free pages of memory, the effective result (as far as getting >any useful work done is concerned) will be pretty much the same! Somebody else also gave the example: movw $300, $100 (obviously a VAX 8-)). Dan is, I think, following up to the comment that, if you're tight on free memory, swapping the page with $100 could cause the page with $300 to be swapped out (and vice-versa), which would cause real problems with instruction restart. Now, my point: how about shared memory? (SysV-type shared memory, not multi-processor-type shared memory.) With instruction restart, the value in $300 could have changed, while, with instruction continuation, it doesn't matter. How do various OS's and hardwares handle it? -- Sean Eric Fagan | "Time has little to do with infinity and jelly donuts." seanf@sco.UUCP | -- Thomas Magnum (Tom Selleck), _Magnum, P.I._ (408) 458-1422 | Any opinions expressed are my own, not my employers'.
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/04/89)
In article <265@gp.govt.nz> GPWRDCS@gp.govt.nz (Don Stokes, GPO) writes:
>In article <205@bbxeng.UUCP>, scott@bbxeng.UUCP (Engineering) writes:
>>
>> I remember reading some literature when the 68010 came out explaining the
>> wonderful benefits of instruction continuation and why instruction restart
>> did not always solve the problem. (I don't remember *where* I read this.)
>> Now I'm hearing that it doesn't really matter. Instruction restart makes
>> a lot more sense to me as long as the side effects of the instruction are
>> not not interruptable. Is this the case with the 68040?

To begin with the last point first, it's my understanding that due to data prefetching and the restart exception model the same location can sometimes be accessed twice (which can be, uhmmm, unpleasant for memory-mapped i/o), which is why motorola no longer claims virtual machine support for the 68040. My understanding could be flawed though...

But maybe we can second-guess motorola. How about the following scenario: In order to improve performance, motorola decides to pipeline heavily; a heavy pipeline means a lot of data prefetching. Now with all that prefetched data we suddenly run into an exception...problems. Best thing is throw it all away and start anew. Now the whole point of instruction continuation was to know which locations were already accessed. Since this could no longer be supported, might as well go over to the simpler instruction restart. So the real trade-off was performance vs virtual machine support.

Could this be the story behind the instruction continuation discontinuation? Stay tuned :-)

As for the first point: The 68010 programmers reference manual motivates instruction continuation as follows (1.4.1 Virtual Memory p.8): "The MC68010 uses instruction continuation rather than instruction restart to support virtual memory.
With instruction restart, the processor must remember the exact state of the system before each instruction is started in order to restore that state if a page fault occurs during its execution. Then, after the page fault has been repaired, the entire instruction that caused the fault is reexecuted. With instruction continuation, when a page fault occurs the processor stores its internal state and then after the page fault is repaired, restores that internal state and continues execution of the instruction. In order for the mc68010 to utilize instruction continuation, it stores its internal state on the supervisor stack when a bus cycle is terminated with a bus error signal...... Instruction continuation has the additional advantage of allowing hardware support for virtual i/o devices. Since virtual registers may be simulated in the memory map, an access to such a register will cause a fault and the function of the register can be emulated by software." Now, especially in the light of preceding discussion in this group, it is not clear to me how the first paragraph above motivates instruction continuation over restart. Apparently the motorola folks seem to agree since in the 68020 Users Manual we read the following (1.3.1 Virtual memory p.1-7): "The MC68020 uses instruction continuation to support virtual memory. In order for the mc68020 to use instruction continuation, it stores its internal state... ...Instruction continuation is crucial to the support of virtual i/o devices in memory-mapped input/output systems. Since virtual registers may be simulated in the memory map, an access to such a register will cause a fault and the function of the register can be emulated in software." Note the two differences: first, no more mention of instruction restart and second, what was first an "additional advantage" has now become "crucial". 
For the 68030 we see again a changed viewpoint: In the 68030 users manual we read (in 1.6.1 Virtual Memory p.1-11): "The mc68030 uses instruction continuation to support virtual memory..." and in 1.6.2 Virtual Machine: "Instruction continuation is used to support i/o devices in memory-mapped input/output systems. Control and data registers for the virtual device are simulated in the memory map. An access to a virtual register causes a fault and the function of the register is emulated by software." Note the differences: first, instruction continuation is no longer "crucial" to support memory mapped i/o devices and second, the motorola folks have finally figured out that memory-mapping i/o devices is a virtual machine concept and not a virtual memory concept. And finally of course (what started this whole thread): the 68040's non-support of instruction continuation. > >I think I saw something similar in the depths of my MC68020 User's Guide >(I love the way Motorola call the technical docs for a processor a >"User's Guide"). Might dig it out sometime, but I think the gist of it >was as follows: > [ Example follows] You might want to dig it out sometime since I (for one) couldn't find it. -- wiskunde: Dutch for mathematics. Literally: Knowledge of certainty wis: certainty kunde: Knowledge Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
deraadt@enme3.ucalgary.ca (Theo Deraadt) (09/05/89)
In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes: >Now, my point: how about shared memory? (SysV-type shared memory, not >multi-processor-type shared memory.) With instruction restart, the value in >$300 could have changed, while, with instruction continuation, it doesn't >matter. How do various OS's and hardwares handle it? Gad. How about a hardware FIFO? How about a serial receive buffer on your generic serial chip? I really doubt this rumour. Unless some special trick were to be done (like maybe modifying the actual instruction in the code cache before the restart to skip the already done part, yes, sorry, sick idea), it just has too many differences from the current 030 and 020/851 setup to be possible. <tdr. Theo de Raadt (403) 289-5894 Calgary, Alberta, Canada
johnz@grapevine.uucp (John Zolnowsky ext. 33230) (09/06/89)
In article <241@ssp1.idca.tds.philips.nl>, roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes: > But maybe we can second-guess motorola. How about the following scenario: > > In order to improve performance, motorola decides to pipeline heavily, > heavy pipeline means a lot of data prefetching. Now with all that > prefetched data we suddenly run into an exception...problems. Best thing > is throw it all away and start anew. Now the whole point of instruction > continuation was to know which locations were already accessed. Since this > could no longer be supported might as well go over to the simpler > instruction restart. So the real trade-off was performance vs virtual > machine support. The 68000 was designed with three stages of prefetch, all controlled by microcode. The microcode was free to manage the prefetch, external bus, and internal data operations in any order. The actual order was determined by the microcode, optimizing for performance. The values of user visible registers could be invalid, while current values were held in temporary registers. Although virtual memory was desired for the 68000, it was deemed too costly to provide an instruction restart model. This would have required many more temporary registers and data paths to capture user visible values at the instruction dispatch, and to restore them at a bus error. The option of restricting the microcode usage of temporaries and control of the prefetch would have impaired the performance of the processor. After the 68000 went to market, the instruction continuation model was conceived. This is best understood as an interrupt at the microcode level. The "stack dump" is a context switch, and the RTE which does the stack restore is a context switch back. This model required only a few new registers, and only one new data path. This same data path formed the basis for the 68010 loop mode. 
Presumably, in later processors from the family, the provision of extra hardware to reduce instruction cycle counts also leads to a reduction in the indeterminacy of the values of registers. This makes the cost of instruction restart more tractable. -John Zolnowsky ...!sun!johnz or johnz@sun.com
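To make the restart/continuation distinction concrete, here is a minimal sketch in C. It is a toy model only, not 680x0 microcode: the two micro-steps, the `micro_state` structure, and every function name are assumptions made for illustration. A memory-to-memory move reads a destructive-read source and then faults on the destination write. Restart discards the saved micro-state and re-reads the source; continuation keeps the micro-state (the analogue of the "stack dump") and resumes at the write.

```c
#include <stdbool.h>

/* Toy model (hypothetical): a memory-to-memory move split into two
 * micro-steps, "read source" then "write destination". */

static int fifo_data[] = {11, 22, 33};   /* destructive-read source device */
static int fifo_next = 0;                /* index of the next unread entry */
static int source_reads = 0;             /* number of source accesses      */

static int read_source(void) {
    source_reads++;
    return fifo_data[fifo_next++];
}

static bool dest_present = false;        /* is the destination page resident? */
static int  dest_value = 0;

/* Returns false to signal a (simulated) page fault on the write. */
static bool write_dest(int v) {
    if (!dest_present) return false;
    dest_value = v;
    return true;
}

struct micro_state { int step; int temp; };   /* stands in for the stack dump */

/* Run the move from the given micro-state; true once the move completes. */
static bool run_move(struct micro_state *s) {
    if (s->step == 0) { s->temp = read_source(); s->step = 1; }
    if (!write_dest(s->temp)) return false;   /* fault: progress stays in *s */
    s->step = 2;
    return true;
}

static void reset_model(void) {
    fifo_next = 0; source_reads = 0; dest_present = false; dest_value = 0;
}

/* Instruction restart: after the fault is repaired, begin again from step 0. */
int move_with_restart(void) {
    reset_model();
    struct micro_state s = {0, 0};
    while (!run_move(&s)) {
        dest_present = true;                  /* handler pages the target in */
        s = (struct micro_state){0, 0};       /* discard partial progress    */
    }
    return dest_value;
}

/* Instruction continuation: keep the micro-state and resume at the write. */
int move_with_continuation(void) {
    reset_model();
    struct micro_state s = {0, 0};
    while (!run_move(&s)) {
        dest_present = true;                  /* handler pages the target in */
    }
    return dest_value;
}

int reads_of_source(void) { return source_reads; }
```

In this model the restarted move touches the source twice and stores the second FIFO entry, while the continued move touches it once and stores the first. That is the hardware FIFO concern raised earlier in the thread, reduced to its smallest case.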
philhowr@unix.cie.rpi.edu (Bob Philhower) (09/06/89)
In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes: >In article <45180@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes: > >Now, my point: how about shared memory? (SysV-type shared memory, not >multi-processor-type shared memory.) With instruction restart, the value in >$300 could have changed, while, with instruction continuation, it doesn't >matter. How do various OS's and hardwares handle it? I contend that the possibility of the source location being changed during a page fault on the destination is a non-issue. If there had been no page fault, the "old" value would have been written. Designers who need to worry about this possibility should really be thinking about some sort of semaphore to prevent writing during a read. Robert Philhower (philhowr@unix.cie.rpi.edu) Rensselaer Center for Integrated Electronics CII 6111 / Rensselaer Polytechnic Institute / Troy, NY 12180 / USA
dennis@masscomp.UUCP (Dennis Rockwell) (09/06/89)
In article <44908@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes: > [ ... ] The PDP-11 had >the same problem. I seem to recall they solved it by having a diagnostic >register in which the CPU wrote which registers had been incremented or >decremented and by how much. That wasn't as bad as it might first seem. >There are only two registers which can change as a result of any given >instruction and they could only change by 1, 2 or 4. It's been a while >since I hacked on a PDP-11, so I might be off a little here, but that >was the basic gist.... Some PDP-11s had this register, some did not. It turns out that the only time this was a problem was when an auto-[in|de]crement *floating*point* instruction caused the fault. Unfortunately, DEC left this register out of the PDP-11/60 (or was that the 11/44?), which implemented the standard floating point instruction set. Thus, for this PDP-11 only, you had to do stack probes if you were going to use *(double *)p++ into automatic storage. Dennis Rockwell Concurrent Engineering Westford MA
firth@sei.cmu.edu (Robert Firth) (09/06/89)
In article <44908@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes: > [ ... ] The PDP-11 had >the same problem. I seem to recall they solved it by having a diagnostic >register in which the CPU wrote which registers had been incremented or >decremented and by how much. In article <2812@masscomp.UUCP> dennis@westford.ccur.com (Dennis Rockwell) writes: >Some PDP-11s had this register, some did not. It turns out >that the only time this was a problem was when an >auto-[in|de]crement *floating*point* instruction caused the >fault. Unfortunately, DEC left this register out of the >PDP-11/60 (or was that the 11/44?), which implemented the >standard floating point instruction set. Thus, for this The handbooks tell me that this register was implemented on all but one of the memory-managed PDP-11s. On the old PDP-11/45, it was called Segment Status Register #1 (SSR1) and had the format:

    Bits 11..15 : amount changed
    Bits  8..10 : register changed
    Bits  3.. 7 : amount changed
    Bits  0.. 2 : register changed

On the later PDP-11s (11/44, 11/70) it was called Memory Management Register #1 (MMR1). The basic reason for doing it this way was to allow the fault handling code to undo the side effects that might have occurred. At most two registers could have been changed, and at most by 8. Note that the register didn't tell you which register SET was currently in force: you had to work that out for yourself using the various mode bits scattered about the place. You then restored the registers, reset the PC to point to the start of the instruction (this value was squirrelled away in MMR2 since you can't decode PDP-11 instructions backwards), and off you went again. If I recall aright, the fix on the PDP-11/24 was to keep the stack double-word aligned. A floating-point operation could then never cause a memory-management abort halfway through.
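The undo step described above can be sketched in C as follows. The field layout is the one given in the post; treating each "amount changed" field as a 5-bit two's complement value is an assumption of this sketch, and the function names (`sext5`, `undo_autoinc`, `make_mmr1`) are hypothetical. Choosing the correct register set and resetting the PC from MMR2 are left out.

```c
#include <stdint.h>

/* Sign-extend a 5-bit two's complement field (assumed encoding of the
 * "amount changed" fields). */
static int sext5(unsigned v) {
    v &= 0x1F;
    return (v & 0x10) ? (int)v - 32 : (int)v;
}

/* Undo the register side effects recorded in an SSR1/MMR1-style word.
 * regs[] must already be the register set that was in force at the time
 * of the fault; working that out is the caller's job. */
void undo_autoinc(uint16_t mmr1, uint16_t regs[8]) {
    for (int half = 0; half < 2; half++) {
        unsigned field  = (mmr1 >> (8 * half)) & 0xFF;
        unsigned reg    = field & 0x7;         /* bits 0..2 or 8..10  */
        int      amount = sext5(field >> 3);   /* bits 3..7 or 11..15 */
        regs[reg] = (uint16_t)(regs[reg] - amount);
    }
}

/* Build a recorded word from two (register, amount) pairs; used here only
 * to construct an example value. */
uint16_t make_mmr1(unsigned r0, int a0, unsigned r1, int a1) {
    unsigned lo = ((((unsigned)a0) & 0x1F) << 3) | (r0 & 7);
    unsigned hi = ((((unsigned)a1) & 0x1F) << 3) | (r1 & 7);
    return (uint16_t)((hi << 8) | lo);
}
```

A field of all zeros decodes as "register 0 changed by 0", so an instruction that modified only one register, or none, is handled by the same loop.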
henry@utzoo.uucp (Henry Spencer) (09/07/89)
In article <4008@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes: >>... I seem to recall they solved it by having a diagnostic >>register in which the CPU wrote which registers had been incremented or >>decremented and by how much. > >>Some PDP-11s had this register, some did not... > >The handbooks tell me that this register was implemented on all but one >of the memory-managed PDP-11s... Unfortunately, not so: your handbooks probably are not complete. The register appeared on the 45, the first memory-managed 11. It was left out on the 40, the second. The 40's MMU was a cut-down version of the rather kitchen-sink 45 design, since the 40 was a lower-cost machine, but unfortunately they left out a couple of important things because no DEC software of the time used them. (The changed-registers register was one, split-space was the other.) The larger memory-managed 11s followed the 45; the smaller ones followed the 40. The 40, 34, 60, 23, and 24, at least, had the brain-damaged MMU. The 50, 55, and 70 had the 45 MMU, but that was no big trick since they were all 45s with changes in memory subsystem details. The 44 had a *slightly* simplified 45 MMU that got rid of some of the silliness but left everything important in. I think the more recent 11s have mostly followed the 44, but I haven't been keeping track. -- V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology 1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/12/89)
In article <34228@grapevine.uucp> johnz@grapevine.uucp (John Zolnowsky ext. 33230) writes: > >Although virtual memory was desired for the 68000, it was deemed too costly >to provide an instruction restart model. This required many more temporary > >After the 68000 went to market, the instruction continuation model was >conceived. This is best understood as an interrupt at the microcode level. > >Presumably, in later processors from the family, the provision of extra >hardware to reduce instruction cycle counts also leads to a reduction >in the indeterminacy of the values of registers. This makes the cost >of instruction restart more tractable. > So what you're saying is that motorola first thought that virtual memory could only be supported through instruction restart, and that later on they conceived the concept of instruction continuation. The point is still this: why go to instruction restart, seeing that (1) you _can't_ provide virtual (memory-mapped) i/o with instruction restart (this is the virtual machine part that can't be supported), and (2) you can have big problems with memory-mapped i/o based on reads? Instruction restart appears to be _less_ powerful than instruction continuation. Just look at the algorithm recently defined here to prevent instruction restart from doing multiple read accesses. -- wiskunde: Dutch for mathematics. Literally: Knowledge of certainty wis: certainty kunde: Knowledge Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/12/89)
In article <1790@cs-spool.calgary.UUCP> deraadt@enme3.UUCP (Theo Deraadt) writes: |In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes: |>Now, my point: how about shared memory? (SysV-type shared memory, not |>multi-processor-type shared memory.) With instruction restart, the value in |>$300 could have changed, while, with instruction continuation, it doesn't |>matter. How do various OS's and hardwares handle it? |Gad. How about a hardware FIFO? How about a serial receive buffer on |your generic serial chip? I really doubt this rumour, unless some special |trick (like maybe modifying the actual instruction in the code cache |before the restart to skip the already done part, yes, sorry, sick idea) |was to be done, it just has too many differences from the current 030 |and 020/851 setup to be possible. | <tdr. > I'm pretty sure the 68040 will use instruction restart and yes, some read accesses can occur more than once because of the instruction restart model. What the solution is for your hardware fifo is something I'm still trying to figure out :-) I don't see the shared memory as a real problem since if you had accessed that location just an instant later the value would have changed anyway. I doubt that any application would depend on such microsecond timings. -- wiskunde: Dutch for mathematics. Literally: Knowledge of certainty wis: certainty kunde: Knowledge Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
jonah@db.toronto.edu (Jeffrey Lee) (09/12/89)
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes: >In article <1790@cs-spool.calgary.UUCP> deraadt@enme3.UUCP (Theo Deraadt) writes: >|In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes: >|Gad. How about a hardware FIFO? How about a serial receive buffer on >|your generic serial chip? I really doubt this rumour, unless some special > What the solution is for your hardware fifo is something I'm >still trying to figure out :-) Most of the problems arise from the use of memory-memory instructions. You SHOULD be OK if you write code that accesses ``critical'' locations using only a RISC sub-set of the CISC instruction set. That is, if you use register-register operations and single memory address move instructions. That is, convert:

    move fifo,addr1
    add  fifo,addr2

into:

    move fifo,reg
    move reg,addr1
    move fifo,reg1
    move addr2,reg2
    add  reg1,reg2
    move reg2,addr2

You can get away with:

    move fifo,reg
    move reg,addr1
    move fifo,reg
    add  reg,addr2

if addr2 is not a critical variable or the add instruction uses guaranteed atomic read-modify-write access. A respectable processor should NOT restart a move instruction with just one memory operand if the read/write has succeeded. Therefore an exception should not be able to cause the processor to restart the instruction and re-read the data. [No bets if the processor has a deep pipeline--try checking the hardware reference manuals or calling the manufacturer.] j.
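The same discipline can be written in C, shown here as a hedged sketch against a mock device. The `mock_fifo` structure and the function names are hypothetical, and a C compiler does not promise one machine instruction per statement, so on real hardware the device pointer would be `volatile` and the generated code would still need checking. The comments map each statement back to the six-instruction sequence above.

```c
#include <stdint.h>

/* Mock device (hypothetical): each read of the FIFO consumes one entry,
 * so a repeated read would silently lose data. */
struct mock_fifo {
    const uint16_t *data;
    unsigned next;       /* index of the next unread entry */
    unsigned reads;      /* number of device reads so far  */
};

static uint16_t fifo_read(struct mock_fifo *f) {
    f->reads++;
    return f->data[f->next++];
}

/* Equivalent of "move fifo,addr1 ; add fifo,addr2", arranged so that the
 * device access and the possibly-faulting memory access are never part of
 * the same step. */
void drain_two(struct mock_fifo *f, uint16_t *addr1, uint16_t *addr2) {
    uint16_t reg;
    uint16_t sum;

    reg = fifo_read(f);            /* move fifo,reg   : device access only */
    *addr1 = reg;                  /* move reg,addr1  : may fault, safe to redo */

    reg = fifo_read(f);            /* move fifo,reg1  : device access only */
    sum = *addr2;                  /* move addr2,reg2 */
    sum = (uint16_t)(sum + reg);   /* add  reg1,reg2  */
    *addr2 = sum;                  /* move reg2,addr2 : may fault, safe to redo */
}
```

If a fault on `*addr1` or `*addr2` causes that step to be redone, the value taken from the FIFO is already held in a local, so the device is not touched a second time.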
BEAR@S34.Prime.COM (09/12/89)
It would appear that what this thread is *really* discussing is whether or not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it should be avoided (use I/O instructions instead). If it can't be avoided, be careful. It may of course be that a particular machine has no special I/O instructions (e.g. Acme RISC :-)), in which case you should "do the right thing" (most likely a load or store). Bob Beckwith Prime Computer, Inc. (508)879-2960 x 4209 bear@s34.prime.com
baum@Apple.COM (Allen J. Baum) (09/14/89)
[] >In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes: > >It would appear that what this thread is *really* discussing is whether or >not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it >should be avoided (use I/O instructions instead). If it can't be avoided, >be careful. OK, I'll bite. What are the characteristics of memory mapped I/O that enable it to avoid the problems we are talking about? Note that I am assuming that memory mapped I/O is done with simple Load/Store instructions otherwise, and not hairy mem-mem translate&test&stand-on-your-head instructions. -- baum@apple.com (408)974-3385 {decwrl,hplabs}!amdahl!apple!baum
les@unicads.UUCP (Les Milash) (09/14/89)
In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes: >>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes: >>*really* discussing is whether or >>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it >>should be avoided >OK, I'll bite. What are the characteristics of memory mapped I/O that enable >it to avoid the problems we are talking about? i'll summarize (and i'm sure y'all will correct me if i'm wrong:-) memory mapped i/o has to be "memory-like" since processors will often assume that stuff in the "memory space" is memory-like, even to the point of calling its "memory space" virtual and translating to physical. memory-like devices have the property that if you inquire their value multiple times all you get is their value (multiple times). restart is not a problem. fifos are not memory-like; reading them causes all kinds of side effects in them including that their value gets forgotten once read (specifically they are channel-like (in CSP vernacular)) another lesson recently learned from this newsgroup is that i/o devices that you can write but not read (like cmd registers to some XXX controller chip) also are a pain in the ass (but not in the virtual ass; it's the driver writer's ass that gets bit). right? Les Milash
baum@Apple.COM (Allen J. Baum) (09/15/89)
[] >In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes: >We are *really* discussing whether or not MEMORY MAPPED I/O is a good thing So, I said "OK, I'll bite. What are the characteristics of memory mapped I/O that enable it to avoid the problems we are talking about?" Of course, what I really meant was: what is it about real I/O, as opposed to memory mapped I/O, that solves the problems? Oops. -- baum@apple.com (408)974-3385 {decwrl,hplabs}!amdahl!apple!baum
bruce@tolerant.UUCP (Bruce Hochuli) (09/15/89)
In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes: :In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes: :>>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes: :>>*really* discussing is whether or :>>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it :>>should be avoided : :>OK, I'll bite. What are the characteristics of memory mapped I/O that enable :>it to avoid the problems we are talking about? : :i'll summarize (and i'm sure y'all will correct me if i'm wrong:-) :memory mapped i/o has to be "memory-like" since processors will often assume :that stuff in the "memory space" is memory-like, even to the point of calling :its "memory space" virtual and translating to physical. : :memory-like devices have the property that if you inquire their value multiple :times all you get is their value (multiple times). restart is not a problem. :fifos are not memory-like; reading them causes all kinds of side effects :in them including that their value gets forgotten once read (specifically :they are channel-like (in CSP vernacular)) : Stuff deleted! Seems to me that this conversation just took a weird turn. The issue here is not memory mapped vs. I/O mapped, but just what devices we are dealing with. One example: if I have an Intel 8254 out there, I have to make my accesses in a very particular order. I issue a command (access 1) and I read/write my data (access 2). If I reissue access 1, I will have a very confused counter/timer out there. Note that this has nothing whatever to do with memory or I/O mapping. The same example holds true for lots of devices that a designer might hang on a bus. Back to the larger issue, I still don't understand how re-issuing an instruction could avoid having to face this issue.
hascall@atanasoff.cs.iastate.edu (John Hascall) (09/15/89)
In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes: }In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes: }>>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes: }>>*really* discussing is whether or }>>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it }>>should be avoided }memory-like devices have the property that if you inquire their value multiple }times all you get is their value (multiple times). restart is not a problem. }fifos are not memory-like; reading them causes all kinds of side effects }in them including that their value gets forgotten you forgot it (specifically }they are channel-like (in CSP vernacular)) I guess I fail to see the problem. I agree that for many I/O devices re-reading a device-register is a bad thing. What I don't see is how this can happen except when: a) you have an instruction (or instr. set) which is restarted *and* b) you have an instruction that reads from two (or more) operands (and the I/O location is not the last one read?). Are there machines which can get a page-fault accessing a memory-mapped I/O device register location?? (surely not!) Examples using the VAX instruction set (write operands are rightmost):

    MOVW  IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
                                ; (even if MOVW was a restarted instr)
    SUBW3 IO_DEV_CSR,(R2)+,R3   ; no problem: SUBW3 is continued
    SUBW3 IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault
    SUBW3 @(R2)+,IO_DEV_CSR,R3  ; can you get away with this because the
                                ; I/O operand is read last??

I just can't see where you would use such weird instructions in a device driver when accessing memory-mapped I/O registers (even in a multiple-memory-accesses-per-instruction machine like the VAX). John Hascall Systems Group ISU Comp Center
melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (09/15/89)
In article <1516@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes: > I guess I fail to see the problem. I agree that for many I/O devices > re-reading a device-register is a bad thing. What I don't see is how > this can happen except when: > > a) you have an instruction (or instr. set) which is restarted > *and* > b) you have an instruction that reads from two (or more) > operands (and the I/O location is not the last one read?). > > Are there machines which can get a page-fault accessing a memory-mapped > I/O device register location?? (surely not!) > > Examples using the VAX instruction set (write operands are rightmost): > > MOVW IO_DEV_CSR,R0 ; no problem: no page faults in I/O space > ; (even if MOVW was a restarted instr) The reason this works and seems not to be a problem is that the hardware designers have gone to some trouble to make it work. Consider what happens at the microarchitecture level and I think you'll agree that it really is a problem. Let's stick with this instruction and talk about the VAX 8600 implementation. When the instruction unit sees the opcode for the MOVW instruction, the execution unit could still be two instructions behind. What happens is the following: the instruction unit decodes the first operand and generates a virtual address memory read request to the memory unit. The memory unit then translates this virtual address (assuming it's not busy with another request) into a physical address using the translation buffer. Assuming a TB hit, then at this point the memory unit recognizes that it is an I/O address (the I/O space is recognizable from the physical address, in this case if bit 29 (the MSB) is high). Since the execution of all previous instructions has not yet completed at this point, the memory unit disregards the request and waits for it to be re-issued when the execution unit catches up. No further pre-fetching can occur and the pipeline is drained. 
The point is that if a previous instruction faults, let's say a page fault on a destination write, which will not be detected until the very end of the instruction, the read for the MOVW must not have taken place. If the I/O instruction had been recognizable from the opcode (as in my opinion it should be), the microarchitects could have designed a simpler memory unit that assumed any prefetch read from a non-I/O instruction is OK. Also consider that this is a simple example. In a more heavily pipelined machine, with perhaps even out-of-order prefetching of operands, it gets even harder to guarantee that these reads don't occur. It basically means that address translation for all reads must occur in order, with a microtrap mechanism to back out when an I/O address is encountered. Since the person writing the device driver or other code that touches I/O registers generally knows which variables map to I/O space, why not just have them use a different instruction? Then, the microarchitecture can much more cleanly enter and exit this synchronization point. Steve Melvin University of California, Berkeley melvin@arpa.Berkeley.EDU ...!ucbvax!melvin
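The memory-unit decision described in this post can be sketched as a small predicate in C. The use of bit 29 of the physical address comes from the post; everything else (the names, the `older_insns_in_flight` count, the two-way result) is an assumption made for illustration and is not a description of the actual 8600 logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 29 of the physical address marks I/O space, per the post above. */
#define IO_SPACE_BIT (UINT32_C(1) << 29)

static bool is_io_space(uint32_t phys) {
    return (phys & IO_SPACE_BIT) != 0;
}

enum prefetch_action { PREFETCH_NOW, DEFER_UNTIL_DRAINED };

/* Decide what to do with an operand read that was requested ahead of
 * execution. A read from I/O space may have side effects, so it must wait
 * until every older instruction has completed and can no longer fault. */
enum prefetch_action classify_prefetch(uint32_t phys,
                                       unsigned older_insns_in_flight) {
    if (is_io_space(phys) && older_insns_in_flight > 0)
        return DEFER_UNTIL_DRAINED;
    return PREFETCH_NOW;
}
```

The cost the post points out is visible here: the address has to be translated before the check can be made at all, whereas a distinct I/O opcode would let the decision be taken at decode time.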
stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (09/15/89)
In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes: >i'll summarize (and i'm sure y'all will correct me if i'm wrong:-) >memory mapped i/o has to be "memory-like" since processors will often assume >that stuff in the "memory space" is memory-like, even to the point of calling >its "memory space" virtual and translating to physical. > The definition I've always heard/used for memory-mapped I/O was one which implied that all control of the device was done via memory addresses, i.e. not using any special I/O instructions such as in/out on the 80x86 line. Therefore, any I/O device that is hooked up to a micro such as a 68K would by definition have to be "memory-mapped" since the 68K doesn't have provisions for in/out instructions. The device will react to the decode of some specific address range presented by the processor. I've never heard of said device being "memory-like" as being part of the definition of this term. Steve Wilson
kquick@simpact.com (Kevin Quick, Simpact Assoc., Inc.) (09/16/89)
In article <1516@atanasoff.cs.iastate.edu>, hascall@atanasoff.cs.iastate.edu (John Hascall) writes: > Examples using the VAX instruction set (write operands are rightmost):
>
>     MOVW  IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
>                                 ; (even if MOVW was a restarted instr)
>     SUBW3 IO_DEV_CSR,(R2)+,R3   ; no problem: SUBW3 is continued
>     SUBW3 IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault
>     SUBW3 @(R2)+,IO_DEV_CSR,R3  ; can you get away with this because the
>                                 ; I/O operand is read last??
>
> I just can't see where you would use such wierd instructions in a > device driver when accessing memory-mapped I/O registers (even in a > multiple-memory-accesses-per-instruction machine like the VAX). > > John Hascall > Systems Group > ISU Comp Center The above instruction examples do show the possible problems involved in restarting vs continuing instructions when accessing device registers, but the third and fourth instructions above are usually protected (under VAX/VMS) in another fashion, namely the Interrupt Priority Level (IPL). On the VAX machine, setting the IPL to a value between 0 and 31 in the processor register will block all interrupts occurring at the current IPL setting or lower until the IPL setting is lowered to the value at which the interrupt may be delivered. The importance of this is the VAX/VMS programming convention (read law, since if you break it, bad things are very likely to happen to you) is that you must be set to IPL 20 or higher to touch device registers (memory-mapped locations). In relating this to the third and fourth instructions presented by Mr. Hascall, it is observed that a VAX/VMS page fault will occur at IPL 2, and is therefore prevented when accessing device registers at IPL 20. If Mr. Hascall's example above did actually generate a page fault, the machine would generate an invalid exception and a bugcheck ---> crashdump analysis and reboot time. 
(VAX/VMS drivers typically operate with system non-paged memory at high IPLs). As a final note with regards to this discussion, it is the general practice of device drivers on VAX/VMS to access device registers at IPL 20 or above, but to perform most of their processing at a lower IPL (typically 8). Thus, a driver wishing to read a device register as above would raise the IPL to 20, read the device register into an unused register, and then return to IPL 8 to continue processing, i.e. the device register is touched ONCE for the entire process, and it is the driver's responsibility not to lose the value obtained, since it probably cannot be reread from the device. My apologies if this seems overly pedantic or is more machine specific than this discussion/forum warrants, but I wanted to shed some light on this one aspect of the restarted vs. continued instruction discussion. -- Kevin Quick, Simpact Associates, Inc., San Diego, CA. Internet: simpact!kquick@crash.cts.com
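The raise, read once, lower pattern described above might look like the following in C. This is only a sketch against a simulated machine: `set_ipl`, `read_csr`, and the mock register are hypothetical stand-ins, not VAX/VMS routines, and the IPL values 20 and 8 are simply the ones quoted in the post.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the platform's IPL primitive and a device
 * status register whose contents are lost once read. */
static unsigned current_ipl = 8;
static unsigned csr_reads   = 0;
static uint16_t mock_csr    = 0x8080;

/* Set a new IPL and return the previous one so it can be restored. */
static unsigned set_ipl(unsigned new_ipl) {
    unsigned old = current_ipl;
    current_ipl = new_ipl;
    return old;
}

/* Destructive read: the status bits clear as a side effect of reading. */
static uint16_t read_csr(void) {
    uint16_t v = mock_csr;
    csr_reads++;
    mock_csr = 0;
    return v;
}

/* Touch the device register exactly once, at device IPL, then drop back
 * and work only on the saved copy. */
uint16_t sample_csr(void) {
    unsigned saved  = set_ipl(20);    /* raise to the level quoted in the post */
    uint16_t status = read_csr();     /* the single device access */
    set_ipl(saved);                   /* return to the previous level */
    return status;                    /* caller must not lose this value */
}

unsigned ipl_now(void)        { return current_ipl; }
unsigned csr_read_count(void) { return csr_reads; }
```

All later processing uses the returned copy, which is the point made above: the value probably cannot be fetched from the device a second time.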
mash@mips.COM (John Mashey) (09/16/89)
In article <31316@ucbvax.BERKELEY.EDU> melvin@ucbarpa.Berkeley.EDU.UUCP (Steve Melvin) writes: .... >> Examples using the VAX instruction set (write operands are rightmost): >> >> MOVW IO_DEV_CSR,R0 ; no problem: no page faults in I/O space >> ; (even if MOVW was a restarted instr) ... >Also consider that this is a simple example, in a more heavily pipelined >machine, with perhaps even out-of-order prefetching of operands, it gets >even harder to guarantee that these reads don't occur, it basically means >that address translation for all reads must occur in order with a microtrap >mechanism to back out when an I/O address is encountered. Since the person >writing the device driver or other code that touches I/O registers generally >knows which variables map to I/O space, why not just have them use a >different instruction? Then, the microarchitecture can much more cleanly >enter and exit this synchronization point. Note that this whole issue is not (just) a hardware issue, it's a:

    hardware instruction-level
    hardware micro-architecture
    language definition
    compiler technology
    and operating system

issue; and it's IMPORTANT to understand how these all fit together. For example:

1) Some people like to write their device drivers in a language higher than assembler. Hence they do not directly choose instructions, and if the code generator needs to do something different for memory-mapped I/O, it needs to know that.

2) Even on a simple load/store RISC machine, a global optimizer can surprise you by rearranging things; the continuation issue, and the dealing-with-optimizer issue may not look practically different from the system programmer's view, i.e., they could be surprised either way.

3) Most languages don't even have methods for telling an optimizer to be careful. C's volatile is a useful exception.

4) Systems and chips are different. One may well build a system by choosing/designing I/O controllers that have "good" properties. 
On the other hand, chips expected to be used in many different ways need to survive all kinds of odd behaviors. A classic reference here is by Tom Lyon & Joe Skudlarek of Sun: "All the Chips That Fit", UNIX Review 4, 2 (Feb 1989), 29-34. (Earlier version in Summer 1985 USENIX). This is subtitled: "Semiconductor manufacturers continue to heap feature upon feature, so mama, don't let your babies grow up to be system software engineers."
-------
Attributes that make life simpler in machines that use memory-mapped I/O:
1) Load/store architecture, specifically, no more than one load or store per instruction, required to be on naturally-aligned boundaries, hence fast pipeline with no surprises. Include all of the 8-16-32 bit accesses as normal instructions, else some devices that must be dealt with can give surprises. For example, it is not good enough to do load-words, and then extract bytes, as you can cause problems with some device registers by issuing extra accesses.
2) If you use global-optimizing compilers, you need (in C) volatile, or some equivalent elsewhere. This has to work "right", where "right" turns out to be: after optimization, the exact same number of loads and stores to volatile variables must occur, in exactly the same order, as before such optimization. Anything less than that leads to crazed systems programmers.
3) Be careful of buffering. For example, some MIPS-based systems use a 4-deep write-buffer that provides read-around, i.e., reads have priority over writes, and hence, you can end up doing a write to a control register, and possibly then reading the associated status register while the write is still pending. We use a kernel function wbflush() that waits until the write buffer is empty. This is OK and works; however, some of the newer systems use write-flushing, i.e., a read stalls until all of the writes are done, and this is clearly easier to use, although there is little difference in performance (stalls are stalls, no matter what).
In particular, it almost seems like uncached references in hardware are like volatile in software: a good default is to stall and make sure the state is clean. -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
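A minimal sketch of the read-around hazard described in point 3, assuming a simulated device rather than real hardware. The register variables, the status values, and the little write-buffer model are all hypothetical; only the name wbflush() comes from the post, and its body here is a stand-in for whatever the real kernel routine does. The device registers are declared volatile, as point 2 suggests, so that a compiler would not add, drop, or reorder the accesses.

```c
#include <assert.h>
#include <stdint.h>

enum { WB_DEPTH = 4, STATUS_IDLE = 0, STATUS_BUSY = 1 };

/* Simulated device registers (hypothetical; a real driver would use
   fixed bus addresses instead of ordinary variables). */
static volatile uint32_t dev_control;
static volatile uint32_t dev_status = STATUS_IDLE;

/* Simulated write buffer: stores wait here until they are drained. */
static uint32_t wb_data[WB_DEPTH];
static int wb_count;

/* Stand-in for wbflush(): drain every pending write to the device. */
static void wbflush(void)
{
    for (int i = 0; i < wb_count; i++) {
        dev_control = wb_data[i];
        dev_status = STATUS_BUSY;   /* the device reacts to the command */
    }
    wb_count = 0;
}

/* A store to the control register goes into the buffer first. */
static void write_control(uint32_t value)
{
    if (wb_count == WB_DEPTH)       /* buffer full: drain before queuing */
        wbflush();
    wb_data[wb_count++] = value;
}

/* Read-around: a load bypasses any writes still sitting in the buffer. */
static uint32_t read_status(void)
{
    return dev_status;
}

/* Write then read with no flush: the read can see stale status. */
static uint32_t start_unsafe(uint32_t cmd)
{
    write_control(cmd);
    return read_status();
}

/* Write, flush, then read: the device has seen the command. */
static uint32_t start_safe(uint32_t cmd)
{
    write_control(cmd);
    wbflush();
    return read_status();
}
```

In this model start_unsafe() reads the status while the command is still buffered, and start_safe() does not; a write-flushing design would in effect fold the wbflush() call into every read.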
hascall@atanasoff.cs.iastate.edu (John Hascall) (09/16/89)
In article <?> kquick@simpact.com (Kevin Quick, Simpact Assoc., Inc.) writes:
}In article <?>, hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
}> Examples using the VAX instruction set (write operands are rightmost):
}> MOVW IO_DEV_CSR,R0 ; no problem: no page faults in I/O space
}> ; (even if MOVW was a restarted instr)
}> SUBW3 IO_DEV_CSR,(R2)+,R3 ; no problem: SUBW3 is continued
}> SUBW3 IO_DEV_CSR,@(R2)+,R3 ; trouble: @(R2)+ can cause page fault
}> SUBW3 @(R2)+,IO_DEV_CSR,R3 ; can you get away with this because the
}> ; I/O operand is read last??
}The above instruction examples do show the possible problems involved in
}restarting vs continuing instructions when accessing device registers, but
}the third and fourth instructions above are usually protected (under VAX/VMS)
}in another fashion, namely the Interrupt Priority Level (IPL).
Mostly, raising IPL protects you from critical section problems (talking uni-processor here). Section 6.2 of "Writing a Device Driver for VAX/VMS" states (item 7): To access I/O space, use only the following instructions. These instructions cannot be interrupted unless they use autoincrement-deferred addressing mode or any of the displacement-deferred modes when specifying an operand.
}In relating this to the third and fourth instructions presented by Mr. Hascall,
}it is observed that a VAX/VMS page fault will occur at IPL 2, and is therefore
}prevented when accessing device registers at IPL 20.
Page faults are not allowed above IPL 2, not really for architectural reasons, but because of the critical section problem and since VMS does process scheduling at IPL 3 (there may not be a process context in which to make the page valid). Anyway, the use of VAX instructions was purely for my own convenience. My point was (as was restated by Mr. Quick) that, regardless of the instructions available, most/all device drivers only need to access (memory mapped) device registers in a simple fashion.
Thus, I don't think memory mapped I/O is somehow less flexible than having specific I/O instructions as was suggested earlier even if certain instructions and/or addressing modes have to be prohibited. John Hascall
melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (09/18/89)
In article <27633@winchester.mips.COM> mash@mips.COM (John Mashey) writes: >Note that this whole issue is not (just) a hardware issue, it's a: > hardware instruction-level > hardware micro-architecture > language definition > compiler technology >and operating system >issue; and it's IMPORTANT to understand how these all fit together. >... >Attributes that make life simpler in machines that use memory-mapped I/O: >1) Load/store architecture, >... Your point is well taken, there are many sides to this issue, but I don't think it's fair to say that load/store *architectures* make life simpler for systems programmers; using simple loads and stores is pretty much of a requirement as has been pointed out, regardless of whether memory to memory instructions exist. But which instructions are used and what restrictions need to be placed on them is secondary to the real issue here. The bottom line for an I/O instruction is that it represents a synchronization point from the perspective of the hardware. That is, all unconfirmed operations have to be verified before the I/O operation can take place. All predicted branches have to be confirmed, all pending memory reads and writes have to at least be translated to verify that they can be completed and all operations that can generate exceptions have to be executed. Generally this means that the entire pipeline has to be drained. This is a simple fact, there is no way around it (at least not as long as reads have side-effects and there is no "undo" function.) However, if the mere fact that you have to handle these synchronization points correctly (which are few and far between) slows down the other 99.9% of your code, something is wrong. In low concurrency machines, memory mapped I/O isn't a big deal in this regard because it doesn't slow down non-I/O code. 
Just go ahead and use ordinary instructions with ordinary virtual addresses (with appropriate restrictions on number and type of operands, as has been discussed) and let the hardware figure out that it has an I/O instruction when it sees the address. However, in high concurrency microarchitectures which execute multiple operations per cycle and in an order determined at run-time (these processors are coming, BTW) there has to be a more explicit way to let the hardware know about an I/O instruction. Simply using ordinary instructions with ordinary addresses, and expecting the hardware to do the right thing won't work. You can't expect to get maximal speedup on the code that doesn't know or care about I/O if the hardware has to guarantee that it doesn't trip across a synchronization point in the middle of some basic block that it is executing out-of-order. There are many possible ways to do this and memory mapped I/O could still be incorporated into such a solution. My only point is that presenting a memory model to the hardware in which reads don't have side-effects is of critical importance in high concurrency designs. (Of lesser importance but also of value is the property of multiple writes (i.e. a write of an incorrect value can take place and the correct value can be later written.)) I think that the increase in performance will win out whatever reduction in convenience this implies and people will figure out whatever has to be figured out at the higher levels in order to allow the hardware to make these assumptions unless explicitly told otherwise (i.e. BEFORE address translation: part of the opcode, surrounded by special instructions, etc.). ---- Steve Melvin University of California, Berkeley ----
ok@cs.mu.oz.au (Richard O'Keefe) (09/18/89)
There is another problem with memory-mapped I/O instructions; I guess this site must be losing some messages because I don't recall seeing it mentioned. If I do movb InputPort, r0 where InputPort is a memory-mapped device port, and then a few instructions later to movb InputPort, r0 again, I really don't want the second reference to look in the cache and return whatever the first reference returned. Presumably the VAX handles this by ensuring that addresses in the "device" range are never cached. A simpler approach could be to have I/O instructions. Note that "memory mapped I/O" has two faces: -- device registers appear as memory locations TO THE CPU -- device registers appear as memory locations TO THE BUS It would be possible to have a machine with special input DeviceAddress, Register output DeviceAddress, Register instructions which the CPU, cache, and so on "knew" about, but which appeared ON THE BUS just like memory references that miss the cache. Whether that would be a good thing is another question again.
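A minimal sketch of the stale-read problem raised above, assuming a toy direct-mapped cache and a simulated input port whose reads have a side effect. Every name and constant here (IO_BASE, the three-byte FIFO, the cache_io switch) is hypothetical and chosen only for illustration; it is not a model of the VAX or of any other real machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: addresses at or above IO_BASE are device registers. */
enum { IO_BASE = 0xE000, CACHE_LINES = 16 };

/* Simulated input port: each bus read consumes the next byte. */
static const uint8_t fifo[] = { 'a', 'b', 'c' };
static unsigned fifo_pos;

/* What the bus returns on a cache miss (or an uncached access). */
static uint8_t bus_read(uint16_t addr)
{
    if (addr >= IO_BASE)
        return fifo[fifo_pos++ % 3];   /* side effect: advances the FIFO */
    return 0;                          /* plain memory, contents irrelevant here */
}

struct line { bool valid; uint16_t addr; uint8_t data; };
static struct line cache[CACHE_LINES];

/* cache_io selects whether the "device" range is allowed into the cache. */
static uint8_t cpu_read(uint16_t addr, bool cache_io)
{
    struct line *l = &cache[addr % CACHE_LINES];
    bool cacheable = cache_io || addr < IO_BASE;

    if (cacheable && l->valid && l->addr == addr)
        return l->data;                /* hit: the device is never asked */

    uint8_t value = bus_read(addr);
    if (cacheable) {
        l->valid = true;
        l->addr = addr;
        l->data = value;
    }
    return value;
}
```

With cache_io set, two reads of the same port return the same byte because the second one hits the cache; with it clear, every read reaches the bus and the FIFO advances each time.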
johnl@esegue.segue.boston.ma.us (John R. Levine) (09/20/89)
In article <2128@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: >... Note that "memory mapped I/O" has two faces: > -- device registers appear as memory locations TO THE CPU > -- device registers appear as memory locations TO THE BUS >It would be possible to have a machine with special > input DeviceAddress, Register > output DeviceAddress, Register >instructions which the CPU, cache, and so on "knew" about, but which >appeared ON THE BUS just like memory references that miss the cache. >Whether that would be a good thing is another question again. It's a fine idea. On the 8086 and its descendants, hence on the IBM PC, excuse me, Industry Standard Architecture, bus, I/O cycles and memory cycles are the same except for a line that says whether it's an I/O or a memory address. I/O addresses are never cached, of course. The PDP-11 was the first machine to use memory-mapped I/O (the first one I know about, anyway.) By convention, all device registers were mapped in the highest 8K bytes of the address space, and caches knew not to cache addresses in that range. On the Q-Bus, the second version of the PDP-11 bus, there is even a line that says that the current address is in the top 8K. They intended it to make it easier to decode device addresses, but it is equally useful to distinguish between I/O and memory. -- John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869 johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu Massachusetts has 64 licensed drivers who are over 100 years old. -The Globe
peter@ficc.uu.net (Peter da Silva) (09/25/89)
In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes: > Back to the larger issue, I still don't understand how re-issuing > an instruction could avoid having to face [problems with memory mapped > I/O]. What is the characteristic that makes "I/O mapped" I/O (that is, I/O that uses special instructions to access the I/O address space) safe? The characteristic is that the instruction can not fault in such a way that an I/O operation is performed twice. That is, "I/O mapped" instructions are inherently load/store. What is the characteristic that makes memory mapped I/O dangerous? That if a memory-memory instruction with an I/O device at one end faults, the I/O operation can be duplicated. The simple solution is... don't perform memory-memory instructions on I/O. Fairly easy in CISC, and simple in most RISC architectures... they don't have memory-memory operations. -- Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation. Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-' "That is not the Usenet tradition, but it's a solidly-entrenched U delusion now." -- brian@ucsd.Edu (Brian Kantor)
peter@ficc.uu.net (Peter da Silva) (09/25/89)
Taking the other side now...
In article <1516@atanasoff.cs.iastate.edu>, hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
> SUBW3 IO_DEV_CSR,@(R2)+,R3 ; trouble: @(R2)+ can cause page fault
> I just can't see where you would use such weird instructions in a
> device driver when accessing memory-mapped I/O registers (even in a
> multiple-memory-accesses-per-instruction machine like the VAX).
If the device-driver is written in a high-level language, perhaps?
-- Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation. Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-' "That is not the Usenet tradition, but it's a solidly-entrenched U delusion now." -- brian@ucsd.Edu (Brian Kantor)
stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (09/28/89)
In article <6283@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: >In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes: >> Back to the larger issue, I still don't understand how re-issuing >> an instruction could avoid having to face [problems with memory mapped >> I/O]. > >What is the characteristic that makes "I/O mapped" I/O (that is, I/O that >uses special instructions to access the I/O address space) safe? The >characteristic is that the instruction can not fault in such a way that >an I/O operation is performed twice. That is, "I/O mapped" instructions >are inherently load/store. > >What is the characteristic that makes memory mapped I/O dangerous? That >if a memory-memory instruction with an I/O device at one end faults, the >I/O operation can be duplicated. The simple solution is... don't perform >memory-memory instructions on I/O. Fairly easy in CISC, and simple in most >RISC architectures... they don't have memory-memory operations. >-- Peter, I can't agree with your statement about it being fairly easy to avoid doing memory operations on memory-mapped I/O using CISCs. There are several CISCs (the 68k and 32k come immediately to mind) that don't have I/O instructions of any sort, so you have to map the I/O registers into the machine's memory space. Assume you've got a FIFO type device such as a USART. Any read of the data register will be destructive, i.e. the memory location value will change as a function of the arriving characters in the USART's FIFO. The only way that a 68k or 32k can talk to this device is via the micro's memory space. If the micro is designed to "instruction restart" you have to guarantee that the select NEVER got out to the USART. This tends to get in the way of building zero wait-state hardware ;-) I think this is the type of problem Bruce is talking about (Hi Bruce).
Steve Wilson Consultant at large Currently serving time at Wyse Technology Standard Disclaimer - These are my opinions, not those of my employer's.
ingoldsb@ctycal.COM (Terry Ingoldsby) (09/29/89)
In article <2451@wyse.wyse.com>, stevew@wyse.wyse.com (Steve Wilson xttemp dept303) writes: > In article <6283@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes: > >In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes: > >> Back to the larger issue, I still don't understand how re-issuing > >> an instruction could avoid having to face [problems with memory mapped > >> I/O]. > > > >What is the characteristic that makes "I/O mapped" I/O (that is, I/O that > >uses special instructions to access the I/O address space) safe? The > >characteristic is that the instruction can not fault in such a way that > >an I/O operation is performed twice. That is, "I/O mapped" instructions > >are inherently load/store. > > > >What is the characteristic that makes memory mapped I/O dangerous? That > >if a memory-memory instruction with an I/O device at one end faults, the > > Can't agree with your statement about it being fairly easy to avoid doing > memory operations on memory-mapped I/O using CISCS. There are several I think what is being suggested is a method of preserving the philosophy of I/O mapped instructions using a processor that only has memory mapped I/O. Generally, what you are trying to avoid is a page fault (or some other sort of interrupt/trap/bus error that would happen part way through an instruction, thus causing it to wait for the fault to clear, and then re-execute the entire instruction. Suppose, for example, you wanted to read a data byte from the data register (which is mapped to address addr1) of a serial device. If you performed a CISCy, memory based instruction like: MOVM addr1,addr2 (ie Move data found at location addr1 to addr2) then if addr1 and addr2 are not in the same page, a page fault may occur. Since the processor might not know this until it had accessed addr1 (and was trying to access addr2) then execution would pause until the fault had been corrected (ie. 
the page brought into memory) and then the whole instruction would repeat. The next value would be read from the peripheral data register (losing the previous value). As an alternative, do the following: MOV addr1,R1 (ie Move data found at addr1 into register R1) STO R1,addr2 (store register contents in addr2) Even if the fault occurs during the read of addr1 (a bit odd since peripheral memory locations are usually non-pageable) then the access will still only occur once. If the page fault occurs during the store then we similarly don't care since addr2 will also only be accessed once (in my example it's just memory but could conceivably be another peripheral). Sorry to ramble on (I usually prefer to listen), but I thought there was some ambiguity in the discussion. -- Terry Ingoldsby ctycal!ingoldsb@calgary.UUCP Land Information Systems or The City of Calgary ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
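A minimal sketch of the two cases just described, assuming a simulated serial data register with destructive reads and a destination page that faults exactly once. The helper names (read_data_reg, try_store, movm_restart, load_then_store) and the fault model are hypothetical; the point is only how many times the device ends up being read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated receive FIFO behind the data register at "addr1". */
static const uint8_t rx_fifo[] = { 'X', 'Y', 'Z' };
static unsigned rx_pos;

/* Destructive read: each access consumes one character. */
static uint8_t read_data_reg(void)
{
    return rx_fifo[rx_pos++];
}

/* Is the page holding "addr2" resident? */
static bool dest_page_present;

/* Attempt the store. On a fault the "handler" pages the destination in
   and false is returned so that the caller restarts. */
static bool try_store(uint8_t *dest, uint8_t value)
{
    if (!dest_page_present) {
        dest_page_present = true;
        return false;
    }
    *dest = value;
    return true;
}

/* MOVM addr1,addr2 under instruction restart: the whole instruction,
   including the device read, is executed again after the fault. */
static void movm_restart(uint8_t *dest)
{
    for (;;) {
        uint8_t tmp = read_data_reg();
        if (try_store(dest, tmp))
            return;
    }
}

/* MOV addr1,R1 then STO R1,addr2: only the store is restarted, and the
   value already sitting in the register survives the fault. */
static void load_then_store(uint8_t *dest)
{
    uint8_t r1 = read_data_reg();
    while (!try_store(dest, r1))
        ;
}
```

In this model the restarted memory-to-memory move reads the device twice and stores the second character, while the load/store pair reads it once and stores the first.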
ok@cs.mu.oz.au (Richard O'Keefe) (09/30/89)
In article <477@ctycal.UUCP>, ingoldsb@ctycal.COM (Terry Ingoldsby) writes: > As an alternative, do the following: > MOV addr1,R1 (ie Move data found at addr1 into register R1) > STO R1,addr2 (store register contents in addr2) > Even if the fault occurs during the read of addr1 (a bit odd since > peripheral memory locations are usually non-pageable) then the access will > still only occur once. If the page fault occurs during the store then > we similarly don't care since addr2 will also only be accessed once (in my > example it's just memory but could conceivably be another peripheral). It doesn't sound all that unreasonable for a page fault to occur during a memory-mapped I/O operation. Imagine a memory-mapped scheme where each device has all its registers in a different page of I/O space, and where the operating system is running a "virtual machine" scheme. All I/O pages would initially be mapped out of a process's address space. Touching an I/O page would cause a page fault, at which time the O/S would check whether the process had permission to access that device, and if so would map the page in. If the O/S needed to seize control of the device back, it would map the page out again. Whether such a scheme is useful or not is another matter. -- GNUs are more derived than other extant alcelaphines,| Richard A. O'Keefe such as bonteboks, and show up later in the fossil | visiting Melbourne record than less highly derived species. (Eldredge) | ok@munmurra.cs.mu.OZ.au
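A minimal sketch of the fault-on-first-touch scheme outlined above, assuming one page per device and a per-process permission table. The structure and function names are hypothetical, and the "page fault" is just a function call; no real O/S interface is implied.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_DEVICES = 4 };

struct process {
    bool may_use[NUM_DEVICES];  /* permission recorded by the O/S */
    bool mapped[NUM_DEVICES];   /* device page currently mapped in */
    int faults;                 /* page faults taken on device pages */
};

/* Fault handler: map the device page in only if the process is allowed. */
static bool handle_device_fault(struct process *p, int dev)
{
    p->faults++;
    if (!p->may_use[dev])
        return false;           /* access denied */
    p->mapped[dev] = true;
    return true;
}

/* A touch of the device page; returns true if the access may proceed. */
static bool touch_device(struct process *p, int dev)
{
    if (!p->mapped[dev] && !handle_device_fault(p, dev))
        return false;
    return true;
}

/* The O/S seizes the device back by mapping the page out again. */
static void revoke_device(struct process *p, int dev)
{
    p->mapped[dev] = false;
}
```

Only the first touch after a map-out pays for a fault; later accesses go straight through until the O/S revokes the page.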
ingoldsb@ctycal.COM (Terry Ingoldsby) (10/02/89)
In article <2255@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes: > In article <477@ctycal.UUCP>, ingoldsb@ctycal.COM (Terry Ingoldsby) writes: > > As an alternative, do the following: > > MOV addr1,R1 (ie Move data found at addr1 into register R1) > > STO R1,addr2 (store register contents in addr2) > > Even if the fault occurs during the read of addr1 (a bit odd since > > peripheral memory locations are usually non-pageable) then the access will > > It doesn't sound all that unreasonable for a page fault to occur during > a memory-mapped I/O operation. Imagine a memory-mapped scheme where each > device has all its registers in a different page of I/O space, and where > the operating system is running a "virtual machine" scheme. All I/O > pages would initially be mapped out of a process's address space. Touching > an I/O page would cause a page fault, at which time the O/S would check > whether the process had permission to access that device, and if so would > map the page in. If the O/S needed to seize control of the device back, > it would map the page out again. > Granted, it could happen under some conditions. I was thinking more of the case where the user doesn't actually do the memory access, but rather makes a system call for the OS to do it. Many OSes lock things like disk device drivers, buffer space, tables, etc. into memory and I assumed that many OSes would do the same with the memory locations corresponding to devices. This wouldn't be hard to do since a small region of memory could be dedicated to this. Whether this is done or not I have no idea (any OS gurus out there know how it really is done?). Nonetheless, it doesn't make any difference since the instructions I suggested still work even if the fault occurs. The only thing that I can see happening if the page is not locked into memory is if a swapper tries to write a region of memory out to disk. That could provoke some unusual behaviour of the peripherals! 
A similar argument would exist for swapping (or paging - if it is a hard page fault) the region back into memory. Soft page faults (ie. those where the page is in physical memory, but not in a process's page table) just involve setting up some pointers and don't actually affect (ie. read/write) the memory locations. Is any of this making sense? -- Terry Ingoldsby ctycal!ingoldsb@calgary.UUCP Land Information Systems or The City of Calgary ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (10/02/89)
In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes: > >As an alternative, do the following: > MOV addr1,R1 (ie Move data found at addr1 into register R1) > STO R1,addr2 (store register contents in addr2) >Even if the fault occurs during the read of addr1 (a bit odd since >peripheral memory locations are usually non-pageable) then the access will >still only occur once. If the page fault occurs during the store then >we similarly don't care since addr2 will also only be accessed once (in my >example its just memory but could conceivably be another peripheral). > That the peripheral memory location could page fault is only part of the problem (and, as you rightly point out, perhaps not a very realistic one at that). Because (out of order) pre-fetching can be done, you can have: MOV any_old_addr1,any_old_addr2 MOV peripheral_addr,R1 with peripheral_addr being accessed before one of the any_old_addresses. Now if either of the any_old_addresses is not accessible and (out of order) prefetching is allowed, you can have a problem even if peripheral_addr _is_ accessible. And how about dual pre-fetching for both taken and not taken path streams? (Again a 68040 special). If the memory-mapped io instruction is the first instruction in each of the branches then in my book you've got a problem. I'd like to bring back to memory a quote from an excellent posting from Steve Melvin from the University of California Berkeley 2 weeks ago on this subject: "...there are many sides to this issue, but I don't think it's fair to say that load/store *architectures* make life simpler for systems programmers; using simple loads and stores is pretty much of a requirement as has been pointed out, regardless of whether memory to memory instructions exist. But which instructions are used and what restrictions need to be placed on them is secondary to the real issue here. 
The bottom line for an I/O instruction is that it represents a synchronization point from the perspective of the hardware. That is, all unconfirmed operations have to be verified before the I/O operation can take place. All predicted branches have to be confirmed, all pending memory reads and writes have to at least be translated to verify that they can be completed and all operations that can generate exceptions have to be executed. Generally this means that the entire pipeline has to be drained. This is a simple fact, there is no way around it (at least not as long as reads have side-effects and there is no "undo" function.)" -- "Geld groeit me niet op de rug." Literally: "Money doesn't grow on my back." (Often overheard at the supermarket counter from mothers to their kids.) Roelof Vuurboom SSP/V3 Philips TDS Apeldoorn, The Netherlands +31 55 432226 domain: roelof@idca.tds.philips.nl uucp: ...!mcvax!philapd!roelof
baum@Apple.COM (Allen J. Baum) (10/03/89)
[] >In article <2255@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes: > Imagine a memory-mapped scheme where each >device has all its registers in a different page of I/O space, and where >the operating system is running a "virtual machine" scheme. All I/O >pages would initially be mapped out of a process's address space. Touching >an I/O page would cause a page fault, at which time the O/S would check >whether the process had permission to access that device, and if so would >map the page in. If the O/S needed to seize control of the device back, >it would map the page out again. > >Whether such a scheme is useful or not is another matter. This is very similar to the HP "Spectrum" ..er.. Precision's IO scheme. All I/O devices are mapped onto two pages. Data registers are mapped to one, and control registers to the other (generally). The idea is that you can keep control of a device, and still let programs have access to the data. Really, there is no reason not to let a user have direct access to his/her own serial port; it can't affect security. You may not want to give them access to the control registers, especially if they affect more than one line. Direct access for the common stuff means a LOT lower overhead. You can get a keystroke with a "Load" instruction in one cycle, instead of a system call that is likely to cost you a millisecond. -- baum@apple.com (408)974-3385 {decwrl,hplabs}!amdahl!apple!baum
andrew@frip.WV.TEK.COM (Andrew Klossner) (10/03/89)
[] "The bottom line for an I/O instruction is that it represents a synchronization point from the perspective of the hardware. That is, all unconfirmed operations have to be verified before the I/O operation can take place." This is a sufficient but not a necessary condition. It's more restrictive than it needs to be. By way of counter-example: the M88k has a lot of pipelining, and some of the floating-point exceptions are imprecise. I might very well have code like this: fmul.sss r2,r3,r4 ; start a floating multiply ld r5,r6,0 ; start an I/O read and, while the load is underway, the FPU decides to fire off a floating underflow exception. That's fine, I expected this and my floating exception handler substitutes zero, cleans out the pipes, and returns. I don't need to idle for four cycles waiting for the multiply to finish just on the rare chance that it will underflow. If for some reason I want synchronization, I'll use the provided instructions to suspend execution until all the pipelines have drained. But I don't need to do this on every I/O op. -=- Andrew Klossner (uunet!tektronix!frip.WV.TEK!andrew) [UUCP] (andrew%frip.wv.tek.com@relay.cs.net) [ARPA]
vorbrueg@bufo.usc.edu (Jan Vorbrueggen) (10/03/89)
In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes: > >As an alternative, do the following: > MOV addr1,R1 (ie Move data found at addr1 into register R1) > STO R1,addr2 (store register contents in addr2) >Even if the fault occurs during the read of addr1 (a bit odd since >peripheral memory locations are usually non-pageable) then the access will >still only occur once. If the page fault occurs during the store then >we similarly don't care since addr2 will also only be accessed once (in my >example its just memory but could conceivably be another peripheral). But what if the pagetables mapping addr1 or addr2 are pageable? This is the case on VAXen, where a process' pagetables (for its private virtual address space called P0 or P1) reside in the virtual address space shared by all processes (called S0). The pagetables for S0, of course, are stored in contiguous physical memory, at a physical address recorded in a processor register. So, a pagefault can occur for two reasons: the page referenced is invalid (not in the working set), or the pagetable page is invalid. Naturally, the pagetable page for a valid (in-working set) page has to be valid. However, VMS since (at least) V2.3 and probably until today has a bug where pages mapped to physical memory (e.g., I/O registers), though by definition always valid, don't increment the reference count of the pagetable page. In a memory-tight situation (in this case, the process was only allowed 100 pages in its working set), VMS will kindly recuperate unused memory by performing a so-called dead pagetable scan. It proceeds to remove the pagetable mapping the device, and next time the process references it, bang: down goes the system! (Pagefaults are not allowed in certain situations.) Another nice situation happens when you debug an application using a memory-mapped device. Say your application sadly has a bug and accesses an address where there is no device. It will get an access violation fault. 
So you fire up the debugger and single-step through it. This is implemented by using special trace bits which will return control to the debugger after every instruction. The program writes to that non-existent location and gets its trace-pending trap, calling a privileged-mode routine to handle the trap. Some instructions (or 10 microseconds) later, the bus tells the cpu that it couldn't perform the write. Now the access violation seems to have occurred in the privileged mode - and a crash occurs... Jan Vorbrueggen
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/03/89)
In article <265@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes: >I'd like to bring back to memory a quote from an excellent posting from >Steve Melvin from the University of California Berkeley 2 weeks ago on this >subject: >The bottom >line for an I/O instruction is that it represents a synchronization point from >the perspective of the hardware. That is, all unconfirmed operations have to >be verified before the I/O operation can take place. All predicted branches >have to be confirmed, all pending memory reads and writes have to at least be >translated to verify that they can be completed and all operations that can >generate exceptions have to be executed. Generally this means that the entire >pipeline has to be drained. I'm afraid I don't agree. Since direct control over a physical IO device implies some level of privilege, it is reasonable to require each handler to memory-lock its pages beforehand. It is also reasonable to insist that the handler not divide by zero (etc) within the immediate vicinity of the IO action. Handlers don't need a "drain pipeline" instruction, if they are not going to fault. As proof, I offer the working systems that are out in the world. You seem to be asking for processors that are more complicated, and complication is not free. You cannot justify your case by showing how it will make handlers possible: they already are. So, you will have to show how it makes handlers better, or cheaper. Old truism, disguised as an old joke: "Doctor, it hurts when I do THIS." "Then don't do that." -- Don D.C.Lindsay Carnegie Mellon Computer Science
melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (10/03/89)
In article <4796@orca.WV.TEK.COM> andrew@frip.wv.tek.com writes:
>By way of counter-example: the M88k has a lot of pipelining, and some
>of the floating-point exceptions are imprecise. I might very well have
>code like this:
>
>	fmul.sss r2,r3,r4	; start a floating multiply
>	ld	 r5,r6,0	; start an I/O read
>

I'm a little surprised that the 88K can actually do the I/O read before it knows if the multiply will generate a fault. Can you absolutely confirm this?

Either way though, you have brought up a valid point, and that is that some memory mapped I/O registers may not need to be synchronized on. Certainly, a reasonable alternative to having the hardware assume all I/O reads have side-effects would be to let the programmer specify explicitly when they want synchronization and not provide it otherwise. If they expect only one read of the I/O register, the machine will still have to synchronize, but perhaps this is not always required.

But this was my point. My original posting was suggesting that it is better to have the programmer explicitly let the hardware know when a read has a side-effect rather than to have the hardware discover this fact after address translation.

The thing to keep in mind, however, is that in some situations it may be difficult for the programmer to know if it's "safe" not to synchronize unless the read truly has no side-effects. The processor may not yet have confirmed a branch that has been predicted many instructions back.

In article <6384@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>Since direct control over a physical IO device implies some level of
>privilege, it is reasonable to require each handler to memory-lock
>its pages beforehand. It is also reasonable to insist that the
>handler not divide by zero (etc) within the immediate vicinity of the
>IO action. Handlers don't need a "drain pipeline" instruction, if
>they are not going to fault.
>

This is certainly a valid approach. That is, why not put the responsibility on the programmer to guarantee that no memory or arithmetic exceptions will occur in the vicinity of the I/O instruction. Then, the processor would not have to confirm outstanding memory and arithmetic operations.

But faults are not the only issue. Even in the situation you propose, the hardware would still have to confirm outstanding branch predictions and it is also the case that there would have to be some sequencing control, implicit or explicit, in order to force multiple I/O reads to occur in program order.

The question of exactly what is the *vicinity* gets a bit tricky also. In some situations, the vicinity might be quite large, and within the *dynamic* instruction stream, which may be difficult to know exhaustively.

>As proof, I offer the working systems that are out in the world.
>

OK, I don't claim to know much about the real world. But I would be interested to know what systems out there work as you suggest. That is, processors that do not guarantee that a memory mapped I/O read will not take place multiple times if it is in the vicinity of an instruction that could fault. This is certainly not the case for any VAX.

-------
Steve Melvin
University of California, Berkeley
-------
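The hazard under discussion in the post above can be shown with a toy model. What follows is a minimal sketch in Python, not a description of any real processor: it assumes a hypothetical read-to-clear device register and a machine that discards and reissues an I/O read when a fault from an earlier instruction arrives late. The names ReadToClearRegister, run, fault_pending and sync_before_io are all invented for illustration.

```python
class ReadToClearRegister:
    """Hypothetical memory-mapped register: a read returns the value and clears it."""

    def __init__(self, value):
        self.value = value
        self.reads = 0  # how many times the device has actually been read

    def read(self):
        self.reads += 1
        value, self.value = self.value, 0  # the read itself is the side-effect
        return value


def run(device, fault_pending, sync_before_io):
    """Model one possibly-faulting instruction followed by an I/O read.

    Returns the value the program ends up with in its destination register.
    """
    if sync_before_io and fault_pending:
        # Explicit synchronization: the earlier fault is taken and handled
        # before the I/O read is allowed to issue.
        fault_pending = False

    result = device.read()  # the I/O read is issued

    if fault_pending:
        # The earlier instruction faults after the read has started, so the
        # read is thrown away and reissued once the handler returns.
        result = device.read()

    return result
```

Without synchronization the device is read twice and the program sees the already-cleared value; with it, the device is read once and the program sees the original value. The model ignores branch prediction entirely, which the post points out is a separate source of the same problem.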
marc@oahu.cs.ucla.edu (Marc Tremblay) (10/03/89)
In article <4796@orca.WV.TEK.COM> andrew@frip.wv.tek.com writes:
>By way of counter-example: the M88k has a lot of pipelining, and some
>of the floating-point exceptions are imprecise. I might very well have
>code like this:
>
>	fmul.sss r2,r3,r4	; start a floating multiply
>	ld	 r5,r6,0	; start an I/O read
>
>and, while the load is underway, the FPU decides to fire off a floating
>underflow exception. That's fine, I expected this and my floating
>exception handler substitutes zero, cleans out the pipes, and returns.
>I don't need to idle for four cycles waiting for the multiply to finish
>just on the rare chance that it will underflow.

Notice that these two instructions do not have any control or data dependencies. The load could have been executed before the floating point multiply and the result would have been the same. So in this example it really doesn't matter (for the exception routine) if ld is executed before fmul.sss!

Problems would occur if instructions occurring after the "faulty" instruction modify registers needed in the exception routine. Fortunately the 88000 provides special registers which contain the necessary information for the software to complete an instruction that caused an imprecise exception.

Marc Tremblay
marc@CS.UCLA.EDU
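The independence claim for the two quoted instructions can be checked mechanically. This is a minimal sketch in Python, assuming the operand order in the quoted code is destination first, then sources. It compares register names only and says nothing about memory or I/O side-effects. Instr, independent and ld_dep are illustrative names, not part of any toolchain.

```python
from typing import FrozenSet, NamedTuple


class Instr(NamedTuple):
    text: str
    dest: FrozenSet[str]  # registers written
    srcs: FrozenSet[str]  # registers read


def independent(first, second):
    """True if the two instructions share no register data dependency."""
    raw = first.dest & second.srcs  # second reads what first writes
    war = first.srcs & second.dest  # second overwrites what first reads
    waw = first.dest & second.dest  # both write the same register
    return not (raw or war or waw)


# Assumed decoding of the quoted code: first operand is the destination.
fmul = Instr("fmul.sss r2,r3,r4", frozenset({"r2"}), frozenset({"r3", "r4"}))
ld = Instr("ld r5,r6,0", frozenset({"r5"}), frozenset({"r6"}))

# A made-up variant that would conflict: it overwrites r3, a source of fmul.
ld_dep = Instr("ld r3,r6,0", frozenset({"r3"}), frozenset({"r6"}))
```

Here independent(fmul, ld) holds, so either order leaves the same register state, while independent(fmul, ld_dep) does not, which is the kind of case where an exception routine would find a source operand already overwritten.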
ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/05/89)
In article <20260@usc.edu>, vorbrueg@bufo.usc.edu (Jan Vorbrueggen) writes:
> In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
> >
> >As an alternative, do the following:
> >	MOV addr1,R1	(ie Move data found at addr1 into register R1)
> >	STO R1,addr2	(store register contents in addr2)
> But what if the pagetables mapping addr1 or addr2 are pageable?

Sorry, I'm not quite following how this changes anything, i.e., why will that make addr1 or addr2 get accessed more than once?

--
Terry Ingoldsby, Land Information Systems, The City of Calgary
ctycal!ingoldsb@calgary.UUCP or ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/10/89)
In article <487@ctycal.UUCP> ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
>In article <20260@usc.edu>, vorbrueg@bufo.usc.edu (Jan Vorbrueggen) writes:
>> In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
>> >
>> >As an alternative, do the following:
>> >	MOV addr1,R1	(ie Move data found at addr1 into register R1)
>> >	STO R1,addr2	(store register contents in addr2)
>> But what if the pagetables mapping addr1 or addr2 are pageable?
>
>Sorry, I'm not quite following how this changes anything, i.e., why will that
>make addr1 or addr2 get accessed more than once?

The worst that can happen is that processing the STO will cause a fault, after the MOV has already started its memory read, and before the MOV has completed. There are four ways out of that situation:

1) Abort the MOV and restart it later. This is the dreaded Bad Thing.
2) The hardware can finish the outstanding instruction before honoring the fault.
3) The data read from memory can be latched, and the fault handler can rationalize things.
4) My personal favorite: "Then don't do that."

Either faults are OK, or they aren't. If they aren't, then the code segment should be memory locked. Locking the page table entries should be an automatic consequence of getting the OS to lock the pages.

--
Don		D.C.Lindsay	Carnegie Mellon Computer Science
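The four ways out listed above differ in how often addr1 ends up being read. The sketch below tabulates that in Python. It is a toy accounting under stated assumptions, not a model of any particular machine: the STO is assumed to page-fault while the MOV's read is in flight unless the pages were locked, and Policy and reads_of_addr1 are names invented here.

```python
from enum import Enum


class Policy(Enum):
    ABORT_AND_RESTART = 1    # option 1: abort the MOV, restart it later
    FINISH_BEFORE_FAULT = 2  # option 2: hardware completes the MOV first
    LATCH_DATA = 3           # option 3: read data is latched for the handler
    LOCK_PAGES = 4           # option 4: "then don't do that"


def reads_of_addr1(policy):
    """Count reads of addr1 for MOV addr1,R1 followed by STO R1,addr2."""
    reads = 1  # the MOV issues its read of addr1

    # With the pages (and their page table entries) locked, the STO
    # cannot page-fault, so nothing interrupts the MOV.
    sto_faults = policy is not Policy.LOCK_PAGES

    if sto_faults and policy is Policy.ABORT_AND_RESTART:
        reads += 1  # the aborted MOV is reissued after the handler returns

    # FINISH_BEFORE_FAULT: the MOV completes before the fault is taken.
    # LATCH_DATA: the handler recovers the value from the latch.
    # Neither touches addr1 a second time.
    return reads
```

Only the abort-and-restart policy reads addr1 twice, which is harmless for ordinary memory and is the Bad Thing when addr1 is a device register whose reads have side-effects.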