[net.micro.68k] More virtual machine discussions on M680x0 processors

davet@oakhill.UUCP (Dave Trissel) (02/08/85)

Xref: seismo net.unix-wizards:11815

>> Back to the hypervisor simulating the virtual RTE.  Assuming a bus retry was
>> indicated then the hypervisor simply does its own RTE back to the emulated
>> environment.  The bus cycle will complete normally (remember the interception
>> of the mapping fixup in the virtual OS) and the emulation will continue
>> normally.
>>
>> Now, wasn't that easy? :-)

>No, it's not that easy.  There's no guarantee that the virtual RTEs occur
>immediately after the virtual bus error;  the real operating system must
>be able to pair every virtual RTE to a preceding virtual/real bus error.
>Any number of tasks in the virtual operating system may be waiting to have
>their pages brought in so that the virtual RTEs may occur, so the real
>operating system has to keep the real bus error information around for a long
>time (forever).  However, there is no guarantee that the virtual RTE will
>ever occur (virtual segmentation fault), so either the real operating system
>will slowly accumulate unused bus error information or else it must have some
>disgusting heuristic for knowing when to get rid of it.

As you point out, there are several methods for handling the long frame stack
information for RTEs.

The first is to keep the entire frames around for the duration of the virtual
machine and ensure that any long RTE issued by the virtual machine matches
one of them (after which that frame can be tossed).  This does mean that an
ever-expanding collection of unreturned bus error frames will accumulate.  How
many depends on the exact nature of the virtual system being implemented.  (If
that system is a non-virtual memory system then no frames need be saved since
none will be returned.)
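
A minimal sketch of this save-and-compare method, in C for readability.  The
names, the table size and the frame size limit are all illustrative
assumptions, not taken from any real hypervisor.  The table is searched newest
entry first, and a matched frame is tossed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits: FRAME_MAX bounds one long bus error frame in bytes,
   MAX_SAVED bounds how many unreturned frames are kept. */
#define FRAME_MAX 128
#define MAX_SAVED 64

struct saved_frame {
    unsigned char bytes[FRAME_MAX];
    size_t len;
};

static struct saved_frame saved[MAX_SAVED];
static size_t nsaved = 0;

/* Called when a real bus error is reflected to the virtual machine.
   Returns 0 on success, -1 if the frame is unusable or the table is full. */
int frame_save(const unsigned char *frame, size_t len)
{
    if (len == 0 || len > FRAME_MAX || nsaved == MAX_SAVED)
        return -1;
    memcpy(saved[nsaved].bytes, frame, len);
    saved[nsaved].len = len;
    nsaved++;
    return 0;
}

/* Called when the virtual machine issues a long RTE.  The frame is accepted
   only if it matches a saved frame byte for byte; the match is then removed.
   Returns 1 if the frame is valid, 0 otherwise. */
int frame_validate_rte(const unsigned char *frame, size_t len)
{
    size_t i = nsaved;

    assert(nsaved <= MAX_SAVED);
    while (i-- > 0) {                      /* newest entry first */
        if (saved[i].len == len &&
            memcmp(saved[i].bytes, frame, len) == 0) {
            /* Close the gap so the remaining entries stay in age order. */
            memmove(&saved[i], &saved[i + 1],
                    (nsaved - i - 1) * sizeof saved[0]);
            nsaved--;
            return 1;
        }
    }
    return 0;
}
```

A frame that was never saved, or one that has already been returned once, is
rejected, so a modified frame cannot be slipped past the hypervisor.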

There are several ways to minimize the overhead of saving frame states.
The first is in recognizing that a virtual fault represents an immediate
request to satisfy a reference and in most cases that request will be
satisfied within a few seconds.  Knowing this, only the last 5 or so frames
need be saved in main memory.  When the table fills, stage the oldest onto
a disk file.  Notice that this disk file will seldom have
to be searched because nearly always a match will be found in the resident
list.  In essence the disk file becomes a throw-away mechanism.
(Even easier yet, if the hypervisor itself runs virtually it can just
keep all entries in a large resident list and always search in "most
recently entered" order.)  Another way to minimize space is to checksum the
frame.  This would reduce each long stack frame to a simple
32-bit or 64-bit key.  Staging keys to disk (or using a virtual array) would
work for these cases as well.
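
The two space savers above can be combined, roughly as follows.  This is only
a sketch under stated assumptions: the 32-bit key is computed with FNV-1a
(my choice of checksum, nothing above prescribes one), the resident list
holds 5 keys, and a plain array stands in for the disk file.  All names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESIDENT   5     /* "the last 5 or so frames" */
#define STAGED_MAX 256   /* stand-in for the disk file (assumption) */

static uint32_t resident[RESIDENT];   /* oldest key at index 0 */
static size_t   nresident = 0;
static uint32_t staged[STAGED_MAX];   /* keys pushed out of the resident list */
static size_t   nstaged = 0;

/* Reduce a frame to a 32-bit key using the FNV-1a hash. */
uint32_t frame_key(const unsigned char *frame, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= frame[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Remove entry i from an array of n keys; returns the new count. */
static size_t drop(uint32_t *v, size_t n, size_t i)
{
    for (; i + 1 < n; i++)
        v[i] = v[i + 1];
    return n - 1;
}

/* Remember a key.  If the resident list is full, the oldest key is staged
   first.  Returns 0 on success, -1 if the staging area is also full. */
int key_remember(uint32_t key)
{
    if (nresident == RESIDENT) {
        if (nstaged == STAGED_MAX)
            return -1;
        staged[nstaged++] = resident[0];
        nresident = drop(resident, nresident, 0);
    }
    resident[nresident++] = key;
    return 0;
}

/* Look a key up, newest first, and forget it when found.
   Returns 1 if found in the resident list, 2 if found in the staged
   keys, 0 if not found at all. */
int key_match(uint32_t key)
{
    for (size_t i = nresident; i-- > 0; )
        if (resident[i] == key) {
            nresident = drop(resident, nresident, i);
            return 1;
        }
    for (size_t i = nstaged; i-- > 0; )
        if (staged[i] == key) {
            nstaged = drop(staged, nstaged, i);
            return 2;
        }
    return 0;
}
```

Note that a key match only shows that the checksums agree.  Two different
frames can share a key, so this trades some certainty for space compared with
keeping the whole frame.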

Alternatives to remembering things are mentioned next...

>The way around the problem is to push the actual information onto the
>virtual operating system's stack and let it do whatever with it.
>Detection of invalidly modified info on RTEs can be done with a "signature"
>such as checksumming or encryption.  However, checksumming is easily fooled
>by a malicious virtual operating system and encryption is expensive.

Checksumming or signatures as a way to avoid holding state information won't
work if you are going to support hypervisors running other hypervisors.
The checksum itself cannot be stored within the longframe since the virtual
machine hypervisor will also try to do the same, overwriting the value.

If one is not worried about supporting other hypervisors then this scheme
will work assuming that a suitable position in the stack frame can be found
to store the key.  The remark about malicious operating systems is commented
on later.  The encryption or checksumming can be done quite efficiently with
small instruction loops.  Considering the significant amount of overhead
required for each and every privileged simulation by the hypervisor (all TRAPs
and Virtual I/O must be verified completely and then simulated) the occasional
bus exception is practically unnoticed.  When I was a systems programmer
at a site where we ran VM/370, our measurement hardware showed that 30
percent of our mainframe processing time was spent just in the overhead
of providing for virtual machine services.
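
To give a feel for how small such a loop is, here is a toy keyed checksum in
C.  Everything in it is an assumption made for illustration: the frame length
in words, the two-word slot reserved for the key, the secret value and the
rotate/xor/add mixing.  It is not a real cipher, and a determined attacker
could probably defeat it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_WORDS 32   /* hypothetical frame length in 16-bit words */
#define KEY_SLOT    30   /* hypothetical: words 30 and 31 hold the key */

/* Secret known only to the hypervisor; a real one would be chosen
   unpredictably at startup rather than fixed like this. */
static uint32_t hv_secret = 0x5A17C0DEu;

/* Mix every frame word outside the key slot with the secret. */
static uint32_t sign_words(const uint16_t *f)
{
    uint32_t s = hv_secret;
    for (size_t i = 0; i < KEY_SLOT; i++) {
        s = (s << 5) | (s >> 27);   /* rotate left by 5 bits */
        s ^= f[i];
        s += hv_secret;
    }
    return s;
}

/* Store the key in the frame before it goes onto the virtual OS stack. */
void frame_sign(uint16_t *f)
{
    uint32_t s = sign_words(f);
    f[KEY_SLOT]     = (uint16_t)(s >> 16);
    f[KEY_SLOT + 1] = (uint16_t)(s & 0xFFFFu);
}

/* On a virtual RTE: returns 1 if the stored key still matches, else 0. */
int frame_check(const uint16_t *f)
{
    uint32_t s = sign_words(f);
    return f[KEY_SLOT]     == (uint16_t)(s >> 16) &&
           f[KEY_SLOT + 1] == (uint16_t)(s & 0xFFFFu);
}
```

The loop costs a handful of instructions per frame word, which is small next
to the cost of simulating a privileged operation.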

There is yet another way for the hypervisor to know when to throw away saved
checking frames, and that is for the virtual operating system itself to
indicate that a frame just received will not be returned.  This method may
have been alluded to in the following remark...

> ...or else it must have some disgusting heuristic for knowing when to get
> rid of it.

In any case, VM/370 uses the DIAGNOSE machine instruction to communicate both
with subordinates beneath it and hypervisors above it (including hypervisor
versions of itself.)  It turns out that running a demand paged operating
system underneath yet another layer of virtual paging is, as can be expected,
just not too swift.  Because of this, paging status and other information is
passed to VM/370 by various versions of IBM's operating systems which
support demand paging.

>I think Motorola blew it on this one; I don't think we'll see a virtual
>machine environment as useful and reliable as VM/370 on the MC680x0.
>        Tom Lyon
>        Sun Microsystems, Inc.

Well, we blew it in the sense that the situation has to be handled at all.
But I don't think that this will have any bearing whatsoever on virtual
machine environments for the M68000 family.  And having been involved with
VM/370 (admittedly from several years ago) I would not classify it as
significantly useful or reliable. I was shocked when I found out that local
users here at Motorola still have to do primitive pseudo card reading
and printing just to swap files with other users.  I thought such things
would have been improved years ago. (It's rather interesting to hear them
curse while attempting to edit files on our IBM systems, especially after
they've just been on a Macintosh.)

There are two primary reasons for virtual machines.  The first is to
assist in the debugging of an operating system itself or portions of
an operating system where hardware is not yet available.  Of course such
an operating system may clobber the stack it is attempting an RTE on but
this is just one of hundreds of thousands of things an operating system
can do wrong.  Minimal checking of the exception frame will catch most of
these.

The second reason for virtual machines is to allow the execution of
diverse standard operating systems on one piece of hardware.  In this
regard it certainly can be expected that 1) the operating system has
been suitably debugged in the first place and 2) such an operating system
will not "maliciously" try to fool a hypervisor which it knows nothing
about to begin with.

Therefore, the only reason for concern about the stack RTE validation is
where there is an environment (such as at a university) where there may
be an attempt by someone to intentionally foil the checksumming (or whatever)
method to bomb the system.  The answer here is simple.  Use the saved
stack and compare method suggested first.  This will guarantee system
integrity.

As an added note, I think it's interesting to mention that we at Motorola
are unaware of any customers interested in virtual machine capability.
If this is the case and continues to be the case, then this discussion
is mostly academic.  If you are out there, let's hear from you.

Any comments from the community are welcome!

Motorola Semiconductor Inc.             Dave Trissel
Austin, Texas      {ihnp4,seismo,ctvax,gatech}!ut-sally!oakhill!davet