[comp.arch] HP-PA and CISC emulation

rang@cs.wisc.edu (Anton Rang) (05/07/91)

In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
>(Does HP-PA do this right now? If so, I am very impressed. I would be
>much more impressed if it could also run the large existing libraries
>of CISC binaries at full speed, but that would be asking quite a bit
>:-)

  I seem to recall that the high-end HP-PA machines run HP/3000
binaries (under MPE) faster than the HP/3000 series itself ever did.
But I could be wrong.  I don't know if this is done with a full
software emulator, or with a binary->binary translator, etc.

	Anton
   
+---------------------------+------------------+-------------+----------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison | "VMS Forever!" |
+---------------------------+------------------+-------------+----------------+

mike@socrates.umd.edu (mike santangelo (UNIX/VMS Sys Staff)) (05/09/91)

rang@cs.wisc.edu (Anton Rang) writes:

>In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
>>(Does HP-PA do this right now? If so, I am very impressed. I would be
>>much more impressed if it could also run the large existing libraries
>>of CISC binaries at full speed, but that would be asking quite a bit
>>:-)

>  I seem to recall that the high-end HP-PA machines run HP/3000
>binaries (under MPE) faster than the HP/3000 series itself ever did.
>But I could be wrong.  I don't know if this is done with a full
>software emulator, or with a binary->binary translator, etc.

The HP-PA based HP3000 systems use a very sophisticated emulation
system which makes use of something HP calls "millicode".  These
are tiny little HP-PA subroutines that they use to emulate the old
"classic" HP3000 instructions, stack and register architecture, etc.
Believe me, the "classic" HP3000 architecture was as different from
HP-PA as different can be.  The fastest "classic" HP3000 system had
128KB of cache in the SPU and ran at a 75ns microinstruction clock.
I am told that the Model 935, with its 67ns HP-PA 1.0 processor, will
run "classic" binaries in emulation mode just as fast as the 75ns
"classic" HP3000 (a Series 70).

We have two old "classic" HP3000 systems and recently purchased
an HP3000/960 (HP-PA) system, roughly 25 MIPS (37ns clock).  When you have
that much computing power, emulation is no joke, it screams.
AND, *EVERYTHING* works.  I was absolutely amazed at how complete the
emulation is; even some privileged code works (misc sys utilities)!  All
normal application-oriented object code starts up and takes off.  We
were astonished at how flawless it all is.  And we run some pretty
tricky stuff that really made use of features specific to the old
MPE 5 (XDS, Message Files); name a nasty trick and we probably used
it (barring privileged code).

You have to consider though that HP *HAD* to make the emulation
as perfect as possible.  Even the latest versions of MPE XL STILL
USE some of the old object code IN THE OPERATING SYSTEM.  MPE XL 2.05
still used the compatibility mode spooling software from the old
MPE 5 systems for printing, and it worked fine.

And just one other note: compiling our applications software
(student registration, finance, accounts receivable, all written
in COBOL and using extra-data segments out the yin-yang, along
with several other things along these lines) to "native mode" was
almost as painless as running it in emulation; most things
recompiled and ran without any tweaking.

Hats off to HP for this amazing magic show; it seems to fool
just about EVERYTHING we throw at it from our old "classic" HP3000
systems.

-- 
                                  Mike Santangelo (mike@socrates.umd.edu)
                                  UNIX / VMS Systems Manager

edwardm@hpcuhe.cup.hp.com (Edward McClanahan) (05/14/91)

Anton Rang writes:

> In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
> >(Does HP-PA do this right now? If so, I am very impressed. I would be
> >much more impressed if it could also run the large existing libraries
> >of CISC binaries at full speed, but that would be asking quite a bit
> >:-)

> I seem to recall that the high-end HP-PA machines run HP/3000
> binaries (under MPE) faster than the HP/3000 series itself ever did.
> But I could be wrong.  I don't know if this is done with a full
> software emulator, or with a binary->binary translator, etc.

Mike Santangelo replies:

> The HP-PA based HP3000 systems use a very sophisticated emulation
> system which makes use of something HP calls "millicode".

Actually, classic-3000 emulation and "millicode" are two completely
different concepts in MPE XL.  Mike has a lot of interesting information
in his posting, but let me clarify three points:

1 - Millicode is really just a faster calling sequence for short
    assembly routines.  The best examples are routines which move
    a block of memory (i.e. copy structures) and string functions.
    The compilers are told exactly which registers (and which globals)
    each millicode routine modifies.  For normal procedure calls, the
    optimizer must flush variables/fields held in registers and start
    over after the call.  Millicode calls do not have such a
    detrimental effect on optimization.  In addition, caller-saved
    registers don't necessarily need to be saved prior to the
    millicode call.

2 - Classic 3000 emulation involves dedicating a large majority of
    the HP-PA (now PA-RISC) general registers to an emulator, which
    is basically a big CASE/SWITCH statement in a LOOP.  Each emulated
    instruction has its own CASE entry that implements it.
    This code is highly optimized and achieves impressive performance.
    A so-called old-timer once told me that the emulated instruction
    set is more complete than any hardware/microcode implementation
    (as well as better documented).

3 - Classic 3000 translation is a further optimization step where the
    loop overhead of the emulator is removed (by unfolding the loop).
    Here is a trivial diagram to explain the difference:

    Suppose the emulated machine has two instructions, E1 and E2.
    Suppose the native instructions are of the form N1, N2, ...

              Emulator:

                 Load instruction

                 CASE/SWITCH on instruction

                    E1:  N1
                         N2                          
                         <break>

                    E2:  N3
                         N4
                         <break>

                 Go get next instruction

    Suppose the classic 3000 program is:

              E1
              E2
              E2

    The translator will convert the classic 3000 program into:

              N1       ; Code for E1
              N2
              N3       ; Code for first E2
              N4
              N3       ; Code for second E2
              N4

    What the translated program eliminates is the loop overhead:
    loading each instruction, dispatching on it, and going back for
    the next one.
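Point 1 can be illustrated with a minimal Python sketch of the optimizer's view of a call. This is a toy model under stated assumptions: the register names, the caller-saved set, and the clobber list for a block-move millicode routine are all hypothetical and only loosely inspired by PA-RISC conventions. It shows which register-cached variables survive an ordinary call (assume every caller-saved register is destroyed) versus a millicode call whose exact clobber set the compiler has been told.

```python
from typing import Dict, FrozenSet, Set

# Illustrative register sets (hypothetical; not the exact PA-RISC convention).
CALLER_SAVED: FrozenSet[str] = frozenset(
    {"r19", "r20", "r21", "r22", "r23", "r24", "r25", "r26"}
)
# Hypothetical clobber list the compiler is given for a block-move millicode routine.
MOVE_CLOBBERS: FrozenSet[str] = frozenset({"r21", "r22"})


def surviving_values(reg_cache: Dict[str, str],
                     clobbered: FrozenSet[str]) -> Dict[str, str]:
    """Register->variable bindings still valid after a call that may
    modify every register in `clobbered`."""
    return {reg: var for reg, var in reg_cache.items() if reg not in clobbered}


def registers_to_save(reg_cache: Dict[str, str],
                      clobbered: FrozenSet[str]) -> Set[str]:
    """Registers holding live values that the call would destroy, so the
    caller must save them beforehand (or reload them afterwards)."""
    return {reg for reg in reg_cache if reg in clobbered}


# Variables the optimizer currently keeps in registers. In this model r3 is
# outside CALLER_SAVED, so no call is allowed to disturb it.
cache = {"r19": "i", "r20": "count", "r21": "ptr", "r3": "base"}

# Ordinary procedure call: every caller-saved register is assumed destroyed.
after_normal = surviving_values(cache, CALLER_SAVED)
# Millicode call: only the registers the routine is declared to modify are lost.
after_millicode = surviving_values(cache, MOVE_CLOBBERS)

print(sorted(after_normal.values()))                    # ['base']
print(sorted(after_millicode.values()))                 # ['base', 'count', 'i']
print(sorted(registers_to_save(cache, CALLER_SAVED)))   # ['r19', 'r20', 'r21']
print(sorted(registers_to_save(cache, MOVE_CLOBBERS)))  # ['r21']
```

The only point the model makes is that a smaller, exactly known clobber set lets more register-cached values stay live across the call, so fewer saves and reloads are needed.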
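The difference between points 2 and 3 can be sketched in Python. This is a toy under stated assumptions, not HP's emulator or translator: the emulated opcodes E1 and E2 come from the diagram, while the native operations n1..n4, the accumulator/step-counter state, and the function names emulate, translate and run_native are hypothetical. Both paths should leave the machine in the same state; only the per-instruction dispatch work differs.

```python
from typing import Callable, Dict, List

State = Dict[str, int]
NativeOp = Callable[[State], None]


# Hypothetical "native" instructions N1..N4, each a tiny state update.
def n1(s: State) -> None:
    s["acc"] += 1


def n2(s: State) -> None:
    s["acc"] *= 2


def n3(s: State) -> None:
    s["acc"] -= 3


def n4(s: State) -> None:
    s["steps"] += 1


# Native sequence implementing each emulated opcode (the CASE arms).
IMPLEMENTATION: Dict[str, List[NativeOp]] = {
    "E1": [n1, n2],
    "E2": [n3, n4],
}


def emulate(program: List[str], state: State) -> int:
    """Emulator: a dispatch loop that pays fetch/decode overhead on every
    emulated instruction. Returns how many dispatches were performed."""
    dispatches = 0
    pc = 0
    while pc < len(program):
        opcode = program[pc]               # load instruction
        for op in IMPLEMENTATION[opcode]:  # CASE/SWITCH on instruction
            op(state)
        dispatches += 1
        pc += 1                            # go get next instruction
    return dispatches


def translate(program: List[str]) -> List[NativeOp]:
    """Translator: unfold the loop once, ahead of time, by concatenating the
    native sequence for each emulated instruction into straight-line code."""
    native: List[NativeOp] = []
    for opcode in program:
        native.extend(IMPLEMENTATION[opcode])
    return native


def run_native(code: List[NativeOp], state: State) -> None:
    """Stands in for the CPU executing straight-line code: there is no
    decode or dispatch per emulated instruction."""
    for op in code:
        op(state)


if __name__ == "__main__":
    program = ["E1", "E2", "E2"]
    translated = translate(program)
    print([op.__name__ for op in translated])
    # ['n1', 'n2', 'n3', 'n4', 'n3', 'n4']

    emulated_state = {"acc": 5, "steps": 0}
    emulate(program, emulated_state)
    translated_state = {"acc": 5, "steps": 0}
    run_native(translated, translated_state)
    print(emulated_state == translated_state, emulated_state)
    # True {'acc': 6, 'steps': 2}
```

In the emulator the opcode lookup and the branch back to the top of the loop happen once per emulated instruction, every time the program runs; the translator does the lookup once, at translation time, leaving only the N1..N4 sequence shown in the diagram.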

I hope this helps...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

  Edward McClanahan
  Hewlett Packard Company     -or-     edwardm@cup.hp.com
  Mail Stop 42UN
  11000 Wolfe Road                     Phone: (480)447-5651
  Cupertino, CA  95014                 Fax:   (408)447-5039