[comp.sys.handhelds] HP-48 Emulator

gt1246c@prism.gatech.EDU (Warren Furlow) (03/07/90)

Alonzo Gariepy writes:
>You could build a Saturn emulator that would blow the socks off the 
>HP28.  They call it a 1MHz machine, but that doesn't say much about 
>the instruction timing.  Someone recently claimed that the 28 has a 
>one bit memory bus, which I am inclined to believe.  At best, it has 
>a 4 bit memory bus and a 4 bit register bus.  With a 16 bit 286 running 
>at 16 MHz, you would have 64 times the speed.  That is easily enough 
>to emulate the Saturn at several times the speed of the 28.

The emulator I wrote for the HP-41 actually runs slightly slower than 
the real thing on a 12 MHz AT.  It is all written in Microsoft C and
well optimized.  I think it could be sped up by rewriting critical
routines in assembly, but I don't see anything near 64 times the speed.
(The HP-41 runs much slower than the Saturn; I forget exactly how much.)
The real problem seems to be that the Saturn and HP-41 processors use
nybble field arithmetic and this must be emulated by using loops.  For 
instance, the following instruction adds the value of the A register to
the C register, from the pointer (PT) through the end of the 56-bit word
in the HP-41:

C=C+A WPT  translates to something like:

for (i=PT;i<14;i++)
  {
  C_reg[i] += A_reg[i];
  }

Where:
char C_reg[14],A_reg[14];      //HP-41 C reg is 56 bits
int PT;                        //0..13

This is a simplification and does not consider the carry but it is easy to see 
how slow it is.
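
A minimal sketch of how the carry could be folded into that loop, assuming
one digit per array element and digit 0 as the least significant.  The name
field_add and the base parameter (16 for hex mode, 10 for decimal mode) are
illustrative and not taken from the original emulator:

```c
#include <assert.h>

#define REG_DIGITS 14          /* HP-41 register: 14 nibbles = 56 bits */

/*
 * Add src into dst over digits first..last (inclusive), digit 0 being
 * the least significant.  base is 16 for hex mode or 10 for decimal mode.
 * Each element is assumed to already hold a valid digit in 0..base-1.
 * Returns the carry out of the last digit (0 or 1).
 */
static int field_add(unsigned char *dst, const unsigned char *src,
                     int first, int last, int base)
{
    int carry = 0;
    int i;

    assert(0 <= first && first <= last && last < REG_DIGITS);
    for (i = first; i <= last; i++) {
        int sum = dst[i] + src[i] + carry;   /* at most 2*(base-1)+1 */
        carry = (sum >= base);
        dst[i] = (unsigned char)(carry ? sum - base : sum);
    }
    return carry;
}
```

C=C+A WPT would then be field_add(C_reg, A_reg, PT, 13, base), with the
returned value stored in the emulated carry flag.  The per-digit loop is
still there, which is the cost being described.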

Another way to do this that Ross Cooling of Canada implemented in an HP-41
emulator he wrote is to use double (64 bit) variables:

double C_reg,A_reg;

which requires only several masking operations and one addition.
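
A sketch of the mask-and-add idea, under stated assumptions: it uses a 64-bit
unsigned integer (uint64_t) rather than a double, since C does not allow
bitwise operators on floating-point values, and it covers hex-mode addition
only (decimal mode would need a BCD correction on top).  The names
field_mask and field_add64 are hypothetical and this is not a reconstruction
of Ross Cooling's code:

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering nibbles first..last (inclusive) of a 64-bit word,
 * nibble 0 being the least significant. */
static uint64_t field_mask(int first, int last)
{
    int width;
    uint64_t ones;

    assert(0 <= first && first <= last && last <= 15);
    width = (last - first + 1) * 4;              /* field width in bits */
    ones = (width == 64) ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
    return ones << (first * 4);
}

/* Hex-mode C = C + A over one field.  Nibbles outside the field are
 * left untouched.  *carry receives the carry out of the field. */
static uint64_t field_add64(uint64_t c, uint64_t a,
                            int first, int last, int *carry)
{
    uint64_t mask = field_mask(first, last);
    int shift = first * 4;
    uint64_t fc  = (c & mask) >> shift;   /* field operands moved down */
    uint64_t fa  = (a & mask) >> shift;   /* so the field starts at bit 0 */
    uint64_t top = mask >> shift;         /* largest value the field holds */
    uint64_t sum = fc + fa;

    /* Overflow of a narrow field shows as sum > top; overflow of a full
     * 64-bit field shows as unsigned wraparound (sum < fc). */
    *carry = (sum > top) || (sum < fc);
    return (c & ~mask) | ((sum & top) << shift);
}
```

Passing the field bounds as parameters avoids one routine per field, at the
price of computing the mask on every call; precomputing a mask per field
would trade that time back for table space.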

This is definitely faster, but a separate routine must be written for EVERY 
TEF case for EVERY arithmetic instruction, and this takes A LOT of code. 

I can't think of any other way or acceptable combination that would be
worthwhile to implement.  This all relates to the HP-48 since it uses the 
Saturn processor and now Alonzo is interested in writing a Saturn emulator.

Peter Holzer writes:
>The main reason for not writing a CPU emulator (I considered it before 
>I started) is that either every user would have to load the entire ROM
>of the HP28 (128k!) into his PC (not everybody has an IR receiver on 
>his PC) or the ROM had to be distributed with the emulator (I am not
>sure if HP would like this).

I don't think HP would like that at all, and now they have explicitly said 
so in the HP-48 copyright notice.  But with the HP-48, it is now very easy 
for each individual to download the ROMs to a PC.  I don't think HP can or 
will say anything about the downloading of their ROMs.  Furthermore, they 
can't say anything if someone writes a disassembler that produces listings
from the ROM images as long as those listings are not distributed.  (I think
HP does have the right to prevent distribution.)  If I am wrong, HP speak up
now...
This disassembler could be smart and even include comments and labels.

I am interested in seeing if Peter is going to change to the HP-48.  I think
an RPL emulator for the HP-48 would be a big success since it would allow 
RPL program development on a large screen with a real keyboard.

I have been planning for some time to write development tools for the first 
new HP calculator with I/O.  This would include a disassembler, assembler, 
linker and ultimately a Saturn emulator and be very much like the ASDT package
I did for the HP-41.  The speed of the emulator and the usage of the ROMs are
major concerns so maybe someone else has some ideas.  Also, I am interested to
know how much interest there will be in HP-48 machine code.

Warren Furlow

alonzo@microsoft.UUCP (Alonzo GARIEPY) (03/09/90)

> It might help to compare apples to apples...

A good point.  I will have to do actual instruction timings to confirm
the relative speed of the Saturn and x86 CPUs.  As was pointed out, it
is difficult to tell exactly what is meant by these MHz numbers.  If 
the Saturn does have a one bit memory bus, 1 MHz is very slow.  Perhaps
someone with all the facts can explain the CPU/Memory architecture of
the Saturn.

Since I have a 25MHz 80386 machine on my desk, the performance on a
286 doesn't concern me much anyway.  Intel is selling 386s below the 
286 price so that the AT can become a thing of the past and 32 bit
OS/2 can rule the land.

Since my emulator would be asynchronous (would emulate each instruction
as fast as possible) only average case performance is important, and parts 
of the ROM that rely on timing would need adjustment.

One of the tricks for improving average performance is coding commonly
called subroutines and idioms (e.g., 142, 164, 808C) directly in x86 
assembler.  Such tricks abound.  I would expect a 2 or 3 times speed
improvement in assembler over C even before applying tricks.
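
One way to picture the idiom trick, sketched in C rather than x86 assembler:
before decoding normally, the emulator checks whether the nibbles at the
program counter match a known sequence and, if so, runs a native routine
instead.  Everything named here (Cpu, Idiom, try_idiom, skip_idiom_10) is
hypothetical, reading "142, 164, 808C" as one ten-nibble sequence is an
assumption, and the handler is a stand-in that only skips the sequence and
counts the hit:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Cpu Cpu;
struct Cpu {
    unsigned long pc;            /* nibble address of the next instruction */
    const unsigned char *mem;    /* one nibble per byte, for simplicity */
    unsigned long fused_hits;    /* how often a native routine ran */
};

typedef void (*NativeFn)(Cpu *cpu);

typedef struct {
    const unsigned char *nibbles;  /* opcode nibbles to match at PC */
    size_t length;
    NativeFn run;                  /* native replacement for the sequence */
} Idiom;

/* Stand-in handler: a real one would perform the combined effect of the
 * matched instructions.  This one only steps over them. */
static void skip_idiom_10(Cpu *cpu)
{
    cpu->pc += 10;
    cpu->fused_hits++;
}

static const unsigned char idiom_a[] = { 1,4,2, 1,6,4, 8,0,8,0xC };
static const Idiom idioms[] = {
    { idiom_a, sizeof idiom_a, skip_idiom_10 }
};

/* Returns 1 if an idiom matched at cpu->pc and was run natively,
 * 0 if the caller should fall back to normal instruction decoding. */
static int try_idiom(Cpu *cpu, unsigned long mem_size)
{
    size_t i, k;

    assert(cpu != NULL && cpu->mem != NULL);
    for (i = 0; i < sizeof idioms / sizeof idioms[0]; i++) {
        const Idiom *id = &idioms[i];
        if (cpu->pc + id->length > mem_size)
            continue;                    /* pattern would run off memory */
        for (k = 0; k < id->length; k++)
            if (cpu->mem[cpu->pc + k] != id->nibbles[k])
                break;
        if (k == id->length) {
            id->run(cpu);
            return 1;
        }
    }
    return 0;
}
```

Comparing nibbles at every instruction has its own cost, so a practical
version would probably key the check on the first nibble or on known ROM
addresses.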

Alonzo Gariepy
alonzo@microsoft

madler@tybalt.caltech.edu (Mark Adler) (03/10/90)

The HP-28C/S use a four bit multiplexed address/data bus.  Addresses are
not sent very often (the memory chips store and autoincrement the 20-bit
address), so most of the cycles are four bit data transfers.

Mark Adler
madler@hamlet.caltech.edu

randys@hpcvra.CV.HP.COM (Randy Stockberger) (03/11/90)

With the current discussion about emulating the Saturn CPU on a PC I
was curious about what some of the problems might be in doing that.
Since the PC CPU and the Saturn CPU have slightly different
architectures I feel it is safe to assume that there will be some
subtle problems in the emulation.  I have not given the task enough
consideration to understand what some of these problems might be.
However, I did code up a couple of instructions which I arbitrarily
decided are representative of the task of emulating Saturn.

First of all I made some assumptions about how to structure the
program.  I assumed that the 64 bit CPU registers are encoded as one
nibble per byte, low nibble in low memory.  This allows the field
select instructions to operate without having to pack and unpack
nibbles.  The downside of this is that byte aligned BCD instructions
have to be carried out a nibble at a time.  I think that if a packed
nibble format were used then byte aligned BCD operations could work on
two nibbles per loop, but the odd nibbles would cause considerable
trouble.  Also, if the nibbles were packed it would be easier to do
non-BCD operations.

For organizing memory I made the opposite decision, nibbles are
packed, two per byte.  Using this organization it is possible to
squeeze the entire MegaNibble of Saturn address space into a DOS
program, but probably does not leave enough extra space to accommodate
the Saturn emulator code.  I have some real doubts about being able to
emulate a complete Saturn and memory space on an 8086/80286 class CPU.
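
A minimal C sketch of the packed memory organization, assuming a host where a
512 KB array is directly addressable (which is exactly what the segmented
8086 does not offer).  Putting the even nibble address in the high half of
each byte is an assumption chosen to match what the GOC fragment further down
appears to do; read_nibble and write_nibble are illustrative names:

```c
#include <assert.h>
#include <stdint.h>

#define SATURN_NIBBLES 0x100000UL           /* 2^20 nibble addresses */

static uint8_t memory[SATURN_NIBBLES / 2];  /* two nibbles per byte: 512 KB */

/* Fetch the nibble at a 20-bit nibble address (higher bits are ignored). */
static unsigned read_nibble(uint32_t addr)
{
    uint8_t byte = memory[(addr & (SATURN_NIBBLES - 1)) >> 1];
    return (addr & 1) ? (unsigned)(byte & 0x0F)   /* odd address: low half */
                      : (unsigned)(byte >> 4);    /* even address: high half */
}

/* Store a nibble, leaving the other half of the byte unchanged. */
static void write_nibble(uint32_t addr, unsigned value)
{
    uint8_t *p = &memory[(addr & (SATURN_NIBBLES - 1)) >> 1];

    assert(value <= 0x0F);
    if (addr & 1)
        *p = (uint8_t)((*p & 0xF0) | value);
    else
        *p = (uint8_t)((*p & 0x0F) | (value << 4));
}
```

Every access pays for a shift and a test of the low address bit, which is the
time cost of saving half the memory.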

I also assume that the program counter is just a word.  This is just
for simplicity and is really not an acceptable design decision.  In a
production quality emulator you would have to design a program counter
data structure that would be efficient to manipulate, and would work
with the 8086 segmented architecture.

If the emulator were going to run on a 386 in protected mode, e.g.
with a large directly addressable memory space, it would probably be
best to organize memory with one nibble per byte.  Also, on a machine
with 32 bit words, programming the PC, D0 and D1 registers would be
much easier.

Now, for a couple of code samples.  The first instruction was selected
to represent the arithmetic portion of the instruction set.

; Decimal mode instruction 'A=A+C   W'
	mov	DI,offset (RegA-1) ; 2  Set up pointers to source and
	mov	SI,offset (RegC-1) ; 2  destination registers.
	mov	CX,16		; 2  Count of the number of nibbles to add.
	clc			; 2  Make sure carry is OK at the start.
AddAC:
	inc	si		; 2
	inc	di		; 2
	mov	al,[si]		; 4  Fetch the first operand
	adc	al,[di]		; 6  Add in the second operand
	aaa			; 4  Make it a BCD addition
	mov	[di],al		; 2  Store the result
	loop	AddAC		; 11+m  Loop back for the next nibble
	mov byte ptr RegCarry,0 ; 2     Assume we had no carry
	jnc	DoneCarry	; 11+m/3  Assumption was right
	dec byte ptr RegCarry	; 6     Assumption was wrong, set carry
DoneCarry:
	jmp	NextInstruction	; 7+m/3

; Execution time is ( 31 * 16 ) + 28  =  496 + 28  =  524 cycles.
; Assuming a 20 MHZ machine this is 26.2 micro seconds.  On a 48SX this
; instruction takes 17 cycles (maybe 18, I don't have the docs here) and
; 17 cycles at 2 MHZ is 8.5 us.  A ratio of .324, the emulator is about
; 1/3 as fast.
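
The arithmetic behind those figures can be checked with a short C sketch.
The cycle counts are the ones annotated in the listing, with the "+m" terms
ignored as the total in the post ignores them, and with the jnc counted as
taken; loop_cycles and microseconds are illustrative names:

```c
#include <assert.h>

/* Total cycles for the add loop over the given number of nibbles. */
static long loop_cycles(int nibbles)
{
    const long per_nibble = 2 + 2 + 4 + 6 + 4 + 2 + 11;  /* loop body: 31 */
    const long setup      = 2 + 2 + 2 + 2;    /* three mov and clc: 8 */
    const long finish     = 2 + 11 + 7;       /* clear carry, jnc taken, jmp: 20 */

    assert(nibbles > 0);
    return nibbles * per_nibble + setup + finish;
}

/* Time in microseconds for a number of clock cycles at a clock in MHz. */
static double microseconds(long cycles, double mhz)
{
    assert(mhz > 0.0);
    return (double)cycles / mhz;
}
```

loop_cycles(16) gives 524, microseconds(524, 20.0) gives 26.2, and the 48SX
figure microseconds(17, 2.0) gives 8.5, so the ratio 8.5 / 26.2 is about
0.324.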

; I selected the GOC instruction since I figured it would be one of the
; easiest and most efficient to emulate.  Again, if I were writing a
; production quality emulator the program counter would have to reach the
; entire Saturn address space and this code would be more complicated.
;
; GOC instruction.  Assumes BX == PC?? ( 16 bits == 20 bits ??? )
	cmp	RegCarry,0	; 5      See if there is any thing to do
	jz	NextExit	; 7+m/3  No, easy out.

	mov	si,bx		; 2
	shr	si,1		; 3
	mov	al,[si]		; 4  Fetch low nibble
	jc	OddPC		; 7+m/3
EvenPC:	mov	ah,al		; 2  Save second nibble in ah.
	mov	cl,4		; 4
	shr	al,cl		; 7  Shift high nibble to low.
	mov	cl,4		; 4
	shl	ah,cl		; 7
	or	al,ah		; 2  al == offset for the GOC.
	jmp short AddOffset	; 7+m

OddPC:	and	al,0Fh		; 2  Mask off low nibble.
	mov	ah,[si+1]	; 4  Fetch next nibble.
	and	ah,0F0h		; 2
	or	al,ah		; 2
AddOffset:
	mov	ah,0		; 4
	add	bx,ax		; 2
NextExit:
	jmp	NextInstruction	; 7+m

; If I remember correctly a Saturn CPU will execute the GOC instruction
; in either 3 (if the PC is not changed) or 5 cycles.  This is an
; execution time of 1.5/2.5 us.

; No Carry time == 12 cycles == 0.6 us   Ratio: 2.5
; OddPC time    == 40 cycles == 2.0 us   Ratio: 1.25
; EvenPC time   == 66 cycles == 3.3 us   Ratio: 0.757
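
For comparison, a C sketch of the same GOC step with a full 20-bit program
counter, assuming the flat one-nibble-per-byte memory suggested for a 386.
Several things here are assumptions rather than facts taken from the listing:
the offset is sign-extended (the assembly zero-extends it with mov ah,0), it
is measured from the address of the offset field, and the low nibble is
stored first.  The function name goc is illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_MASK 0xFFFFFUL   /* 20-bit nibble addresses */

/*
 * mem holds one nibble per byte and must cover every address touched.
 * pc is the nibble address of the two-nibble offset field (the opcode
 * nibble has already been consumed).  Returns the new program counter.
 */
static uint32_t goc(const uint8_t *mem, uint32_t pc, int carry)
{
    unsigned lo, hi;
    int offset;

    assert(mem != NULL);
    if (!carry)
        return (uint32_t)((pc + 2) & ADDR_MASK);   /* skip the offset field */

    lo = mem[pc & ADDR_MASK] & 0x0Fu;              /* low nibble comes first */
    hi = mem[(pc + 1) & ADDR_MASK] & 0x0Fu;
    offset = (int)((hi << 4) | lo);
    if (offset >= 0x80)
        offset -= 0x100;                           /* sign-extend 8 bits */
    return (uint32_t)(((int32_t)pc + offset) & ADDR_MASK);
}
```

With one nibble per byte the odd/even split disappears, which is where most
of the 40 and 66 cycle paths in the listing go.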

Now, what does all this mean?  It means that an average 20MHZ 386,
under the limitations imposed by DOS, will execute some of the emulated
instructions slightly faster, and most of them a little slower,
than the 48SX.  I estimate the average emulation speed would be less
than 1/2 as fast.  The exact speed would, of course, depend on what
instructions were in the program being executed.

Is it fast enough?  That is up to the person who uses the program.

Could it be faster?  Probably not under DOS.  There are problems that
I glossed over, ignored or haven't even found yet that would almost
certainly make it even slower.

Given a 32 bit CPU like a 68000 or a 80386 and an operating system
which allows perhaps a megabyte and a half or more for the necessary
data structures and code space without having to worry about segment
registers it would probably be faster.  However, what percentage of us
are running UNIX on a 68000 with all the freedom and access that we
have with our PCs?  Or would the 386 and Xenix be a better choice for
the host environment?

A generic 4.77 MHZ PC with an 8088 CPU is probably about 1/10th the
speed of the 20MHZ 386 and would emulate at 1/20 to 1/30 the Saturn
speed.  An 8 or 12 MHZ 80286 would, of course, be somewhere in
between.  I suspect that these would not be acceptable for most of us.

--
  Randy Stockberger
  randys@hp-pcd.hp.com
  Ma Bell:   750-3589
--

jmunkki@kampi.hut.fi (Juri Munkki) (03/11/90)

randys@hpcvra.CV.HP.COM (Randy Stockberger) writes:
>Given a 32 bit CPU like a 68000 or a 80386 and an operating system
>which allows perhaps a megabyte and a half or more for the necessary
>data structures and code space without having to worry about segment
>registers it would probably be faster.  However, what percentage of us
>are running UNIX on a 68000 with all the freedom and access that we
>have with our PCs?  Or would the 386 and Xenix be a better choice for
>the host environment?

First of all, let me say that I don't think that speed is all that
important with the emulator. Given the strange architecture of the
Saturn processor, I don't think that we even need the emulator. It
might be best to write a debugger that uses the serial line. The
original Macintosh debugger used two macs, where one executed the
code and the other displayed windows with registers and disassembly.

There are a few basic blocks that need to be written with portability
in mind:

	1) A macro-assembler.
	2) A disassembler that understands symbol tables from the
	   assembler. Someone could write symbol tables for the
	   most common sections of the ROM.

I don't know if the Saturn processor directly supports a trace or
step mode, but one could probably be emulated with some code that
looks out for branch instructions and emulates those. Breakpoints
should be easy enough to set in RAM-based code.

This step/trace unit would then communicate (using the serial port)
with the main computer. If you use the supplied kermit protocol
packets, this system could be extremely portable (and probably slow
too.)

After these three (separately usable) blocks (the assembler, the
disassembler and the step/trace unit) have been written, some
machine-dependent code could be written to glue them together. At
this point, I would probably write a Macintosh interface for the whole
assembler/debugging system and someone else could write a PC or unix
interface. All this assumes that the assembler and disassembler are
written in a high level language. (If you use C, please allow for
16 or 32 bit integers.)
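
A small sketch of what allowing for 16-bit integers could mean for Saturn
addresses: a 20-bit address does not fit in an int on a compiler where int is
16 bits, but unsigned long is at least 32 bits on every conforming C
implementation.  SaturnAddr and addr_add are hypothetical names:

```c
#include <assert.h>

typedef unsigned long SaturnAddr;     /* long is at least 32 bits in C */
#define SATURN_ADDR_MASK 0xFFFFFUL    /* 20-bit nibble address space */

/* Move an address by a signed number of nibbles, wrapping at 20 bits. */
static SaturnAddr addr_add(SaturnAddr base, long delta)
{
    assert(base <= SATURN_ADDR_MASK);
    /* Unsigned arithmetic wraps modulo a power of two, so a negative
     * delta still lands on the right address after masking. */
    return (base + (unsigned long)delta) & SATURN_ADDR_MASK;
}
```

The same reasoning applies to loop counters and table indexes: anything that
can exceed 32767 should be declared long rather than int.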

The 68020 (or 30) would probably be quite well suited for the emulator,
since it has bcd arithmetic and bit-field instructions. The reverse
nibble addressing scheme might best be emulated by negating every bit
field address before use. This would require some further adjustment,
but you could move directly to and from memory with very little overhead.
Once the bitfield address is set up, reading the whole 64 bit word would
take only two or three instructions. (A 16Mhz 68020 with 2 wait states
can emulate a 4.77Mhz 8088.)

The emulator requires a lot of work and still isn't quite as useful
as the assembler+disassembler+trace+user interface combination that
I described.

I'd like to write some arcade games for the 48SX. Tetris will probably
be the first. With the IR link, it could be interesting as a two player
game.

_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
|     Juri Munkki jmunkki@hut.fi  jmunkki@fingate.bitnet        I Want   Ne   |
|     Helsinki University of Technology Computing Centre        My Own   XT   |
^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^

inst182@tuvie (Inst.f.Techn.Informatik) (03/13/90)

In article <6724@hydra.gatech.EDU> gt1246c@prism.gatech.EDU (Warren Furlow) writes:
>
>I am interested in seeing if Peter is going to change to the HP-48.  I think
>an RPL emulator for the HP-48 would be a big success since it would allow 
>RPL program development on a large screen with a real keyboard.
>
>I have been planning for some time to write development tools for the first 
>new HP calculator with I/O.  This would include a disassembler, assembler, 
>linker and ultimately a Saturn emulator and be very much like the ASDT package
>I did for the HP-41.  The speed of the emulator and the usage of the ROMs are
>major concerns so maybe someone else has some ideas.  Also, I am interested to
>know how much interest there will be in HP-48 machine code.
>
>Warren Furlow

Yes, I am changing to the HP-48. I was at a local computer shop
this morning, and they said they will get it this week. 

So by next week, I will (hopefully) have one, and the HP28
emulator will turn into an HP48 emulator. 

By the way, I have posted the emulator to comp.sources.misc in
the meantime. It compiles with Turbo-C on DOS and with gcc on
UNIX machines. There is almost no machine-dependent code in it,
so you might have a look at it, even if you have a different 
environment.

I am looking forward to your comments.

I am certainly not going to write a Saturn emulator in the near
future. I am not interested enough in machine-language programming
on the HP to invest much time in it.

Peter
 _______________________________________________________________
|	__   | Peter J. Holzer					|
|  |   |  \  | Technische Universitaet Wien			|
|  |___|__/  | 							|
|  |   |     | hp@honey.tuwien.ac.at	hp@vmars.uucp		|
|  |   |     | ...!uunet!mcsun!tuvie!asupa!honey!hp		|
|  ____/     |--------------------------------------------------|
|	     | Think of it as evolution in action -- Tony Rand	|
|____________|__________________________________________________|