[comp.sys.celerity] Model 500 Scalar Processor Instructions

joel@fps.com (Joel Broude) (02/21/90)

I have a manual that describes each scalar processor 
instruction for the FPS Model 500, including djibzm.  Many
of these instructions are the same as they were for the
Celerity machines.  

This manual contains proprietary information, and
is not intended for general distribution, 
but if you have a specific question, I may be able to 
help.  And if you have a compelling need for a copy
of the manual, communicate that need to me in writing, 
and I'll find out whether I can release a copy to you.

ps@fps.com (Patricia Shanahan) (02/22/90)

In article <6994@celit.fps.com> joel@fps.com (Joel Broude) writes:
>I have a manual that describes each scalar processor 
>instruction for the FPS Model 500, including djibzm.  Many
>of these instructions are the same as they were for the
>Celerity machines.  
>
>This manual contains proprietary information, and
>is not intended for general distribution, 
>but if you have a specific question, I may be able to 
>help.  And if you have a compelling need for a copy
>of the manual, communicate that need to me in writing, 
>and I'll find out whether I can release a copy to you.


I think it may be better if the customers take the issue up with the local
field offices. Personally, I think that a document such as the one Joel has
should be generally available. The only thing that will cause that to
happen is if customers ask for it.

Be careful with any attempt to use the NCR descriptions. There are several
commands that NCR supports that we deliberately did not put in the Celerity
assembler and do not support on the Model 500. Additionally, the NCR 
documents will not tell you anything about how to do floating point
arithmetic, integer multiplication, stack cache register access, or non-local
branches. Most of the off-chip functions on a Celerity or FPS M500 are very
different from what NCR intended. However, for those commands that exist with
the same meaning in both instruction sets, we used the same mnemonic as NCR.

If you are doing a gcc port, would you implement a new assembler or use
the existing one? If you implement a new assembler, there is another set
of information that you need on some non-interlocked pipeline conflicts.
The assembler inserts nop (actually "tw r0,r0") when not optimizing, or
re-arranges code when optimizing, to prevent some restricted sequences from
occurring. For example, on a Celerity (but not a Model 500) modifying the data
register of a store on the immediately following cycle may give undefined
results if the store page faults.
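
As a rough illustration of the non-optimizing case only, here is a minimal
sketch in Python of a pass that pads one restricted sequence with a nop. It is
not the FPS assembler's actual logic. The Insn record, the insert_nops name,
and the "store" and "add" instruction texts in the usage note are all
hypothetical; only the "tw r0,r0" nop and the store data register rule come
from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

NOP = "tw r0,r0"  # the nop the post says the assembler inserts


@dataclass
class Insn:
    """Hypothetical instruction record, invented for this sketch."""
    text: str                        # assembler text of the instruction
    is_store: bool = False           # True if this instruction is a store
    data_reg: Optional[str] = None   # register a store writes out to memory
    writes: Optional[str] = None     # register this instruction modifies


def insert_nops(code: List[Insn]) -> List[Insn]:
    """Pad with a nop wherever the instruction right after a store
    modifies that store's data register (the Celerity restriction)."""
    out: List[Insn] = []
    for insn in code:
        prev = out[-1] if out else None
        conflict = (
            prev is not None
            and prev.is_store
            and insn.writes is not None
            and insn.writes == prev.data_reg
        )
        if conflict:
            out.append(Insn(NOP))  # separate the store from the write
        out.append(insn)
    return out
```

Given a hypothetical store of r3 followed directly by an instruction that
writes r3, the pass puts "tw r0,r0" between the two and leaves unrelated
instructions alone. An optimizing assembler would instead try to move a
useful, independent instruction into that slot.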

You also need to decide whether to write your own calling conventions or use
ours. If you use ours, your code will be able to call and be called by
FPS compiled code. Even if you do your own, make sure that r14 contains the
address of the end of the memory stack. When calling signal handlers the
kernel uses the area below where r14 points.

Some of the instructions are obscure, including djibzm. The compiler does
not actually generate these, nor do programmers (even on the rare occasions
when we resort to assembly language programming) normally use them directly.
Instead, the assembler has a set of macros with names like "djneq", for delayed
jump on inequality, which it implements using these instructions. Similarly, we
have an assembler mnemonic "load ra,lit" which means "load general register
ra with literal lit by any suitable method". The assembler generates a 
sequence that takes one to five cycles on a Celerity, or one to four cycles
on an FPS M500, depending on the value of "lit".
--
	Patricia Shanahan
	ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
	phone: (619) 271-9940

scp@acl.lanl.gov (Stephen C. Pope) (02/23/90)

on 21 Feb 90 16:31:37 GMT,
ps@fps.com (Patricia Shanahan) said:

[...]

Patricia> If you are doing a gcc port, would you implement a new assembler or use
Patricia> the existing one?

As I mentioned the gcc port, I should make it clear that I haven't actually
tried to do this.  It's not clear to me how interested I am in paleontology!
The main gripe is that it'll be a while before FPS delivers a cc/cpp/ccpass1
that uses dynamically allocated tables instead of fixed size ones, and I
don't know whether we'll ever see an ANSI compiler that does voids and enums
right.  For me, a main reason is to get g++ going.

Patricia> If you implement a new assembler, there is another set
Patricia> of information that you need on some non-interlocked pipeline conflicts.
Patricia> The assembler inserts nop (actually "tw r0,r0") when not optimizing, or
Patricia> re-arranges code when optimizing, to prevent some restricted sequences from
Patricia> occurring. For example, on a Celerity (but not a Model 500) modifying the data
Patricia> register of a store on the immediately following cycle may give undefined
Patricia> results if the store page faults.

The situation is in some ways analogous to the mips situation: with a
smart assembler, you might as well let the assembler do the instruction
scheduling, but you might lose over what a really smart gcc could do.
Alas, instruction scheduling is not really there yet in gcc.  There's
also the question of symbol management with the FPS as;  I've not
even looked to see whether g++ will work in concert with as.

Patricia> You also need to decide whether to write your own calling conventions or use
Patricia> ours. If you use ours, your code will be able to call and be called by
Patricia> FPS compiled code. Even if you do your own, make sure that r14 contains the
Patricia> address of the end of the memory stack. When calling signal handlers the
Patricia> kernel uses the area below where r14 points.

Patricia> Some of the instructions are obscure, including djibzm. The compiler does
Patricia> not actually generate these, nor do programmers (even on the rare occasions
Patricia> when we resort to assembly language programming) normally use them. The
Patricia> assembler has a set of macros with names like "djneq" for delayed jump on
Patricia> inequality, that are implemented by the assembler using them. Similarly, we
Patricia> have an assembler mnemonic "load ra,lit" which means "load general register
Patricia> ra with literal lit by any suitable method". The assembler generates a 
Patricia> sequence that takes one to five cycles on a Celerity, or one to four cycles
Patricia> on a FPS M500 depending on the value of "lit".

All this non-obvious (but welcomed!) info is exactly why there are
users out here very interested in the free flow of information!

stephen pope
advanced computing lab, lanl
scp@acl.lanl.gov