joel@fps.com (Joel Broude) (02/21/90)
I have a manual that describes each scalar processor instruction for the FPS Model 500, including djibzm. Many of these instructions are the same as they were for the Celerity machines. This manual contains proprietary information, and is not intended for general distribution, but if you have a specific question, I may be able to help. And if you have a compelling need for a copy of the manual, communicate that need to me in writing, and I'll find out whether I can release a copy to you.
ps@fps.com (Patricia Shanahan) (02/22/90)
In article <6994@celit.fps.com> joel@fps.com (Joel Broude) writes:
>I have a manual that describes each scalar processor
>instruction for the FPS Model 500, including djibzm. Many
>of these instructions are the same as they were for the
>Celerity machines.
>
>This manual contains proprietary information, and
>is not intended for general distribution,
>but if you have a specific question, I may be able to
>help. And if you have a compelling need for a copy
>of the manual, communicate that need to me in writing,
>and I'll find out whether I can release a copy to you.

I think it may be better if the customers take the issue up with the local field offices. Personally, I think that a document such as the one Joel has should be generally available. The only thing that will cause that to happen is if customers ask for it.

Be careful with any attempt to use the NCR descriptions. There are several commands that NCR supports that we deliberately did not put in the Celerity assembler and do not support on the Model 500. Additionally, the NCR documents will not tell you anything about how to do floating point arithmetic, integer multiplication, stack cache register access, or non-local branches. Most of the off-chip functions are very different in a Celerity or FPS M500 than NCR intended. However, for those commands that exist with the same meaning in both instruction sets, we used the same mnemonic as NCR.

If you are doing a gcc port, would you implement a new assembler or use the existing one? If you implement a new assembler, there is another set of information that you need on some non-interlocked pipeline conflicts. The assembler inserts nop (actually "tw r0,r0") when not optimizing, or re-arranges code when optimizing, to prevent some restricted sequences from occurring. For example, on a Celerity (but not a Model 500) modifying the data register of a store on the immediately following cycle may give undefined results if the store page faults.
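A minimal sketch of the non-optimizing case described above, in Python. It is not the FPS assembler's algorithm. The `Instr` record, the `pad_store_hazards` name, and the store/add syntax in the usage lines are all made up for illustration. The only details taken from the post are the hazard itself (an instruction modifying a store's data register immediately after the store) and the fact that the nop is really "tw r0,r0".

```python
from dataclasses import dataclass
from typing import List, Optional

# The post says the assembler's nop is actually "tw r0,r0".
NOP_TEXT = "tw r0,r0"


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record; not the real Celerity/FPS format."""
    text: str                         # source text, used only for display
    writes: frozenset = frozenset()   # registers this instruction modifies
    store_data: Optional[str] = None  # for a store: register holding the data


NOP = Instr(NOP_TEXT)


def pad_store_hazards(code: List[Instr]) -> List[Instr]:
    """Insert a nop wherever an instruction modifies the data register
    of the store that immediately precedes it."""
    out: List[Instr] = []
    for ins in code:
        prev = out[-1] if out else None
        hazard = (
            prev is not None
            and prev.store_data is not None
            and prev.store_data in ins.writes
        )
        if hazard:
            out.append(NOP)  # separate the store from the conflicting write
        out.append(ins)
    return out


if __name__ == "__main__":
    # Operand syntax below is invented for the example.
    program = [
        Instr("store r3,(r5)", store_data="r3"),
        Instr("add r3,r4", writes=frozenset({"r3"})),  # modifies r3 right after the store
        Instr("add r6,r7", writes=frozenset({"r6"})),
    ]
    for ins in pad_store_hazards(program):
        print(ins.text)
    # Prints:
    #   store r3,(r5)
    #   tw r0,r0
    #   add r3,r4
    #   add r6,r7
```

An optimizing assembler would presumably try to move an independent instruction into that slot rather than pad with a nop, as the post describes; the sketch covers only the padding case.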
You also need to decide whether to write your own calling conventions or use ours. If you use ours, your code will be able to call and be called by FPS compiled code. Even if you do your own, make sure that r14 contains the address of the end of the memory stack. When calling signal handlers the kernel uses the area below where r14 points.

Some of the instructions are obscure, including djibzm. The compiler does not actually generate these, nor do programmers (even on the rare occasions when we resort to assembly language programming) normally use them. The assembler has a set of macros with names like "djneq" for delayed jump on inequality, that are implemented by the assembler using them. Similarly, we have an assembler mnemonic "load ra,lit" which means "load general register ra with literal lit by any suitable method". The assembler generates a sequence that takes one to five cycles on a Celerity, or one to four cycles on an FPS M500, depending on the value of "lit".

--
Patricia Shanahan
ps@fps.com
uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
phone: (619) 271-9940
scp@acl.lanl.gov (Stephen C. Pope) (02/23/90)
on 21 Feb 90 16:31:37 GMT, ps@fps.com (Patricia Shanahan) said:

[...]

Patricia> If you are doing a gcc port, would you implement a new assembler or use
Patricia> the existing one?

As I mentioned the gcc port, I should make it clear that I haven't actually tried to do this. It's not clear to me how interested I am in paleontology! The main gripe is that it'll be a while before FPS delivers a cc/cpp/ccpass1 that uses dynamically allocated tables instead of fixed size ones, and I don't know whether we'll ever see an ANSI compiler that does voids and enums right. For me, a main reason is to get g++ going.

Patricia> If you implement a new assembler, there is another set
Patricia> of information that you need on some non-interlocked pipeline conflicts.
Patricia> The assembler inserts nop (actually "tw r0,r0") when not optimizing, or
Patricia> re-arranges code when optimizing, to prevent some restricted sequences from
Patricia> occurring. For example, on a Celerity (but not a Model 500) modifying the data
Patricia> register of a store on the immediately following cycle may give undefined
Patricia> results if the store page faults.

The situation is in some ways analogous to the mips situation: with a smart assembler, you might as well let the assembler do the instruction scheduling, but you might lose over what a really smart gcc could do. Alas, instruction scheduling is not really there yet in gcc. There's also the question of symbol management with the FPS as; I've not even looked to see whether g++ will work in concert with as.

Patricia> You also need to decide whether to write your own calling conventions or use
Patricia> ours. If you use ours, your code will be able to call and be called by
Patricia> FPS compiled code. Even if you do your own, make sure that r14 contains the
Patricia> address of the end of the memory stack. When calling signal handlers the
Patricia> kernel uses the area below where r14 points.
Patricia> Some of the instructions are obscure, including djibzm. The compiler does
Patricia> not actually generate these, nor do programmers (even on the rare occasions
Patricia> when we resort to assembly language programming) normally use them. The
Patricia> assembler has a set of macros with names like "djneq" for delayed jump on
Patricia> inequality, that are implemented by the assembler using them. Similarly, we
Patricia> have an assembler mnemonic "load ra,lit" which means "load general register
Patricia> ra with literal lit by any suitable method". The assembler generates a
Patricia> sequence that takes one to five cycles on a Celerity, or one to four cycles
Patricia> on an FPS M500 depending on the value of "lit".

All this non-obvious (but welcomed!) info is exactly why there are users out here very interested in the free flow of information!

stephen pope
advanced computing lab, lanl
scp@acl.lanl.gov