[comp.sys.nsc.32k] cxp/rxp instructions

cyrus@hi.UUCP (Tait Cyrus) (07/10/87)

Here at the University of New Mexico, we will be starting to port
GENIX 4.2 to a 32016/32032 board.  I have some questions.

1) What good are the cxp/rxp instructions?
2) Why can't the "standard" jsr/ret instructions be used?
3) What advantages are there for going through a jump table instead
   of jumping directly?

The reason I ask is that every cxp/rxp causes the 32xxx to read
from the mod table and I can see this as slowing things down A LOT.
I have already had one person say that when they ported GENIX 4.1,
they replaced all of the cxp/rxp type instructions with jsr/ret
just to speed things up.

I would appreciate ANY comments/suggestions/ideas.

Thanks in advance

-- 
    @__________@    W. Tait Cyrus   (505) 277-0806
   /|         /|    University of New Mexico
  / |        / |    Dept of EECE - Hypercube Project
 @__|_______@  |    Albuquerque, New Mexico 87131
 |  |       |  |
 |  |  hc   |  |    e-mail:
 |  @.......|..@       cyrus@hc.dspo.gov or
 | /        | /        seismo!unmvax!hi!cyrus
 @/_________@/

jans@tekchips.TEK.COM (Jan Steinman) (07/11/87)

>1) What good are the cxp/rxp instructions?...
>The reason I ask is that every cxp/rxp causes the 32xxx to read
>from the mod table and I can see this as slowing things down A LOT.

These instructions support shared libraries.  Yes, they are somewhat slower
than jsr/ret, but they are MUCH faster than doing shared library calls in
software!

Jan Steinman N7JDB - Box 500, MS 50-470 - Beaverton, OR 97077
jans@tekcrl.tek.com - 503/627-5881

collins@encore.UUCP (Jeff Collins) (07/13/87)

In article <10742@hi.UUCP>, cyrus@hi.UUCP (Tait Cyrus) writes:
> Here at the University of New Mexico, we will be starting to port
> GENIX 4.2 to a 32016/32032 board.  I have some questions.
> 
> 1) What good are the cxp/rxp instructions?
> 2) Why can't the "standard" jsr/ret instructions be used?
> 3) What advantages are there for going through a jump table instead
>    of jumping directly?
> 


	In normal operation these instructions should NOT be used.  They are
much slower than jsr/ret.  We changed our compiler to not generate these
instructions due to their execution times.  In fact we modified the OS to
use the MOD register as infrequently as possible (only on interrupts, where
we have no control over the CPU using it).

	The next National chip (32532) has a direct interrupt mode that does 
not go through the MOD table to find its vector.  We will be using that when
we upgrade...

urip@hcrvx1.UUCP (Uri Postavsky) (07/16/87)

In article <10742@hi.UUCP> cyrus@hi.UUCP (Tait Cyrus) writes:
>
>1) What good are the cxp/rxp instructions?
>2) Why can't the "standard" jsr/ret instructions be used?
>3) What advantages are there for going through a jump table instead
>   of jumping directly?
>

As was mentioned in previous articles, the CXP/RXP are good mainly
for shared libraries and dynamic linking. In the context of UNIX
they are no better than BSR/RET, just slower. The same thing is
true for referencing external variables - the EXT addressing mode
goes through the link table and is slower than the other memory
addressing modes.

*BUT* you cannot just go ahead and replace all the CXP/RXP with BSR/RET.
It depends on whether your assembler and linker support BSR across modules.
To support this, the assembler has to generate PC-relative addressing
for all external names (for variables, SB-relative is better) and to
generate relocation information for the linker. The linker has to
"patch" the external references (both procedures and variables) in the
code of each module with the correct memory addresses, which are known
only at link time. The locations to patch are indicated in the
relocation information.

When external references are implemented by the EXT addressing mode only 
(i.e. CXP/RXP), all the references of a module are done through its link
table, so the linker needs only fill this table rather than patch the 
object code itself. Correspondingly, the assembler does not need to
generate relocation information.
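
A minimal C sketch of the two linking styles described above. Everything
here is hypothetical (the struct reloc record, the function names, and the
simplification that each patched site is a whole 32-bit word holding a
PC-relative displacement); it is not National's object format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: which code word to patch, and with
   which symbol's address. */
struct reloc {
    size_t code_off;  /* index of the word to patch in the code image */
    size_t sym;       /* index into the resolved symbol address array */
};

/* Style 1 (link-table tools): the linker only fills a table of
   addresses. The code image is never touched. */
static void fill_link_table(uint32_t *link_table,
                            const uint32_t *sym_addr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        link_table[i] = sym_addr[i];
}

/* Style 2 (relocating tools): the linker walks the relocation records
   and writes a PC-relative displacement into the code at each site. */
static void apply_relocs(int32_t *code, size_t code_words,
                         const struct reloc *r, size_t nr,
                         const uint32_t *sym_addr, uint32_t code_base)
{
    for (size_t i = 0; i < nr; i++) {
        assert(r[i].code_off < code_words);
        /* byte address of the word being patched */
        int64_t site = (int64_t)code_base + 4 * (int64_t)r[i].code_off;
        code[r[i].code_off] = (int32_t)((int64_t)sym_addr[r[i].sym] - site);
    }
}
```

Without the reloc records the second routine has nothing to walk, which
is the reason tools that only fill link tables cannot patch a BSR.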

An assembler that does not generate relocation information and a linker
that can only fill link tables cannot support BSR across modules!!!

If the assembler and linker you have are from National, you can tell which
kind of assembler you have by the directives for external names.

The old tools use the CXP/RXP and the directives for external names are:
.export/.import for variables and .exportp/.importp for procedures.
If you have these tools, you cannot use BSR across modules.

The new tools use BSR/RET and the directive for external names is: .global.


-- 
Uri Postavsky (  ...{utzoo, utcsri}!hcr!urip )
	  at HCR, Toronto.
(formerly at National Semiconductor Tel Aviv).

elg@killer.UUCP (Eric Green) (07/17/87)

in article <1751@encore.UUCP>, collins@encore.UUCP (Jeff Collins) says:
> 
> In article <10742@hi.UUCP>, cyrus@hi.UUCP (Tait Cyrus) writes:
>> Here at the University of New Mexico, we will be starting to port
>> GENIX 4.2 to a 32016/32032 board.  I have some questions.
>> 
>> 1) What good are the cxp/rxp instructions?
>> 2) Why can't the "standard" jsr/ret instructions be used?
>> 3) What advantages are there for going through a jump table instead
>>    of jumping directly?

Well, for one thing, it might make relocatable shared libraries easier.  For
example, on the Amiga, to access a shared library, you must first issue an
"openlibrary" command (with the name of the library), which returns you the
address of the start of the library's jump table. Still, an indirect indexed
jsr probably would be faster....
   Of course, libraries would have to consist solely of relocatable code for
such a scheme to work... or else, relocate them when loaded and map them
into every process space (and un-map them when they're not being used --
on your typical system with a 32-bit address space, it's trivial to dedicate
half of that address space to the kernel and shared libraries). But boy,
wouldn't that make for some back doors! (someone loading in their own "stdio"
library :-). Seems a shame to restrict library-loading to the "standard"
libraries.
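
The open-then-call-through-a-table idea above can be sketched in C with
an array of function pointers. This is only an illustration of the
technique, not the Amiga interface: open_library(), call_lib(), the
"demo.library" name and the two library routines are all made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*lib_fn)(int);

/* Two routines that live "inside" the hypothetical library. */
static int lib_double(int x) { return 2 * x; }
static int lib_negate(int x) { return -x; }

/* Slot numbers are the only thing a client needs to agree on. */
enum { FN_DOUBLE, FN_NEGATE, FN_COUNT };

static const lib_fn demo_table[FN_COUNT] = { lib_double, lib_negate };

/* Hypothetical stand-in for an "open library" call: maps a name to the
   base of that library's jump table, or NULL if it is unknown. */
static const lib_fn *open_library(const char *name)
{
    return strcmp(name, "demo.library") == 0 ? demo_table : NULL;
}

/* One indexed, indirect call: the client never knows the routine's
   address, only the table base and the slot number. */
static int call_lib(const lib_fn *base, int slot, int arg)
{
    assert(base != NULL && slot >= 0 && slot < FN_COUNT);
    return base[slot](arg);
}
```

Because clients hold only the table base, the library can be loaded at
any address as long as the table is filled in when it is loaded.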

 In any event, it wouldn't be difficult to come up with a better
shared-library system than Sys V.3 uses. Like, my 12 year old brother could
probably do better :-).

  Eric Green {ihnp4,cbosgd}!killer!elg elg@usl.CSNET

mjd@doc.ic.ac.uk (Martin J Davies) (07/19/87)

>
>>1) What good are the cxp/rxp instructions?...
>>The reason I ask is that every cxp/rxp causes the 32xxx to read
>>from the mod table and I can see this as slowing things down A LOT.
>
>These instructions support shared libraries.  Yes, they are somewhat
>slower than jsr/ret, but they are MUCH faster than doing shared library
>calls in software!

	I have implemented shared 'C' libraries on a 32016 machine running my
own multi-user/multitasking OS. I started out using the CXP/RXP instructions,
and this went quite well, but there was a speed penalty. I found it much faster
to dynamically link the library calls using an interrupt linkage handler. How
this could be applied to UNIX I am not sure, but modules do seem to use a
noticeable amount of CPU time.
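
The details of the linkage handler are not given above. As a generic
illustration only, here is one common way lazy binding can work,
sketched in C: the first call through an unresolved slot takes a slow
path that looks the routine up and records its address, and later calls
go straight to it. The slot structure, lookup() and call_slot() are
assumptions for this sketch, and an ordinary NULL check stands in for
the trap or interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

/* One call site's binding: a name, and the address once it is known. */
struct slot {
    const char *name;
    fn_t target;      /* NULL until the first call resolves it */
};

static int square(int x) { return x * x; }

static int resolver_calls;  /* how often the slow path was taken */

/* Hypothetical symbol lookup; a real system would search the library. */
static fn_t lookup(const char *name)
{
    return strcmp(name, "square") == 0 ? square : NULL;
}

static int call_slot(struct slot *s, int arg)
{
    if (s->target == NULL) {         /* stands in for the trap */
        resolver_calls++;
        s->target = lookup(s->name); /* bind once */
        assert(s->target != NULL);
    }
    return s->target(arg);           /* fast path on every later call */
}
```

The lookup cost is paid once per slot rather than on every call, which
is where a scheme of this kind can beat a per-call table walk.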

Posted on for P.Winterbottom Kings College London (Gemini Project)

greg@utcsri.UUCP (07/25/87)

>>>1) What good are the cxp/rxp instructions?...
>>>The reason I ask is that every cxp/rxp causes the 32xxx to read
>>>from the mod table and I can see this as slowing things down A LOT.
>>
>>These instructions support shared libraries.  Yes, they are somewhat
>>slower than jsr/ret, but they are MUCH faster than doing shared library 
>> calls in software!
>
I haven't seen the other advantage of cxp/rxp yet, so I'll bring it up.

Cxp/rxp ops allow a program to be divided into a number of modules, each
having its own static data. When cxp is used to call a procedure in a different
module, the sb register is reloaded to point to the data of that module.
This static data can be accessed by offsets from the sb register, and the
less data there is, the smaller the offsets will be on average. The NS32k
encodes such displacements in one, two, or four bytes; thus smaller offsets
mean smaller and somewhat faster code.
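
To make the sb reload and its memory traffic concrete, here is a toy C
model of cxp/rxp next to bsr/ret. It is a simplified sketch, not a
cycle-accurate simulator: memory is word-addressed, a module table
entry is assumed to hold static base, link base and program base in
consecutive words, and a link table descriptor is assumed to hold the
module in its low 16 bits and the offset in its high 16 bits. All names
are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 64 };

struct cpu {
    uint32_t mem[MEM_WORDS];  /* toy word-addressed memory */
    uint32_t pc, sb, mod, sp;
    unsigned reads, writes;   /* memory accesses performed so far */
};

static uint32_t rd(struct cpu *c, uint32_t a)
{
    assert(a < MEM_WORDS);
    c->reads++;
    return c->mem[a];
}

static void push(struct cpu *c, uint32_t v)
{
    assert(c->sp > 0 && c->sp <= MEM_WORDS);
    c->mem[--c->sp] = v;
    c->writes++;
}

static uint32_t pop(struct cpu *c) { return rd(c, c->sp++); }

/* Call external procedure: save mod and pc, then follow the module
   table and link table to find the new mod, sb and pc. */
static void cxp(struct cpu *c, uint32_t index, uint32_t return_pc)
{
    push(c, c->mod);
    push(c, return_pc);
    uint32_t link = rd(c, c->mod + 1);        /* link base of caller   */
    uint32_t desc = rd(c, link + index);      /* procedure descriptor  */
    c->mod = desc & 0xFFFFu;                  /* callee's module       */
    c->sb  = rd(c, c->mod + 0);               /* callee's static base  */
    c->pc  = rd(c, c->mod + 2) + (desc >> 16);/* program base + offset */
}

/* Return from external procedure: restore pc and mod, reload sb. */
static void rxp(struct cpu *c)
{
    c->pc  = pop(c);
    c->mod = pop(c);
    c->sb  = rd(c, c->mod + 0);
}

/* Local call and return for comparison: sb and mod are untouched. */
static void bsr(struct cpu *c, uint32_t target, uint32_t return_pc)
{
    push(c, return_pc);
    c->pc = target;
}

static void ret(struct cpu *c) { c->pc = pop(c); }

/* Two modules: A (entry at word 4) calls slot 0 of its link table,
   which names offset 0x30 in module B (entry at word 8). */
static struct cpu make_demo(void)
{
    struct cpu c = {0};
    c.mem[4] = 1000; c.mem[5] = 12; c.mem[6]  = 2000;
    c.mem[8] = 5000; c.mem[9] = 16; c.mem[10] = 3000;
    c.mem[12] = (0x30u << 16) | 8u;
    c.mod = 4; c.sb = 1000; c.pc = 2010; c.sp = MEM_WORDS;
    return c;
}
```

In this model a cxp/rxp pair costs seven reads and two writes, against
one read and one write for bsr/ret. Real bus cycle counts depend on bus
width and alignment, so treat the numbers as relative only.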
  If cxp is not used, all of the static data for the whole program will be
lumped together, and must be addressed using absolute addressing or by
potentially large offsets from the sb. Note that the external data addressing
mode ( which is time-consuming ) need not be used to access data in another
module; you can use absolute addressing in this case ( not if the other
module is a shared resident library of course ).
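
A small C sketch of why large offsets from the sb cost code space. It
assumes the usual NS32k displacement encoding (signed 7-bit, 14-bit or
30-bit fields in one, two or four bytes); the function names are
invented for the example, and the 30-bit range is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one signed displacement. */
static int disp_bytes(int32_t d)
{
    if (d >= -64 && d <= 63)
        return 1;             /* 7-bit form  */
    if (d >= -8192 && d <= 8191)
        return 2;             /* 14-bit form */
    return 4;                 /* 30-bit form; range not checked here */
}

/* Total displacement bytes for a list of sb-relative references. */
static size_t total_disp_bytes(const int32_t *offs, size_t n)
{
    size_t total = 0;
    assert(offs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += (size_t)disp_bytes(offs[i]);
    return total;
}
```

A variable at offset 40 in a small per-module data area takes a one-byte
displacement, while the same variable at offset 20000 in one lumped
data area takes four.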
 Subroutines which can be called externally must end in rxp, and thus must
be called via cxp, even when called from within the same module. To reduce
this cost, the compiler should be able to determine which routines
cannot be externally called, and make them 'jsr/ret' routines.
Unfortunately, in C, the only routines which cannot be called externally are
those declared 'static', and this declaration is rarely used.
( I am assuming one foo.c file compiles to a single ns32k module, which is
the logical way to do it ).
Languages such as Concurrent Euclid, which directly support the 'module'
paradigm, can make much better use of the cxp/rxp instructions.
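
For reference, this is the C distinction in question, in a made-up
two-function file. Whether a given compiler actually emits the cheaper
local call for the static routine is an assumption, not something
checked here.

```c
#include <assert.h>

/* Internal linkage: no other .c file can name this routine, so a
   compiler is free to give it the local call/return sequence. */
static int clamp(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

/* External linkage: any module may call this, so under the cxp/rxp
   convention it would have to end in rxp. */
int scale_percent(int v)
{
    return clamp(v, 0, 100) * 2;
}
```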

Finally, if each object module is an ns32k module ( and if the external
data addressing mode is used ), linking of object modules can be done
VERY cheaply ( i.e. without modifying any of the program text segment ).
This is similar to the way resident shared libraries are done - an RSL
is effectively linked to the program at load time.
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...