[comp.sys.nsc.32k] 32532 info ...

bcd@psueclb.BITNET (08/23/87)

at the request of many, here is the info that I have on the 32532.  most is
from data sheet, some from talking with people at national.  apparently the
chip has been designed and manufactured (to this point) solely in israel at
national's plant there.  I gather that there are very few actual parts here.

physical information
====================

at this time I believe that the only package available (which isn't actually
available) is a 175 pin grid array.  this is in a 16x16 matrix with the
bottom right corner pin missing.  the inside array that doesn't have pins
is 8x10 (or 10x8 depending upon your orientation).  the address and data
bus are obviously not multiplexed.

the basic bus cycle is 2 clocks.  also they have something called "burst"
mode, which will automatically read the 16-byte aligned block surrounding
an instruction access in 32-bit/clock accesses.  this is done to streamline
the loading of the on-chip caches, and with a good memory system
design could result in considerable performance enhancements without
faster memory.
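
To make the "16-byte aligned block surrounding an access" concrete, here is a minimal sketch of the address arithmetic. The function name is made up for illustration, and the ascending order of the four transfers is an assumption; the post does not say in which order the chip actually fetches them.

```python
BLOCK_BYTES = 16  # size of the aligned block read by a burst (from the post)
WORD_BYTES = 4    # one 32-bit transfer

def burst_block(addr):
    """Return the four 32-bit word addresses of the aligned block holding addr.

    Hypothetical helper: the ascending order is an assumption, not something
    the post specifies.
    """
    base = addr & ~(BLOCK_BYTES - 1)  # clear the low 4 bits to find the block start
    return [base + offset for offset in range(0, BLOCK_BYTES, WORD_BYTES)]

# An access at 0x1236 falls in the block that starts at 0x1230.
print([hex(a) for a in burst_block(0x1236)])
```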

the '532 "will be available" in 20, 25, and 30MHz versions with a 40MHz
some time down the road (like a year or more).  there are supposedly chips
running at this very moment, and "unix will be up soon."

memory management
=================

the '532 has an on-chip MMU which is somewhat software compatible with the
'332/'382 combination.  actually, it is entirely software compatible
(according to the data sheet) save the debugging registers.  the '532 does
have debugging registers, but they are not the same as on the '382.  "if an
attempt is made to execute an LMR or SMR instruction to one of these 4
registers (BAR, BMR, BDR, and BEAR), the trap(UND) occurs."  additionally,
some bits on the MCR which control debugging features of the '382 are not
implemented on the '532.

the on-chip address translation buffer is fully associative with 64 entries.
in general, the translation algorithm seems to have been improved on the
'532.  non-interlocked memory accesses are done whenever possible (this
is not true of the '382, which apparently uses interlocked bus cycles rather
superfluously).

memory mapped i/o
=================

an interesting part of '532 is its handling of memory mapped i/o (which, by
the way, is the only type of i/o it has).  due to the extreme amount of
cache'ing done on the chip, special steps must be taken to assure that
i/o devices are handled correctly.  national classifies the
characteristics of i/o devices as "destructive-reading" and "side-effects
of writing."

destructive-reading means that reading from an i/o port can cause bits to
be reset etc. in the device's registers.  due to the instruction pipeline
design of the '532 it is possible for source operands for one instruction to
be read while or before the previous instruction is executing.  thus it is
obviously a requirement that destructive-reading of source operands in
advance of executing the instruction be avoided.

side-effects of writing are a similar story.  "for example, before reading
the counter's value from the 32202 interrupt controller unit it is first
necessary to freeze the value by writing to another control register.
... the '532 can read the source operands for the second instruction before
writing the results of the previous instruction.  consequently, it is a
requirement that read and write references to peripherals that exhibit
side-effects of writing must occur in the order that instructions are
executed."

the '532 provides two methods of solving these problems.  the first method
only solves the side-effects of writing problem, but requires no additional
external hardware.  this is accomplished by placing the i/o devices in a
special range of memory specifically dedicated to memory-mapped i/o devices
(FF000000 - FFFFFFFF).
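
As a small illustration of the first method, the range above covers every address whose top byte is FF, so membership is a single comparison. The names below are made up for this sketch.

```python
IO_BASE = 0xFF000000   # start of the dedicated i/o range quoted above
IO_LIMIT = 0xFFFFFFFF  # end of the range (inclusive)

def in_io_region(addr):
    """True if a 32-bit address lies in the dedicated memory-mapped i/o range."""
    return IO_BASE <= addr <= IO_LIMIT

# Equivalent view: the top address byte is 0xFF.
print(in_io_region(0xFF000010), in_io_region(0x00FFFFFF))
```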

the second method solves both of the above problems but requires some
additional external decoding hardware.  "when the '532 generates a read bus
cycle, it asserts the output signal IOINH' if either of the i/o requirements
listed above are not satisfied.  when the reference is to a peripheral device
that implements ports with destructive-reading or side-effects of writing,
the input signal IODEC' must be asserted.  in addition, the device must
not be selected if IOINH' is active."

self-modifying code
===================

"the series 32000 architecture does not specify the results produced by
executing a program that modifies itself.  nevertheless, on the '332 and
previous microprocessors in the family it was possible to execute
self-modifying code according to the following sequence:

        1. modify the appropriate instruction.
        2. execute a JUMP instruction or other instruction that causes
           the microprocessor's instruction queue to be flushed.
        3. execute the modified instruction.

for example, the interactive debugger may follow the sequence above after
reaching a breakpoint in a program being monitored.

the same program may not produce identical results when executed on the '532
due to the effects of the instruction cache and branch prediction.  in order
to execute self-modifying code on the '532 it is necessary to do the
following:

        1. modify the appropriate instruction.
        2. if the modified instruction is on a cacheable page, execute
           CINV to invalidate the contents of the instruction cache; otherwise
           execute an instruction that causes a serializing operation.
        3. execute the modified instruction."
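
The reason step 2 matters can be shown with a toy model. This is only a sketch and not how the real chip is built: the class, its methods, and the assumption that a store never updates the instruction cache are all made up for illustration, and the mnemonics are used as opaque labels.

```python
class ToyCPU:
    """Toy model: main memory plus an instruction cache that stores bypass."""

    def __init__(self, memory):
        self.memory = dict(memory)  # address -> instruction
        self.icache = {}            # cached copies of fetched instructions

    def fetch(self, addr):
        # Fill the cache on a miss, then always serve from the cache.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Assumption of this model: a data write does not touch the icache.
        self.memory[addr] = value

    def cinv(self):
        # Stand-in for CINV: throw away all cached instructions.
        self.icache.clear()

cpu = ToyCPU({0x100: "ADDD"})
cpu.fetch(0x100)            # instruction is now cached
cpu.store(0x100, "BPT")     # step 1: modify the instruction
stale = cpu.fetch(0x100)    # without step 2 the old copy is still executed
cpu.cinv()                  # step 2: invalidate the instruction cache
fresh = cpu.fetch(0x100)    # step 3: the modified instruction is seen
print(stale, fresh)
```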

================================================================================

I tried to cover most of the differences between the '332 and the '532.  if
you have any specific questions ...

I never did receive any info on software for the 32000 series.  I am primarily
interested in a good optimizing C compiler that I can port to the operating
system I am writing.  thanks in advance.

------------
                                        Bryan Davis (BCD@PSUECL)
                                        Engineering Computer Laboratory
                                        Pennsylvania State University
     

grenley@nsc.nsc.com (George Grenley) (08/27/87)

Many thanks to Bryan Davis for taking the time and trouble to post 
this info about the '532.  I will try to clarify a few of the 
still-open questions.


In article <794@PSUECLB> bcd@psueclb.BITNET writes:
>apparently the
>chip has been designed and manufactured (to this point) solely in israel at
>national's plant there.  I gather that there are very few actual parts here.

This is correct, but could be misleading.  NSC has a chip design facility, and
a fab line, in Israel.  I have visited them, and they are very good - the best
we have, in my opinion.

The reason there are not too many parts here in the US yet is that we are
still characterizing it in Israel.  We don't need parts here yet, the system
design work is also being done in Israel - I know, I'm the guy managing it.

>at this time I believe that the only package available (which isn't actually
>available) is a 175 pin grid array.  this is in a 16x16 matrix with the
 ^^^^^^^^^
         Actually, the package is more available than the die... 8-)

	 but that's right, the package is a 175 pin PGA - we recommend ZIF
	 sockets for prototyping, BTW.

>bottom right corner pin missing.  the inside array that doesn't have pins
>is 8x10 (or 10x8 depending upon your orientation).  the address and data
>bus are obviously not multiplexed.
>
>the basic bus cycle is 2 clocks.  also they have something called "burst"
>mode, which will automatically read the 16-byte aligned block surrounding
>an instruction access in 32-bit/clock accesses.  this is done to stream
               ^ or data
>line the loading of the on-chip caches, and with a good memory system
>design could result in considerable performance enhancements without
>faster memory.
 
This is correct - that part loads its internal caches 4 DW at a time, so
you see lots of burst reads.  The first read is two clocks, the next three
are 1 clock each, although you can add wait states if you want.
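
A rough way to see the gain is to count clocks for one 4-DW cache fill. This is a minimal sketch: the function names are made up, and it assumes each wait state adds one clock to every transfer, which the post does not spell out.

```python
def burst_clocks(wait_states=0, words=4):
    """Clocks for a burst fill: first transfer 2 clocks, the rest 1 clock each.

    Assumption: each wait state adds one clock to every transfer.
    """
    first = 2 + wait_states
    rest = (words - 1) * (1 + wait_states)
    return first + rest

def nonburst_clocks(wait_states=0, words=4):
    """Clocks if every transfer were a normal 2-clock bus cycle."""
    return words * (2 + wait_states)

# With no wait states: 5 clocks for a burst fill versus 8 without burst.
print(burst_clocks(), nonburst_clocks())
```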

>the '532 "will be available" in 20, 25, and 30MHz versions with a 40MHz
>some time down the road (like a year or more).  there are supposedly chips
>running at this very moment, and "unix will be up soon."

There ARE chips running in boards right now, I have one here in my lab.
We showed it to the press a couple of weeks ago, you should be seeing
some ink on it soon (I hope - that's why we had the press tour!)
Marketing has promised samples late this year (ho-ho-ho Merry Christmas!)
and I know from an engineering point of view that this is realistic.
First samples will be 20 MHz officially - you might want to bug our
marketing people about this, it's their call.  I just build boards.....

>memory management
 
>the '532 has an on-chip MMU which is somewhat software compatible with the
>'332/'382 combination.  actually, it is entirely software compatible
>(according to the data sheet) save the debugging registers.  the '532 does
>have debugging registers, but they are not the same as on the '382.  "if an
>attempt is made to execute an LMR or SMR instruction to one of these 4
>registers (BAR, BMR, BDR, and BEAR), the trap(UND) occurs."  additionally,
>some bits on the MCR which control debugging features of the '382 are not
>implemented on the '532.

>the on-chip address translation buffer is fully associative with 64 entries.
>in general, the translation algorithm seems to have been improved on the
>'532.  non-interlocked memory accesses are done whenever possible (this
>is not true of the '382, which apparently uses interlocked bus cycles rather
>superfluously).

>memory mapped i/o

>an interesting part of '532 is its handling of memory mapped i/o (which, by
>the way, is the only type of i/o it has).  due to the extreme amount of

All of 32000 series are "memory mapped", so the '532 is no different in
this respect.

>cache'ing done on the chip, special steps must be taken to assure that
>i/o devices are handled correctly.  national classifies the
>characteristics of i/o devices as "destructive-reading" and "side-effects
>of writing."

>destructive-reading means that reading from an i/o port can cause bits to
>be reset etc. in the device's registers.  due to the instruction pipeline
>design of the '532 it is possible for source operands for one instruction to
>be read while or before the previous instruction is executing.  thus it is
>obviously a requirement that destructive-reading of source operands in
>advance of executing the instruction be avoided.

Yes, indeed.  All of the newer generation processors have operand prefetch
capabilities, which is generally good, since it improves performance.
Unfortunately, you do have to override it for most I/O.

>side-effects of writing are a similar story.  "for example, before reading
>the counter's value from the 32202 interrupt controller unit it is first
>necessary to freeze the value by writing to another control register.
>... the '532 can read the source operands for the second instruction before
>writing the results of the previous instruction.  consequently, it is a
>requirement that read and write references to peripherals that exhibit
>side-effects of writing must occur in the order that instructions are
>executed."

Again, a side effect of operand prefetch.

>the '532 provides two methods of solving these problems.  the first method
>only solves the side-effects of writing problem, but requires no additional
>external hardware.  this is accomplished by placing the i/o devices in a
>special range of memory specifically dedicated to memory-mapped i/o devices
>(FF000000 - FFFFFFFF).
 
>the second method solves both of the above problems but requires some
>additional external decoding hardware.  "when the '532 generates a read bus
>cycle, it asserts the output signal IOINH' if either of the i/o requirements
>listed above are not satisfied.  when the reference is to a peripheral device
>that implements ports with destructive-reading or side-effects of writing,
>the input signal IODEC' must be asserted.  in addition, the device must
>not be selected if IOINH' is active."

My apologies that the data sheet makes this sound tougher than it is.
Basically all you need to do is feed IOINH into the address decode PAL
as an inhibit, and also feed it back into the IODEC input.  One simple
approach is to decode a range of addresses as `I/O space', and always
assert IODEC for addresses in this range.

The result of asserting IODEC is to cause the processor to cancel the
transaction unless it is the `real thing' so to speak.
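
The "simple approach" can be sketched as a pair of boolean equations, the sort of thing one would put in the decode PAL. This is an illustrative model only: the function name and the address range are placeholders chosen for the example (the range is up to the board designer), and True here means "asserted" regardless of the pins being active-low.

```python
def decode(addr, ioinh_asserted, io_base=0xFF000000, io_limit=0xFFFFFFFF):
    """Model of the address decode for peripherals with read/write side effects.

    io_base/io_limit are hypothetical: whatever range the board decodes
    as `I/O space'.  Returns (iodec_asserted, device_selected).
    """
    in_io_space = io_base <= addr <= io_limit
    iodec = in_io_space                               # always flag I/O space to the CPU
    device_select = in_io_space and not ioinh_asserted  # IOINH inhibits the select
    return iodec, device_select

# A speculative read (IOINH asserted) is flagged but never reaches the device.
print(decode(0xFF000020, ioinh_asserted=True))
print(decode(0xFF000020, ioinh_asserted=False))
```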

(An excellent section on self modifying code, deleted for brevity, `cause
I have no comments on it.)

Some additional info:

	The internal cache hit rates are about 80% on both instructions
and data.  As a result of this, the performance of the part does not suffer
too much when wait states are added to external cycles.  Typically, we see
about 5% degradation per external wait state.  This makes the part ideal
for cost-sensitive designs where you can't afford an external cache.  The
performance advantage of the '532 over other architectures will be even
greater with slow memory than with high-perf memory.
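
For a back-of-the-envelope feel for that 5% figure, here is a tiny estimate. It assumes the loss is simply linear in the number of wait states, which is a simplification of mine and not a measured curve; the function name is made up.

```python
def relative_performance(wait_states, loss_per_wait=0.05):
    """Estimated performance relative to zero wait states.

    Assumes a flat 5% loss per external wait state (linear model);
    real behaviour depends on the workload and cache hit rate.
    """
    return max(0.0, 1.0 - loss_per_wait * wait_states)

for n in range(4):
    print(n, "wait states ->", round(relative_performance(n) * 100), "% of full speed")
```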

A programming `gotcha':  This also exists in the 332/382 set, and is not
secret or anything, but it is often overlooked.  Refer to page 2-11 of the
1986 NSC 32000 series h/w databook: Under figure 2.8, you will note an
explanation of the method used to encode displacements.  Certain types
of instructions cannot `cover' the full 4 gigabyte virtual address range.
This is not a problem if you are aware of it.  In the past (pre '382)
nobody paid any attention to our caveat 'cuz they only had 16 meg of address
space anyway.  Now you need to be aware of the limits.
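
To illustrate the limit: as far as I know, Series 32000 displacements come in 1-, 2- and 4-byte forms holding 7, 14 and 30 signed bits, stored most significant byte first with a 0, 10 or 11 prefix. The sketch below encodes a value on that basis. The function name is made up, and the databook should be checked for the exact limits, since it may reserve a few encodings at the negative extreme that this sketch does not model.

```python
def encode_displacement(value):
    """Encode a signed displacement in the shortest Series 32000 form.

    Sketch under the assumptions in the text above: 7-, 14- and 30-bit
    two's-complement fields, most significant byte first.
    """
    if -64 <= value <= 63:
        return bytes([value & 0x7F])                        # prefix bit 0
    if -8192 <= value <= 8191:
        return (0x8000 | (value & 0x3FFF)).to_bytes(2, "big")        # prefix bits 10
    if -(1 << 29) <= value <= (1 << 29) - 1:
        return (0xC0000000 | (value & 0x3FFFFFFF)).to_bytes(4, "big")  # prefix bits 11
    raise ValueError("displacement does not fit in 30 bits")

print(encode_displacement(-1).hex(), encode_displacement(64).hex())
try:
    encode_displacement(1 << 29)   # 512 MB away: out of reach of one displacement
except ValueError as err:
    print("error:", err)
```

With 30 bits, a single displacement reaches roughly 512 megabytes in either direction, which is why it cannot span a full 4 gigabyte address space.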




>I never did receive any info on software for the 32000 series.  I am primarily
>interested in a good optimizing C compiler that I can port to the operating
>system I am writing.  thanks in advance.

Our CTP C compiler is pretty good, I'm told.  It produces about 20% faster code
on a 332, and more on the '532 because the '532 has been optimized to help
the compiler, and vice-versa.

The compiler is available.  Call your friendly neighborhood NSC salesperson.
Ask for someone who knows our 32000 product line, like one of our CPFAEs.
If you have any problems, let me know and I'll jump in.

GO DESIGN IT IN!  WE NEED THE BUCKS!!!

George Grenley