[net.arch] net.digital: a High Performance Memory Address Register

ken@turtlevax.UUCP (Ken Turkowski) (05/30/84)

For a long time, I've wanted a file of auto-increment/auto-decrement
registers for use in systems with one memory used for multiple data
types.  The canonical type of operation is a matrix multiplication,
where there are two source operands and a destination operand.  These
are all vectors, so an auto-increment operation of some type is
performed after each memory access.  Normally, one has to maintain the
value of these pointers in CPU registers, and go through the cycle of
writing the MAR, fetching from memory, writing the MAR, fetching from
memory, performing the operation, writing the MAR, writing into memory,
etc.  If there were multiple MARs, each able to increment by an
arbitrary amount after every access, and the hardware could switch
between them on the fly, the MAR writes would drop out of the inner
loop and performance would be increased by nearly 100%.
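
A minimal sketch in C of the idea above, offered as a toy model rather
than a description of any real part.  Everything named here (Mar,
mar_load, mar_read, mar_write, plain_read, plain_write, the 4-entry
file, the 1024-word memory) is hypothetical, and the cost model is an
assumption: one cycle per MAR write and one per memory access, with
the arithmetic itself not counted.

```c
#include <stdint.h>

enum { NUM_MARS = 4, MEM_WORDS = 1024 };  /* assumed sizes for the toy model */

/* One entry of the hypothetical MAR file: an address plus a signed stride. */
typedef struct {
    uint16_t addr;
    int16_t  step;
} Mar;

static uint16_t      memory[MEM_WORDS];  /* one shared, word-addressed memory */
static Mar           mar_file[NUM_MARS];
static unsigned long cycles;             /* MAR writes + memory accesses */

/* CPU writes a pointer and its stride into the file: one cycle. */
static void mar_load(int n, uint16_t addr, int16_t step)
{
    mar_file[n].addr = addr;
    mar_file[n].step = step;
    cycles++;
}

/* Memory read through MAR n, then the automatic post-increment. */
static uint16_t mar_read(int n)
{
    uint16_t v = memory[mar_file[n].addr % MEM_WORDS];
    mar_file[n].addr = (uint16_t)(mar_file[n].addr + mar_file[n].step);
    cycles++;
    return v;
}

/* Memory write through MAR n, then the automatic post-increment. */
static void mar_write(int n, uint16_t v)
{
    memory[mar_file[n].addr % MEM_WORDS] = v;
    mar_file[n].addr = (uint16_t)(mar_file[n].addr + mar_file[n].step);
    cycles++;
}

/* Conventional path: every access first rewrites the single MAR. */
static uint16_t plain_read(uint16_t addr)
{
    cycles += 2;                         /* write MAR, then fetch */
    return memory[addr % MEM_WORDS];
}

static void plain_write(uint16_t addr, uint16_t v)
{
    cycles += 2;                         /* write MAR, then store */
    memory[addr % MEM_WORDS] = v;
}

/* C = A * B for dim x dim row-major matrices; returns cycles used. */
static unsigned long matmul_mar_file(uint16_t a, uint16_t b, uint16_t c, int dim)
{
    cycles = 0;
    mar_load(2, c, 1);                   /* destination walks linearly */
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            uint32_t sum = 0;
            mar_load(0, (uint16_t)(a + i * dim), 1);       /* row of A    */
            mar_load(1, (uint16_t)(b + j), (int16_t)dim);  /* column of B */
            for (int k = 0; k < dim; k++) {
                uint32_t x = mar_read(0);
                uint32_t y = mar_read(1);
                sum += x * y;
            }
            mar_write(2, (uint16_t)sum); /* result truncated to 16 bits */
        }
    }
    return cycles;
}

/* Same product with one MAR that the CPU must reload for each access. */
static unsigned long matmul_single_mar(uint16_t a, uint16_t b, uint16_t c, int dim)
{
    cycles = 0;
    for (int i = 0; i < dim; i++) {
        for (int j = 0; j < dim; j++) {
            uint32_t sum = 0;
            for (int k = 0; k < dim; k++) {
                uint32_t x = plain_read((uint16_t)(a + i * dim + k));
                uint32_t y = plain_read((uint16_t)(b + k * dim + j));
                sum += x * y;
            }
            plain_write((uint16_t)(c + i * dim + j), (uint16_t)sum);
        }
    }
    return cycles;
}
```

Under these assumptions the inner loop costs 2 cycles per
multiply-accumulate with the MAR file against 4 with a single MAR; for
a 2 x 2 product the model counts 29 cycles against 40, and the ratio
approaches 2 as the vectors get longer.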

In the above scenario, three pointers are used for a canonical
dot-product-type operation.  In the real world of computing devices,
the processor needs to do other things as well, so it is useful to have
a stack pointer, a heap pointer, a couple of FIFO pointers, and a local
(trashable) pointer.  So a file with at least 4, and preferably 8,
pointers would be very useful.

Some time ago, I did a pinout calculation for a 4-deep by 16-bit-wide
MAR file, with bidirectional I/O on the bus side, output only on the
memory address driver side, and increment/decrement/load/no-op modes,
and came up with 40 pins, including power.  Of course, this didn't have
arbitrary increments, nor could the outputs be stacked to have more
than 4 pointers.  I doubt that anyone could use a pointer file wider
than 16 bits, so there's no need to make the parts expandable in width.
You may therefore be able to get 8 pointers, 8 increments, a
bidirectional port, and a tristateable port into a 48-pin package.
With that kind of functionality, I wouldn't mind the fat package.
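
One way the 40-pin figure could add up is sketched below in C.  The
post does not give the breakdown, so every line item here is an
assumption: the PinBudget fields, the two control pins (taken to be a
clock and an output enable), and the two supply pins are all
hypothetical.

```c
/* Hypothetical pin budget for a pointer-file chip; breakdown is assumed. */
typedef struct {
    int width;     /* address bits per pointer                     */
    int pointers;  /* number of pointer registers                  */
    int modes;     /* number of operations (inc/dec/load/no-op)    */
    int control;   /* assumed: clock, output enable, other strobes */
    int power;     /* assumed: supply and ground                   */
} PinBudget;

/* Smallest number of select lines that can encode n choices. */
static int bits_for(int n)
{
    int b = 0;
    while ((1 << b) < n)
        b++;
    return b;
}

static int total_pins(const PinBudget *p)
{
    return p->width                 /* bidirectional bus port      */
         + p->width                 /* memory address output port  */
         + bits_for(p->pointers)    /* pointer select              */
         + bits_for(p->modes)       /* mode select                 */
         + p->control
         + p->power;
}
```

With width 16, 4 pointers, 4 modes, 2 control pins, and 2 supply pins,
this comes to 16 + 16 + 2 + 2 + 2 + 2 = 40.  Going to 8 pointers adds
one select line, and allowing, say, 3 control pins so the increments
can be loaded over the bus gives 42, which would fit a 48-pin package
with room to spare under these guesses.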

Does anybody know of any chips suitable for implementing such an MAR
file?  I'm not aware of any chip that performs this function by itself,
but there may be a minimal combination of parts (4-8?) to implement it.

-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd70,decwrl,flairvax}!turtlevax!ken