[comp.arch] Info on HYPERSTONE coming here

kirchner@informatik.uni-kl.de (Reinhard Kirchner) (04/06/91)

Hello out there,

About two months ago I asked for your opinions on the processor 'Hyperstone',
created here in Germany by a small company under Otto Mueller, one of our
pioneers in data processing.

I got some response, most of which said: it is not known at all, please
post some information.

Now I have it: since I did not want to type in their manual or even their
survey, I phoned them and asked for something on a floppy. Now here it is.

It is commercial, but very descriptive. I already asked whether I could post
such material here in this group, and did not get any response. So I
take it that everybody agrees and allows it -:)

Reinhard Kirchner
Univ. Kaiserslautern, Germany
kirchner@uklirb.informatik.uni-kl.de

I do not get any payment or anything else for doing this; I simply do it
to get feedback (and perhaps because I am also a German, like Mr. Mueller).

----- Start ------ Start ------- Start of Hyperstone material -------


Experience and Know-How resulting from 33 years of electronics history:

 hyperstone

Only a person who is prepared to tread new paths and pursue new
ideas as an engineer can be an innovator. We now present the
result of our work: The hyperstone, a 32-bit microprocessor, which
sets a standard with regard to speed and economy. For embedded
control of communication systems or peripherals, for low-cost
boards for fast workstations and postscript systems, for
automation control, fast image-processing, graphics, data
collection and many other applications.

We, hyperstone electronics GmbH, are a young company founded in
1990. However, we have a long history. For more than 30 years Otto
and Ilse Müller, our founders and managing directors, have
questioned conventional paths and ideas time and time again. This
requires courage and is mostly anything but easy. But the quest
for the new, for innovation and progress is the only way to remain
at the forefront of market and technology. The hyperstone is the
latest result of our efforts.

Everything started in 1957 with the first ideas for the Telefunken
TR 10. Then the Nixdorf 820 and the Triumph-Adler TA 100 came into
being. Whereas in Germany the dawning of the computer age was only
consciously registered by a small number of people, a team existed
which was to make its mark on an entire generation of computer and
office equipment: Otto and Ilse Müller.

They achieved a significant goal with their own company, CTM. From the
CTM 70 to the CTM 9032, the integrated office computer, LAN: much that
has only become standard today had already been thought of and put
into practice by Otto Müller, the engineer, and Ilse Müller, the
entrepreneur, in the 1970s. On the basis of this commercial and
technological success the latest innovation, the hyperstone, has been
developed exclusively by themselves using their own resources.
Incidentally, it is the first 32-bit microprocessor ever to be ready
to go into production in Germany.

And now? The hyperstone E1 is on the market. A second source is
available. And we, hyperstone electronics GmbH, are working on the
further development so that you are always at the forefront of
market and technology.


       An alternative to CISC or RISC:

       The hyperstone E1 microprocessor

       Until now it was only possible to choose between the disadvantages
       of CISC (Complex Instruction Set Computer) or RISC (Reduced
       Instruction Set Computer) microprocessors. With CISC processors
       you profit from the years of experience with these processors and
       the efficient programming of time critical loops in assembly
       language is simple. And, very important: Low prices for components
       enable you to keep down the cost of developing your systems. Ideal
       except for one point: What happens when you require a higher
       processing speed?

       Processors with RISC architecture were designed specifically for
       this reason. But unfortunately at the expense of the other points:
       Nowadays, RISC processors can mostly only be programmed with
       compilers, meaning that some of the high speed is lost again
       immediately, even with so-called optimizing compilers. These RISC
       processors are relatively expensive; but even worse: You can no
       longer use low-priced components in your systems; you have to use
       expensive SRAMs instead of reasonably priced DRAMs. Moreover,
       RISCs require a large number of additional components to be able
       to work with the memory and the other parts of your system at all.
       High performance has its price.

       An alternative to CISC or RISC is now available:
       hyperstone E1: 25 MIPS maximum speed with standard DRAMs
       An innovative design combines the advantages of CISC - easy
       programmability, compact instruction code and low-priced
       additional components - with the speed of the RISC architectures.

       Powerful instruction set with single-cycle instructions
       The hyperstone µP has a powerful set of instructions of 16, 32 or
       48 bits in length. A program consists mainly of 16-bit
       instructions. Programs for the hyperstone µP require less than
       half of the memory space needed by most RISC processors and are
       even more compact than the programs of many CISC processors.
       As a result of the variable format you can specify 16- and 32-bit
       constants as well as all addresses as immediate operands;
       elaborate pre-instructions for the generation of longer constants
       and addresses are not necessary.

       The compact instruction code also reduces the bandwidth required
       for loading instructions from the memory; more of the total
       bandwidth is available for data transfer.
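
A rough way to see the bandwidth argument is to compare the average
instruction length of a variable-length mix with a fixed 32-bit
encoding. The sketch below is illustrative only: the 80/15/5 mix is an
assumption (the text says only that programs consist "mainly" of 16-bit
instructions), the function name is hypothetical, and it assumes both
encodings need the same number of instructions.

```python
def average_instruction_bits(mix):
    """mix maps instruction length in bits -> fraction of instructions."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(bits * fraction for bits, fraction in mix.items())

# Assumed mix, not a measured one: mostly 16-bit instructions.
variable_mix = {16: 0.80, 32: 0.15, 48: 0.05}
# Fixed-length encoding typical of 32-bit RISC designs.
fixed_mix = {32: 1.0}

variable_bits = average_instruction_bits(variable_mix)
fixed_bits = average_instruction_bits(fixed_mix)

# Fraction of the fixed-length fetch traffic the variable mix needs.
ratio = variable_bits / fixed_bits
print(f"average length: {variable_bits:.1f} bits, ratio: {ratio:.3f}")
```

With this assumed mix the average instruction is about 20 bits long.
Whether real programs come in under half the size of RISC code depends
on the actual mix and on how many instructions each architecture needs.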

       Most instructions are executed in one cycle; the result is then
       already available in the next cycle.

       The powerful instruction set facilitates programming. It contains
       multiplication and division as well as double-word instructions.
       The management of stack frames during subprogram instructions is
       effected automatically. A variety of address modes are available
       for memory instructions.

     * Branch instructions without wait cycles

       The hyperstone µP has a pipeline of only two stages -
       decode/execute. Therefore, branches may be executed without wait
       cycles.
       
     * Overlapping memory access with hardware interlocks

       Load and store instructions are executed as single-cycle
       instructions. After the data has been transferred to the
       integrated memory controller, the processor executes the next
       instructions immediately. When data which has not been loaded is
       accessed, the processor is stopped; wait cycles are then inserted
       automatically.

       With this load/store architecture you can execute memory
       instructions without wait cycles. With a simple optimization -
       executing load instructions as early as possible - you achieve
       better results than with huge data caches.
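
The effect of issuing loads early can be sketched with a toy cycle
counter. Everything here is an assumption made for illustration: the
three-cycle load latency, the register names and the tuple format are
hypothetical, and the model ignores instruction fetch entirely. It only
shows the interlock idea: an instruction stalls when it reads a
register whose load has not yet completed.

```python
def run(program, load_latency=3):
    """Return total cycles for a list of (op, dest, sources) tuples.

    Every instruction issues in one cycle. A loaded register becomes
    readable load_latency cycles after the load issues; reading it
    earlier stalls the processor (the hardware interlock).
    """
    ready_at = {}  # register -> first cycle in which it may be read
    cycle = 0
    for op, dest, sources in program:
        # Wait until every source register holds valid data.
        issue = max([cycle] + [ready_at.get(r, 0) for r in sources])
        ready_at[dest] = issue + (load_latency if op == "load" else 1)
        cycle = issue + 1
    return cycle

# The loaded value is used immediately: the processor must wait.
naive = [
    ("load", "r1", []),
    ("add", "r2", ["r1", "r2"]),
    ("add", "r3", ["r3", "r4"]),
    ("add", "r5", ["r5", "r6"]),
]

# Same work, but independent instructions fill the load delay.
scheduled = [
    ("load", "r1", []),
    ("add", "r3", ["r3", "r4"]),
    ("add", "r5", ["r5", "r6"]),
    ("add", "r2", ["r1", "r2"]),
]

print(run(naive), run(scheduled))
```

Under these assumptions the reordered sequence finishes in fewer cycles
because the two independent additions hide the load latency.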

     * Stack Cache

       All modern computers use a stack for local data of subprograms.
       Local data is very intensively used. Therefore, the hyperstone µP
       has an integrated stack cache.

       A stack frame consists of a maximum of 16 local registers (local
       data per subprogram level). On average the last four stack frames
       are held in the stack cache.

       A subprogram call creates a new stack frame. A frame instruction
       in the subprogram determines the overlapping area for parameter
       passing and the size of the stack frame. The return instruction
       releases the current stack frame and restores the preceding stack
       frame. The data transfer between the stack cache and the stack in
       the memory is automatically controlled by the hyperstone µP and is
       completely transparent to the program.
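
The spill and refill behaviour described above can be modelled in a few
lines. This is a simplified sketch, not the chip's actual mechanism:
the class and method names are hypothetical, the capacity of 64
registers and the 16-register frame limit are taken from the
architecture summary in this posting, and the overlapping parameter
area set up by the frame instruction is ignored.

```python
class StackCache:
    """Toy model: the newest frames live in registers, older ones in memory."""

    def __init__(self, capacity=64, max_frame=16):
        self.capacity = capacity    # local registers on chip
        self.max_frame = max_frame  # largest allowed stack frame
        self.frames = []            # sizes of live frames, oldest first
        self.cached = 0             # registers currently held on chip
        self.spilled = 0            # words written out to memory so far
        self.filled = 0             # words read back from memory so far

    def call(self, size):
        if not 1 <= size <= self.max_frame:
            raise ValueError("frame size out of range")
        self.frames.append(size)
        self.cached += size
        overflow = self.cached - self.capacity
        if overflow > 0:
            # Push the oldest words out so the new frame fits on chip.
            self.spilled += overflow
            self.cached -= overflow

    def ret(self):
        self.cached -= self.frames.pop()
        # The frame being returned to must be completely on chip again.
        if self.frames and self.cached < self.frames[-1]:
            missing = self.frames[-1] - self.cached
            self.filled += missing
            self.cached += missing

cache = StackCache()
for _ in range(5):      # five nested calls with full 16-register frames
    cache.call(16)
for _ in range(4):      # unwind back to the outermost frame
    cache.ret()
print(cache.spilled, cache.filled, cache.cached)
```

In this model memory traffic occurs only when the call depth exceeds
what fits in the 64 registers, which matches the claim that about four
full-size frames are held on chip.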

     * Instruction cache with prefetch control

       The instructions executed last are kept in an instruction cache.
       As a result memory accesses in inner loops are avoided.
       Moreover, the innovative prefetch control already loads the next
       instructions from the memory into the instruction cache. Thus, the
       high hit-rate achieved with a very much larger cache can be
       achieved with a relatively small cache.

     * Integrated control logic

       The interface control logic for memory and peripherals is fully
       integrated on the hyperstone µP. This includes a DRAM controller
       with RAS-CAS-multiplexer, refresh logic, parity generation and
       test as well as a programmable bus controller for all RAM, ROM and
       peripherals. The number of bus cycles can be determined via a bus
       control register.

       You can build up a complete system with a minimum of additional
       components. A further advantage: you have a clear overview of your
       system and the logic design time is reduced.

     * PC-based software development

       Assemblers and C compilers, each with a source-level debugger, are
       available for software development on the PC under MS-DOS. You
       develop and test your program on the PC. The object code is loaded
       into your hyperstone system or into an evaluation board via an
       RS-232 interface and executed there. Using the debugger you can
       display and alter data and programs from the hyperstone µP on
       screen anytime during software development. The finished program
       can then be programmed into a ROM or loaded via an interface.
       An evaluation board with the hyperstone µP, 1 MB DRAM, up to 256 KB
       EPROM, an interrupt controller and an RS-232 interface (UART) is
       also available. With this board you can develop and test your
       program in parallel with developing your hyperstone system.
       The software package, C compiler, assembler, debugger and the
       modules for communication via the RS-232 interface together
       provide a development environment comparable with an In-Circuit-
       Emulator (ICE). Your total investment for the development of a
       hyperstone system is therefore extremely low.
       
     * Further developments

       The next step is the integration of a timer, a multiply/accumulate
       unit and floating point. This opens up areas of use that call for
       a low-priced combination of digital signal processing (DSP) with a
       powerful microprocessor.
       Further developments include a hyperstone µP of up to 120 MIPS
       (max. 80 MIPS integer or 40 MIPS integer + 80 MFLOPS).

    *  hyperstone architecture

       Registers:
       -   19 global and 64 local registers of 32 bits each
       -   Directly addressable are 16 global and up to 16 local
           registers

       Flags:
       -   Zero(Z), negative(N), carry(C) and overflow(V) flag
       -   Interrupt-lock, trace-mode, trace-pending, supervisor state,
           cache-mode and high global flag
    
       Register Data Types:
       -   Unsigned integer, signed integer, bitstring, IEEE-754
           floating-point, each either 32 or 64 bits

       Memory:
       -   Address space of 4 Gbytes
       -   Separate I/O address space
       -   Load/Store architecture
       -   Pipelined memory and I/O accesses
       -   High-order data located at lower address
       -   Virtual memory by demand paging via a page fault signal from
           external MMU
       -   Fault-causing memory instructions can easily be identified and
           repeated
       -   Instructions and double-word data may cross page boundaries

       Memory Data Types:
       -   Unsigned and signed byte (8 bit)
       -   Unsigned and signed halfword (16 bit), located on halfword
           boundary
       -   Undedicated word (32 bit), located on word boundary
       -   Undedicated double-word (64 bit), located on word boundary

       Runtime Stack:
       -   Runtime stack is subdivided into memory part and register part
       -   Register part is implemented by the 64 local registers holding
           the most recent stack frame(s)
       -   Current stack frame (maximum 16 registers) is always kept in
           register part of the stack
       -   Data transfer between memory and register part of the stack is
           automatic
       -   Upper stack bound is guarded

       Instruction Cache:
       -   An instruction cache of 128 bytes reduces instruction memory
           accesses substantially

       Exceptions:
       -   Pointer, Privilege, Frame and Range Error, Data and
           Instruction Page Fault, Interrupt and Trace mode exception
       -   Error- and fault-causing instructions can be identified by
           backtracking, allowing a very detailed error analysis

       Bus Interface:
       -   Separate address and data bus of 30 and 32 bits respectively
       -   Fast bus switching on DMA
       -   DRAM-controller with RAS-CAS-multiplexer, refresh logic,
           parity generation and test and a programmable bus-controller fully
           integrated

       Packaging:
       -   144-pin PPGA, QFP in preparation

       Instructions General:
       -   Variable-length instructions of 16, 32 or 48 bits halve
           required memory bandwidth
       -   Pipeline depth of only two stages, assures immediate refill
           after branches
       -   Register instructions of type "source operator destination ->
           destination" or "source operator immediate -> destination"
       -   All 32 or 64 bits participate in an operation
       -   Immediate operands of 5, 16 and 32 bits, zero- or sign-
           expanded
       -   Two sets of signed arithmetical instructions: instructions
           either set only the overflow flag on overflow or trap additionally
           to a Range Error routine

       Instruction Summary:
       -   Memory address modes: register address, register
           postincrement, register + displacement (including PC relative),
           register postincrement by displacement (next address), absolute,
           stack address, I/O absolute and I/O displacement
       -   Load, all data types, bytes and halfwords right adjusted and
           zero- or sign-expanded, execution proceeds after Load until data
           is needed
       -   Store, all data types, trap when unsigned or signed range of
           byte or halfword is exceeded
       -   Exchange word memory <-> register (for semaphores)
       -   Move, Move immediate, Move double-word
       -   Logical instructions AND, AND NOT, OR, XOR, NOT
       -   Logical instructions AND NOT immediate, OR immediate, XOR
           immediate
       -   Mask source AND immediate -> destination
       -   Add unsigned/signed, Add signed with trap on overflow, Add
           with carry
       -   Add unsigned/signed immediate, Add signed immediate with trap
           on overflow
       -   Sum source + immediate -> destination, unsigned/signed and
           signed with trap on overflow
       -   Subtract unsigned/signed, Subtract signed with trap on
           overflow, Subtract with carry
       -   Negate unsigned/signed, Negate signed with trap on overflow
       -   Multiply word * word -> low-order word signed with trap on
           low-order word overflow, Multiply word * word -> double-word
           unsigned and signed
       -   Divide double-word by word -> quotient and remainder, unsigned
           and signed
       -   Shift left unsigned/signed, single and double-word, by
           constant and by content of register, Shift left signed by constant
           with trap on loss of high-order bits
       -   Shift right unsigned and signed, single and double-word, by
           constant and by content of register
       -   Rotate left single word by content of register
       -   Index Move: Check an index value for bounds and move it scaled
           by 1, 2, 4 or 8
       -   Check a value for an upper bound specified in a register or
           check for zero
       -   Compare unsigned/signed, Compare unsigned/signed immediate
       -   Compare bits, Compare bits immediate, Compare any byte zero
       -   Test number of leading zeros
       -   Set conditional 1 or -1
       -   Branch unconditional and conditional (12 conditions)
       -   Delayed Branch unconditional and conditional (12 conditions)
       -   Call subprogram, unconditional and on overflow flag
       -   Trap to supervisor subprogram, unconditional and conditional
           (11 conditions)
       -   Frame: Include parameters passed in frame addressing, set
           frame length, restore reserve length and check for upper stack
           bound
       -   Return from subprogram: Release current stack frame, restore
           preceding stack frame, program counter and status register
       -   Software instructions: Call an associated subprogram and pass
           the source operand and the address of a destination operand to it
       -   Floating-point instructions: Add, Subtract, Multiply, Divide,
           Compare and Compare unordered for single and double-precision, and
           Convert single <-> double, implemented as software instructions in
           the present version.

       
       How can we now convince you of everything our hyperstone can do?
       Get in touch with us if you too wish to make your products more
       efficient and profitable.
       
       hyperstone electronics GmbH
       Robert-Bosch-Str. 11
       
       D-7750 Konstanz
       West Germany

       tel.: 07531-67789
       fax : 07531-51725

(These phone numbers are as dialed from within Germany. From outside, drop
the leading zero and precede the number with the country code for Germany,
so in most cases it should be

49-7531-......)

----------- for the line eater ----------

linley@hpcuhe.cup.hp.com (Linley Gwennap) (04/12/91)

(Reinhard Kirchner) posts Hyperstone info:

>      hyperstone E1: 25 MIPS maximum speed with standard DRAMs

Thanks for posting this information.  It looks like a nice little
embedded controller.  Any performance specs other than the 25 MIPS
maximum?  I assume this is a 25 MHz processor that can execute one
instruction per cycle as long as the program fits in the 128-byte
instruction cache.  With standard DRAMs, the cache miss penalty must
be severe.  Do you have any price and availability numbers?
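
To put rough numbers on that concern, here is a back-of-the-envelope
sketch. The 25 MHz clock is my assumption from above, and the hit rate
and miss penalty are placeholders chosen for illustration, not figures
from Hyperstone; the function name is likewise made up.

```python
def effective_mips(clock_mhz, hit_rate, miss_penalty_cycles):
    """Estimate throughput for a one-instruction-per-cycle core.

    Each instruction costs one cycle, plus miss_penalty_cycles for the
    fraction of instructions that miss the instruction cache.
    """
    cycles_per_instruction = 1.0 + (1.0 - hit_rate) * miss_penalty_cycles
    return clock_mhz / cycles_per_instruction

# Peak case: everything hits in the cache.
peak = effective_mips(25, 1.0, 4)

# Placeholder case: 90% hit rate and a 4-cycle DRAM penalty.
degraded = effective_mips(25, 0.9, 4)

print(f"peak {peak:.1f} MIPS, degraded {degraded:.1f} MIPS")
```

Real numbers would depend on how well the prefetch control hides DRAM
latency, which is exactly what I would like to see measured.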

---------------------------------------------------------------------------
DISCLAIMER:  The views expressed here do not		Linley Gwennap
represent the views of the Hewlett-Packard		PA-RISC Marketing
Company.  Caveat emptor.				Hewlett-Packard