[comp.sys.m68k] MC68030 Burst Mode performances

rbt@cernvax.UUCP (rbt) (01/18/89)

=Reply by schene@prisma
=/* Written  3:59 pm  Jan 12, 1989 by jimmc@unet in comp.sys.m68k */
=>	Also, has anyone noticed that the instruction cache on the
=>	68030 can actually slow down execution speed? We saw this
=>	during evaluation and as yet have no explanation.
=
=Yes, this is a real phenomenon.  It is caused by the fact that the '030
=suspends execution until the entire cache line is fetched, rather than
=resuming when the first longword arrives (I think the cache line size
=on an '030 is 16 bytes).  On a no-wait-state machine, this resulted in
=an overall performance degradation of around 5%, as measured by a
=variety of benchmarks.  Consequently, that machine now ships with burst
=mode disabled.  We also had a one-wait-state machine, which showed a
=small performance gain with cache bursting, so it ships with bursts
=enabled.  The moral is that it depends on your implementation.
 
Ehm, this sounds a bit strange. The MC68030 burst mode is meant to
optimise the "typical" data/instruction fetch stream produced by
compiled code. There is a lot of literature on this topic that can be
summarised in one line: "code and data move in regions". Performance can
improve if the processor pre-fetches instructions and data as a stream
rather than one by one, since the following longwords will likely be
fetched anyway. So the MC68030 has a "special" mode where the processor
gives the address and reads the first longword (a phase called "2"
because it takes two clock cycles, one for the address and one for the
data read), followed by three cycles in each of which the processor
reads the next longword (each called "1" because it takes one clock
cycle). The typical burst sequence is then 2-1-1-1: four longwords are
fetched using only one addressing cycle, at full processor speed,
within 5 clock cycles.
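
As a rough check of this arithmetic, here is a minimal sketch in Python.
The function name is made up for illustration, and the 2-clock cost of
each non-burst read is an assumption, not a figure taken from a Motorola
data sheet.

```python
def clocks(pattern):
    """Total clock cycles for a fetch pattern such as "2-1-1-1"."""
    return sum(int(step) for step in pattern.split("-"))

# One addressing cycle followed by three single-clock reads.
burst = clocks("2-1-1-1")

# Assumption: without bursting, each of the four longwords needs its
# own 2-clock bus cycle.
separate = 4 * clocks("2")

print("burst:", burst, "clocks; separate reads:", separate, "clocks")
```

Under these assumptions the burst saves clocks only if most of the four
longwords are actually used afterwards.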
 
We are now working on a new board with such a processor and fast RAMs.
The problem is that our RAM chips are fast for us but not for the
processor (we are using 100 ns static RAM). Burst mode gives us a
2-2-2-2 cycle, where the processor must wait 1 cycle before receiving
each subsequent longword. The tests we made on ASSEMBLER programs,
with data and instruction caches enabled, showed a degradation of
system performance. I expect a FORTRAN or PASCAL program to run slower
with burst mode enabled. Only when the board is upgraded to fast
(45 ns) static RAMs or fast Nibble-Mode dynamic memories do we
expect an improvement, with a real 2-1-1-1 sequence. Again, only on
compiled code. ASSEMBLER programs are usually peculiar: they do not
process big chunks of data and their stream is not very linear.
 
In conclusion, burst mode is useful when:
   o the RAM is fast enough to follow the processor 2-1-1-1 sequence;
   o data and code move in regions.
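
The two conditions can be put together in a back-of-the-envelope model.
This is only a sketch: it assumes that the processor stalls until the
whole 16-byte line has arrived (as the quoted reply says), that a
non-burst read costs 2 clocks per longword, and that only the longwords
actually used would be fetched without bursting. The function names are
hypothetical.

```python
LINE_LONGWORDS = 4  # a 16-byte cache line holds four longwords

def burst_fill_clocks(pattern):
    """Clocks to fill a whole line with a burst such as "2-1-1-1"."""
    return sum(int(step) for step in pattern.split("-"))

def single_fetch_clocks(used, clocks_per_longword=2):
    """Clocks to fetch only the longwords actually used, one by one.

    The 2-clock default is an assumption made for illustration.
    """
    return used * clocks_per_longword

def min_used_for_gain(pattern, clocks_per_longword=2):
    """Smallest number of used longwords for which the burst is strictly
    cheaper than separate reads, or None if it never is."""
    fill = burst_fill_clocks(pattern)
    for used in range(1, LINE_LONGWORDS + 1):
        if fill < single_fetch_clocks(used, clocks_per_longword):
            return used
    return None

print("2-1-1-1:", min_used_for_gain("2-1-1-1"))
print("2-2-2-2:", min_used_for_gain("2-2-2-2"))
```

In this simplified model a 2-1-1-1 burst pays off once three of the four
longwords are used, while a 2-2-2-2 burst never beats separate reads; at
best it breaks even when the whole line is used.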
 
For the first point the answer is: buy RAM as fast as the processor
requires. This is expensive but can be done for the MC68030. The
problem will become more serious when the MC68040 becomes available. I
heard Motorola say (unofficially) that the speed improvements for
this processor cannot go much further, since the external devices will
not be able to run at the required speed.
 
The second point is more a function of the program under execution.
Device drivers and operating system code will not improve their
performance by enabling burst mode. CPU-bound code written in a
high-level language should run faster, but this is mainly a function
of the linearity of the compiled code. Incidentally, the next Motorola
processor (the RISC 88000) requires non-linear code, since its speed
can be degraded if one instruction needs the result coming from one of
the previous two or three steps. 88000 compilers are designed to "jump"
from one logical phase to another as much as possible, to exploit the
intrinsic parallelism of this machine. On the other hand, the 88000 has
two 16 Kbyte caches per processing node!
 
+-----------------------+----------------------------------------------+
|   Roberto Divia`      | Love at first sight is one of the greatest   |
|   =============       | labor-saving devices the world has ever seen |
+-----------------------+----------------------------------------------+

ech@pegasus.ATT.COM (Edward C Horvath) (01/20/89)

From article <906@cernvax.UUCP>, by rbt@cernvax.UUCP (rbt):
! We are now working on a new board with such a processor and fast RAMs.
! The problem is that our RAM chips are fast for us but not for the
! processor (we are using 100 ns static RAM). Burst mode gives us a
! 2-2-2-2 cycle, where the processor must wait 1 cycle before receiving
! each subsequent longword. The tests we made on ASSEMBLER programs,
! with data and instruction caches enabled, showed a degradation of
! system performance. I expect a FORTRAN or PASCAL program to run slower
! with burst mode enabled. Only when the board is upgraded to fast
! (45 ns) static RAMs or fast Nibble-Mode dynamic memories do we
! expect an improvement, with a real 2-1-1-1 sequence. Again, only on
! compiled code. ASSEMBLER programs are usually peculiar: they do not
! process big chunks of data and their stream is not very linear.
!  
! In conclusion, burst mode is useful when:
!    o the RAM is fast enough to follow the processor 2-1-1-1 sequence;
!    o data and code move in regions.
!  
! For the first point the answer is: buy RAM as fast as the processor
! requires. This is expensive but can be done for the MC68030. The
! problem will become more serious when the MC68040 becomes available. I
! heard Motorola say (unofficially) that the speed improvements for
! this processor cannot go much further, since the external devices will
! not be able to run at the required speed.

It seems to me that there's an alternative: build the memories WIDER
rather than FASTER.  Given that the processor is going to pull a longword
per cycle in straight-line code, store alternating longwords in two 2-cycle
RAMs which are operating out of phase.

Notice that we're always using "read ahead," so a branch to the bank that's
in mid-memory-cycle will cost one wait state while that bank completes its
(useless) fetch (...-1-3-1 v. optimal ...-1-2-1).  A branch to the bank
that just finished takes 2 cycles for the RAM as well as the processor.
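
The timing described above can be sketched as a small cost model in
Python. It is an illustration only: the function name, the choice of
even longwords in bank 0 and odd longwords in bank 1, and the example
addresses are all assumptions.

```python
RAM_CYCLES = 2  # each bank needs two processor clocks per access

def fetch_pattern(addresses):
    """Clocks per longword for a stream of longword addresses.

    Even longwords live in bank 0, odd longwords in bank 1, and the
    other bank always reads ahead to the next sequential longword.
    """
    pattern = []
    prev = None
    for addr in addresses:
        if prev is None:
            cost = RAM_CYCLES          # cold start: one full RAM access
        elif addr == prev + 1:
            cost = 1                   # read-ahead is already in flight
        elif addr % 2 == prev % 2:
            cost = RAM_CYCLES          # target bank has just finished
        else:
            cost = RAM_CYCLES + 1      # target bank is mid-cycle: one wait
        pattern.append(cost)
        prev = addr
    return pattern

straight = fetch_pattern([0, 1, 2, 3])
good_branch = fetch_pattern([0, 1, 2, 10, 11])  # 2 and 10 share a bank
bad_branch = fetch_pattern([0, 1, 2, 11, 12])   # 11 is in the busy bank

print(straight, good_branch, bad_branch)
```

Straight-line code runs at one longword per clock after the first
access; a branch costs 2 or 3 clocks depending on which bank holds the
target.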

Could be a damned sight cheaper than 45 ns parts if you can hack the
increased chip count.

=Ned Horvath=