[comp.sys.m68k] Daystar Digital Review: 030 vs. 040 Perf.

jtr@oakhill.sps.mot.com (Jim Reinhart) (06/29/91)

Recently, Daystar Digital issued a paper entitled "The 68040 
on the Macintosh" justifying their decision to delay 
introduction of a 68040-based Macintosh accelerator.  In this 
paper Daystar utilized data reputedly supplied by Motorola to 
establish that the 68040 offered little performance advantage 
over the 68030 and, therefore, would not be of general 
interest on a performance basis for some time.  

Motorola strongly disagrees with this conclusion and denies 
the authenticity of the data attributed to Motorola.  While 
Daystar attempts to establish some reasonable conclusions, 
Motorola finds this paper to be generally misleading and 
meriting considerable further discussion.  Below, excerpts 
from the text of the Daystar paper will be found in 
capitalized type while Motorola's comments are in plain type.  
In addition, a commentary from Daystar competitor IIR is 
attached.

Prior to entering this discussion a number of simple concepts 
should be reviewed.  First, the delivered system performance 
of any computing machine is not represented by the simple 
performance of any one element of that machine but instead by 
the composite performance of the subsystems that comprise 
that machine.  These subsystems include (but are not 
necessarily limited to):  the central processor,  memory, 
graphics hardware, compiler, mass storage, network and I/O.  
Additionally, the perceived or observed performance of a 
computer depends heavily on the measurement criteria chosen 
for the observation.  Performance metrics can be chosen that 
focus on a single subsystem, a combination of subsystems or 
the composite performance of the entire machine.

Computer manufacturers routinely strive to balance the 
performance of the subsystems of a machine in order to 
maximize the total system performance subject to a particular 
cost consideration.  For example, in a very low-cost system, 
the manufacturer will pair a low-performance (relatively) CPU 
with a simple, low-performance memory subsystem.  If the 
chosen CPU only requires 5 megabytes per second of memory 
bandwidth to deliver its maximum performance, it makes little 
sense for the computer manufacturer to pair this CPU with a 
40 MB/s memory system.  Conversely, if a CPU requires 40 MB/s 
of memory bandwidth to deliver the desired level of 
performance, it makes little sense to pair this CPU with 
a 5 MB/s memory.  Similar analogies hold true for mass 
storage, network, graphics and so on.

When considering upgrading the performance of a machine by 
replacing or supplementing some of its components it is 
important to understand the potential rewards for doing so.  
If the machine has many activities that are dominated by CPU 
performance, it may be highly beneficial to upgrade the CPU 
performance.  Simple CPU accelerators perform this function 
well.  If typical machine performance is dominated by lack of 
memory bandwidth, a simple accelerator may provide little 
benefit while a more sophisticated model (e.g. with on-board 
memory), may provide desired results.  If machine performance 
is typically dominated by disk accesses, even a very 
sophisticated CPU accelerator may provide only marginal 
perceived results.  

Simply put, CPU accelerators provide the greatest benefit in 
machines where typical performance, as perceived by the user, 
is governed by processing power.  When measuring the benefit 
of a CPU accelerator, one should evaluate either total system 
performance or CPU related activities to determine the merit 
of the upgrade.
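
As a rough illustration of this point, the sketch below applies 
Amdahl's law, which is not named in the text above but captures 
the same idea: only the CPU-bound share of run time benefits 
from a faster CPU.  The workload names and fractions are 
hypothetical values chosen for illustration, not measurements.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound share of run time gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Hypothetical workloads: share of run time that is CPU-bound (assumed values).
WORKLOADS = [("disk-bound", 0.2), ("mixed", 0.6), ("compute-bound", 0.95)]

for name, fraction in WORKLOADS:
    # Assume an accelerator whose CPU is 3x faster than the stock CPU.
    print(f"{name:14s} {overall_speedup(fraction, 3.0):.2f}x overall")
```

Under these assumed fractions, the same accelerator helps the 
compute-bound workload far more than the disk-bound one.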

Daystar attempts to establish a justification for delaying 
the introduction of 68040-based accelerators based on two 
principal points: performance and compatibility.  Motorola 
contests the accuracy of these claims as discussed below.  
Daystar hints at but fails to make a very reasonable point 
concerning the continuing utility of 68030-based 
accelerators:  there are many Macintosh users who will 
continue to be well served by 68030 levels of performance 
and who, due to other system implications, may not 
immediately require the performance offered by the 68040.

>1.0 OVERVIEW
>THE MOTOROLA 68040 PROCESSOR IS A MAJOR STEP FORWARD IN 
>PROCESSING POWER. WHEN COMPARED TO A 25 MHZ 68030/68882, A 25 
>MHZ 68040 OFFERS DOUBLE THE INTEGER PERFORMANCE AND THREE 
>TIMES THE SPEED IN FLOATING POINT CALCULATIONS, AS SHOWN IN 
>TABLE 1. BUT A 25 MHZ 68040 IS ONLY SLIGHTLY FASTER THAN A 40 
>MHZ 68030 (MAC IIFX).
>
>   TABLE 1: PERFORMANCE RELATIVE TO A 25 MHZ 68030/68882
>                      REF: MOTOROLA
>
>     TYPE     25 MHZ 68030     40 MHZ 68030    25 MHZ 68040
>     INTEGER       1.0              1.6             2.1
>     FPU           1.0              1.6             3.3

Extensive statistical and empirical studies conducted by 
Motorola have clearly established that the 25 MHz 68040 
performs integer operations at 3.2 times the speed of a 25 
MHz 68030 and floating point operations at roughly 5 times 
the 25 MHz 68030.  While some variations from these figures 
in some customers' systems can be expected due to differences 
in compiler technology (e.g. structure alignment ...), these 
studies have been independently verified in system benchmarks 
by Motorola customers.  There have been no inconsistencies in 
Motorola's position with respect to 68040 performance 
relative to the 68030.  See section 3.1 for specific data.

>INTEGER CALCULATIONS WHICH DRIVE MAC OPERATING SYSTEM (OS) 
>PERFORMANCE AND ALL APPLICATIONS SHOW GAINS OF 30%. THE REAL 
>STRENGTH OF THE 68040 LIES IN THE SPEED OF FLOATING POINT 
>CALCULATIONS, BUT THESE HAVE LITTLE OR NO BENEFIT FOR THE 
>TYPICAL GRAPHICS USER. ONLY APPLICATIONS IN THE SCIENTIFIC 
>AND CAD MARKET USE THE FLOATING POINT UNIT (FPU). IN SEVERAL 
>YEARS THE 68040 WILL BE RUNNING AT 40 MHZ. THIS PROCESSOR 
>WILL PROVIDE THE MUCH NEEDED POWER IN THE DTP, GRAPHICS, 
>PRE-PRESS AND SCIENTIFIC MARKETS.

The above conclusion stems from the invalid performance 
assumptions drawn previously.  The 25 MHz 68040 delivers 
almost exactly twice the integer performance of a 40 MHz 
68030.  Daystar's claims concerning Motorola product 
schedules are addressed below.
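
The "almost exactly twice" figure follows from numbers already 
given in this post.  A minimal sketch of the arithmetic, 
assuming (as Daystar's Table 1 does) that a 40 MHz 68030 
delivers 1.6 times the integer performance of a 25 MHz 68030; 
the variable names are illustrative only:

```python
# Both figures are relative to a 25 MHz 68030 = 1.0.
M68040_25MHZ_INTEGER = 3.2  # Motorola's stated figure for the 25 MHz 68040
M68030_40MHZ_INTEGER = 1.6  # Daystar's Table 1 entry (the same as 40 / 25)

ratio = M68040_25MHZ_INTEGER / M68030_40MHZ_INTEGER
print(f"25 MHz 68040 vs. 40 MHz 68030 (integer): {ratio:.1f}x")
```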

>SOFTWARE COMPATIBILITY WILL BE A MAJOR PROBLEM ON 68040 
>ACCELERATORS AS WELL AS APPLE'S NEW 68040 MACHINE.  APPLE 
>WILL HAVE TO MAKE MAJOR PATCHES TO THE MAC OS TO HANDLE 
>PROBLEMS WITH MEMORY MANAGEMENT AND EXCEPTION HANDLING. IN 
>ADDITION, THE MATH CODE WITHIN APPLICATIONS WILL HAVE TO BE 
>REWRITTEN TO DIRECTLY LEVERAGE THE  BENEFITS OF THE 68040'S 
>FPU.

There are indeed differences between the O.S. programming for 
the 68030 and 68040.  However, Daystar's assertion that this 
is a "MAJOR" problem is not generally supportable.  First, 
the differences have been documented long enough (~1.5 years 
in print) for vendors to make appropriate plans.  Second, the 
only major differences (or 'problems' in Daystar terminology) 
concern the virtual exception processing model and cache 
management.  This impacts only a small portion of O.S. code.  
Finally, the assertion that applications will have to be 
rewritten due to the 040 FPU is entirely misleading as will be 
discussed below.

>FOR THESE REASONS, DAYSTAR HAS DECIDED TO WAIT TO INTRODUCE 
>ITS 68040 ACCELERATOR UNTIL AFTER THE INTRODUCTION OF APPLE'S 
>68040 MACHINE. APPLE IS BEST SUITED TO MAKE THE NECESSARY OS 
>CHANGES AS WELL AS DRIVE CHANGES IN THIRD PARTY APPLICATIONS, 
>INITS AND CDEVS.

The above expresses Daystar's opinion.  Other accelerator 
manufacturers (e.g. Radius, Dove, IIR, Fusion Data) are not 
so inclined and are shipping 68040-based accelerators.  
Additionally, more than 50 different manufacturers of 
computer products are currently shipping successful 68040-
based machines.

>2.0 LESSONS FROM THE PAST
>EACH NEW GENERATION OF PROCESSOR FROM MOTOROLA HAS 
>INCORPORATED NEW FEATURES AND CAPABILITIES, MANY OF WHICH ARE 
>NOT COMPATIBLE WITH THE CURRENT GENERATION. THE MAC OS, BY 
>ITS VERY NATURE, DIRECTLY ADDRESSES THE HARDWARE. TO THE 
>EXTENT THAT THE HARDWARE CHANGES, THE OS MUST BE PATCHED. THE 
>GREATER THE CHANGE IN THE ARCHITECTURE OF THE PROCESSOR, THE 
>GREATER THE NUMBER AND SOPHISTICATION OF THE PATCHES.

Motorola agrees with this point with qualifications.  The 
68000 family has distinguished itself by maintaining complete 
upward compatibility for application software and confining 
all changes to be either proper supersets of existing 
functionality or visible only in the supervisor (O.S.) 
programming model.  While O.S. modifications may be required, 
O.S. code represents a small fraction of the entire code 
pool.  Further, portions of the O.S. that are affected by 
change represent only a small portion of the total O.S. 
code.

>2.1 THE FIRST ACCELERATORS
>THE FIRST MAC PLUS AND SE ACCELERATORS UTILIZED A 68020 WITH 
>A 32-BIT BUS AS COMPARED TO THE 16-BIT BUS ON THE MAC SE'S 68000 
>PROCESSOR. THAT AND ITS FASTER CLOCK SPEED (16 MHZ VS. 8 MHZ) 
>CAUSED MANY AGGRAVATING INCOMPATIBILITIES WITH PARTS OF THE 
>MAC OS, VARIOUS APPLICATIONS, AND MANY INITS. IT WAS NOT 
>UNTIL THE MAC II WAS INTRODUCED WITH ITS OWN 16 MHZ 68020 DID 
>APPLE AND THE DEVELOPER COMMUNITY COMPLETELY SOLVE THE 
>PROBLEMS.

True, and every successive generation of the Mac O.S. has 
gone further to eliminate or reduce nasty things like timing 
dependencies.  In fact, the latest releases of the Mac O.S. 
function quite well across a range of CPU performance 
spanning more than an order of magnitude (68000 -> 68030).  
Additionally, refinements in the Mac O.S. have taken away 
some of the development community's motivation for 'bad' 
programming practices leading to platform dependencies.

>2.2 THE MAC II ACCELERATOR
>PROBLEMS STARTED OVER AGAIN WHEN APPLE INTRODUCED THE 16 MHZ 
>68030 MAC IIX. SURPRISINGLY, THE 68030 IS NEARLY IDENTICAL TO 
>THE 68020 EXCEPT FOR THE ADDITION OF THE 256 BYTE INTERNAL DATA 
>CACHE AND THE MEMORY MANAGEMENT UNIT (MMU). YET THERE WERE 
>NUMEROUS INCOMPATIBILITIES WITH VARIOUS PARTS OF THE MAC OS, 
>THIRD PARTY INITS  AND APPLICATIONS. MANY CDEVS AND INITS 
>ACCOMPLISH THEIR SPECIAL TASK BY MAKING CHANGES TO THE OS OR 
>DIRECTLY ADDRESSING THE HARDWARE (NECESSARY TO ACCOMPLISH A 
>SPECIAL TASK THAT APPLE DID NOT PROVIDE, NEVERTHELESS, A 
>VIOLATION OF APPLE GUIDELINES).
>
>APPLICATIONS THAT CLOSELY FOLLOWED DEVELOPER GUIDELINES 
>GENERALLY WORKED WELL ON THE 68030 CONVERSION. THERE WERE  
>SEVERAL KEY  APPLICATIONS THAT HAD PROBLEMS WORKING WITH THE 
>INTERNAL CACHE. AT THE SAME TIME DAYSTAR INTRODUCED  THE 33 
>MHZ 68030 ACCELERATORS. ONCE AGAIN, IT (AND OTHERS) EXPOSED 
>YET ANOTHER SET OF PROBLEMS WITH  APPLICATIONS, INITS AND 
>CDEVS THAT HAD CLOCK TIMING DEPENDENCIES AND PROBLEMS WORKING 
>WITH AN EXTERNAL MEMORY CACHE. EVEN APPLE'S FLOPPY DRIVER 
>CODE WOULD NOT RUN PROPERLY AT SPEEDS ABOVE 16 MHZ. DAYSTAR 
>(AND OTHERS) INVESTED SIGNIFICANT TIME PATCHING THE FLOPPY 
>DRIVER CODE. FOR THE 25 MHZ MAC IICI, APPLE HAD TO COMPLETELY 
>REWRITE THEIR FLOPPY DRIVER CODE TO ELIMINATE THESE TIMING 
>DEPENDENCIES.
>
>FROM A SOFTWARE STANDPOINT, CONVERSION FROM THE 68020 WORLD 
>TO THE 68030 WAS ABOUT AS EASY AS ONE COULD EVER ASK FOR. YET 
>IT WAS VERY FRUSTRATING FOR THE END-USER. WHILE MOST PROBLEMS 
>WERE ENCOUNTERED WITH INITS AND CDEVS, END-USERS WERE NOT 
>WILLING TO ELIMINATE THEM AS THEY HAD BECOME AN ESSENTIAL 
>PART OF THEIR "TOOL KIT". SOME EARLY BUYERS FOUND THE 
>EXPERIENCE VERY FRUSTRATING - THEY DID NOT HAVE THE TIME (OR 
>SKILLS) TO FIDDLE AROUND TRYING TO DEBUG THEIR MACHINE. THE 
>SAME EXPERIENCE WAS ONCE AGAIN REPEATED WHEN APPLE INTRODUCED 
>THEIR 32-BIT CLEAN ROMS ON THE MAC IICI AND MAC IIFX. WITH 
>OVER A YEAR OF WARNING FROM APPLE TO THE DEVELOPER COMMUNITY 
>THERE WERE STILL MANY APPLICATIONS, INITS AND CDEVS THAT HAD 
>SIGNIFICANT PROBLEMS, DRIVING THE USERS CRAZY.

Same general comments as above: it is a learning curve 
exercise.  Additionally, there is always a cost associated 
with being in a market leadership position.  This is true 
both for the 'early-adopting' manufacturer and the 
performance-driven user.  The accelerator community has 
differentiated itself by creating a careful balance of the 
cost of leading and the benefit derived.


>3.0 PERFORMANCE
>THE 68040 INCORPORATES SEVERAL INNOVATIVE DESIGN FEATURES 
>THAT BOOST PERFORMANCE OVER A 68030/68882 COMBINATION RUNNING 
>AT THE SAME CLOCK SPEED. GAINS ARE REALIZED IN BOTH INTEGER 
>AND FPU PERFORMANCE. INTEGER PERFORMANCE DRIVES  MAC OS AND 
>VIRTUALLY ALL APPLICATIONS. MAC OS, GRAPHICS, DTP AND PRE 
>PRESS APPLICATIONS MAKE LITTLE OR NO USE OF THE FPU, AS SHOWN 
>IN TABLE 2. FPU PERFORMANCE IS OF BENEFIT ONLY FOR A SUBSET 
>OF FUNCTIONS WITHIN CAD AND SCIENTIFIC APPLICATIONS. 
>SPREADSHEETS ONLY USE THE FPU FOR SPREADSHEET  
>RECALCULATIONS.    
>
>
>         TABLE 2: BENEFIT OF 50 MHZ FPU ON IICI ACCELERATOR
>                          REF: DAYSTAR
>Platform              MacIIci  AccelIIci AccelIIci  AccelIIci
>Processor              68030     68030     68030      FPU
>Clock                  25 MHz    50 MHz    50 MHz   % Gain
>FPU                     Yes        NO        Yes      Yes
>Word      Scroll        8.9       6.5        6.5       0%
>Renderman Render       98.0      82.0       56.0      46%
>Excel     Cut&Paste     9.1       5.5        5.5       0%
>Excel     Scroll       10.3      10.0       10.0       0%
>Excel     Recalc       10.4       6.6        5.6      17%
>Xpress    Fit in wndw   5.4       3.4        3.4       0%
>Xpress    Scroll       24.2       6.9       16.8       0%
>FreeHand  Fit in wndw  21.8      11.9       11.9       0%
>Freehand  Duplicate    34.5      18.7       18.7       0%
>FileMaker Sort         56.3      42.2       42.2       0%
>Swivel 3D Change View  17.4       8.1        8.1       0%
>Swivel 3D Tween        73.1      32.6       32.6       0%
>ClarisCad Fit in wndw   6.5       4.6        4.2       8%
>PhotoShop Rotate        4.8       3.6        3.6       0%
>PhotoShop Resample     36.6      19.7       19.7       0%
>PhotoShop Gausian Blur 19.3      12.0       12.0       0%
>Total Time (sec)      436.53   284.18      256.79     11%
>
>
>SHOWN IN TABLE 2 IS A MAC IICI ACCELERATOR WITH AND WITHOUT A 
>50 MHZ 68882 FPU. DOUBLING THE SPEED OF THE FPU HAS NO 
>BENEFIT IN MANY APPLICATIONS, EVEN WITHIN CAD APPLICATIONS. 
>BASED ON THE EVIDENCE IN FIGURE 2, DAYSTAR RECOMMENDS THAT 
>ITS GRAPHICS AND DTP CUSTOMERS NOT BUY AN OPTIONAL 68882 FPU 
>ON ITS ACCELERATORS.

Certainly not all applications utilize floating point math.  
However, those that do benefit substantially from floating 
point hardware.  Note that Daystar carefully chooses a large 
sampling of application subsets that do not utilize floating 
point to prove their point.  Even with a possibly biased 
sampling of 16 application subsets, only three of which 
utilize floating point, Daystar measures an 11% overall 
improvement in benchmark performance.

>3.1 INTEGER PERFORMANCE
>FOR INTEGER PERFORMANCE, THE 68040 HAS A HIGH DEGREE OF 
>INSTRUCTION PARALLELISM. IT IS CAPABLE OF EXECUTING IN ONE 
>CLOCK CYCLE AN INSTRUCTION THAT MAY TAKE 3-4 CYCLES TO 
>EXECUTE ON A 68030. THE 68040 HAS TWO 4,096 BYTE CACHES FOR 
>BOTH INSTRUCTION AND DATA, AND BOTH ARE FOUR-WAY SET 
>ASSOCIATIVE. CONTRAST THIS TO A 68030, WHICH HAS ONLY A 256 
>BYTES DIRECT MAPPED CACHE (LESS EFFICIENT). THEREFORE, THE 
>68040 WILL EXHIBIT A MUCH HIGHER "HIT" RATE ALLOWING ZERO 
>WAIT STATE PERFORMANCE UP TO 40 MHZ. IN FACT, THE 68040 
>CACHES ARE SO EFFICIENT THAT THERE WILL BE NO NEED TO ADD 
>EXTERNAL CACHE, AS IS REQUIRED IN THE FASTER 68030'S.
>
>PREDICTED INTEGER PERFORMANCE FOR THE 68040 (BASED ON 
>MOTOROLA DATA) IS SHOWN IN THE TABLE 3 AGAINST A ZERO WAIT 
>STATE 68030.  PERCENTAGE GAINS ARE SHOWN AGAINST A 40 MHZ
>68030 (TO REPRESENT A MAC IIFX).  EXPECTED GAINS FOR THE 25
>MHZ 68040 ARE ONLY ON THE ORDER OF 30% (1.3) WHEN COMPARED TO 
>THE 40 MHZ 68030.
>
>     TABLE 3: PERFORMANCE RELATIVE TO A 40 MHZ 68030
>                   REF: MOTOROLA
>      CLOCK    68030   68040  68040 VOLUME SHIP
>     16 MHZ     0.4     N/A        N/A
>     25 MHZ     0.6     1.3        Q2 91
>     33 MHZ     0.8     1.7        Q1 92 
>     40 MHZ     1.0     2.1        LATE 92
>     50 MHZ     1.3     N/A        N/A
>
>
>GAINS OF 30% WILL NOT SATISFY POWER USERS. THEY REALLY DEMAND 
>GAINS OF 100-200%, AND THESE WILL NOT BE AVAILABLE FOR 
>SEVERAL YEARS, AT LEAST FOR THE MAC IIFX. GAINS ON THE 16 MHZ 
>MAC IIS SHOULD BE A LITTLE OVER THREE TIMES GREATER WHEN THE 
>40 MHZ 68040 IS INTRODUCED IN LATE 1992, SO AN APPRECIABLE 
>UPGRADE MARKET WILL EXIST FOR USERS WHO WANT BETTER THAN IIFX 
>CLASS PERFORMANCE. BUT IN THE MEANTIME, WILL THE INITIAL 
>68040 COMPATIBILITY PROBLEMS BE MORE OF A PROBLEM THAN A IIFX 
>UPGRADE OR A 50 MHZ ACCELERATOR?

Motorola denies authenticity of the above claims presented by 
Daystar based on three issues:  first, the information is 
factually incorrect; second, the source of this information 
is not Motorola; third, if the source was Motorola, Daystar 
would be in serious breach of legal non-disclosure agreements 
concerning Motorola's future product plans.  The legal 
agreement contained in Motorola file #89111652RD prohibits 
Daystar Digital Inc. from disclosing the proprietary 
information of Motorola Inc.

Motorola has not formally introduced products beyond the 
current 25 MHz 68040 but has publicly stated that 33 MHz 
volume shipments will begin in 3Q91 with 40 MHz shipments 
beginning late in the year.  Motorola reiterates its earlier 
comments concerning 68040 performance relative to the 68030:  
the 68040 is 3.2 times faster on integer code and ~5 times 
faster on floating-point intensive code at the same clock 
frequency.  

Some simple facts illustrating timing differences between the 
68030 and 68040 (cache hits assumed):

                      #Clks      #Clks
     Instruction:     68030      68040     Ratio
     Arith/LOG R->R     2          1         2
     Arith M->R         5          1         5
     MOVE M->R          5          1         5
     MOVE R->M          3          1         3
     FADD R->R         39          3        13
     FMUL R->R         59          5        12
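
The Ratio column can be reproduced directly from the two clock 
counts.  The short sketch below does so using only the values 
in the table; note that FMUL works out to 11.8, which the table 
rounds to 12.  The function and variable names are illustrative.

```python
# (instruction, 68030 clocks, 68040 clocks), copied from the table above.
TIMINGS = [
    ("Arith/LOG R->R", 2, 1),
    ("Arith M->R", 5, 1),
    ("MOVE M->R", 5, 1),
    ("MOVE R->M", 3, 1),
    ("FADD R->R", 39, 3),
    ("FMUL R->R", 59, 5),
]


def clock_ratio(clocks_030: int, clocks_040: int) -> float:
    """How many times fewer clocks the 68040 needs for one instruction."""
    return clocks_030 / clocks_040


for name, c030, c040 in TIMINGS:
    print(f"{name:16s} {c030:3d} {c040:3d} {clock_ratio(c030, c040):6.1f}")
```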

These are only rough examples; some benchmarks may be of use 
as well (source of all data is Workstation Laboratories):

                      50 MHz     25 MHz
     Benchmark        68030      68040     Ratio
     Dhry 1.1         21008      45454      2.2
     Dhry 2.1         17493      38760      2.2
     iSPEC              6.5       12.9      2.0
     Linpack(DP Ftn)   .425       1.69      4.0
     Linpack (coded)   .560        2.9      5.1
     
The source of the above data is Workstation Labs (an 
independent performance testing organization); this data 
does not support Daystar's claims.  Note closely that the 
68030 data is for 50 MHz operation (with a 32K zero-wait-state 
cache).  Based on this data, a 25 MHz 68040 would be about 
2.6 times faster than a 40 MHz 68030.
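
One way to arrive at a figure in that neighborhood is sketched 
below.  It assumes that 68030 scores scale linearly with clock 
frequency (so a 40 MHz part delivers 40/50 of the 50 MHz 
scores) and averages the three integer benchmarks; neither 
assumption is stated in the source data, and the names are 
illustrative.

```python
from statistics import mean

# (benchmark, 50 MHz 68030 score, 25 MHz 68040 score), from the table above.
INTEGER_BENCHMARKS = [
    ("Dhry 1.1", 21008, 45454),
    ("Dhry 2.1", 17493, 38760),
    ("iSPEC", 6.5, 12.9),
]

# Assumption: linear scaling of 68030 performance with clock frequency.
CLOCK_SCALE = 40 / 50


def ratio_vs_40mhz_030(score_030_at_50mhz: float, score_040: float) -> float:
    """68040 score divided by the estimated 40 MHz 68030 score."""
    return score_040 / (score_030_at_50mhz * CLOCK_SCALE)


ratios = [ratio_vs_40mhz_030(s030, s040) for _, s030, s040 in INTEGER_BENCHMARKS]
average_ratio = mean(ratios)
print(f"average integer ratio vs. 40 MHz 68030: {average_ratio:.2f}")
```

Under these assumptions the average lands between 2.6 and 2.7, 
in line with the estimate above.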

>3.2 FPU PERFORMANCE
>THE REAL POWER OF THE 68040 LIES WITHIN ITS FPU PERFORMANCE. 
>BY COMBINING THE CPU AND FPU INTO THE SAME PIECE OF SILICON, 
>FPU HAS BEEN BOOSTED THREE TIMES. BUT TO ACHIEVE THIS 
>INTEGRATION MOTOROLA ACCEPTED A MAJOR SACRIFICE IN 
>INSTRUCTION SET COMMONALITY. APPLICATIONS NOT WRITTEN TO 
>DIRECTLY ADDRESS THE 68040 FPU WILL EITHER HAVE TO BE 
>REWRITTEN, OR WILL HAVE TO OPERATE THROUGH ABOUT 256K OF CODE 
>THAT TRANSLATES THE 68882 CALLS INTO 68040 CALLS. THE 
>OVERHEAD REQUIRED FOR THIS TRANSLATION PROCESS WILL 
>DRASTICALLY REDUCE 68040 FPU PERFORMANCE  GAINS.

The real power of the 68040 lies in its integration, 
compatibility and sustained performance.  Floating point is 
indeed a part of the performance picture but only a part.  
The ability of the 68040 to deliver excellent performance in 
very low cost memory systems has been key to its success.

Daystar's description of the 68040 floating point unit is 
somewhat inaccurate.  The 68040 provides hardware support for 
a subset of the 68882 instruction set optimized to deliver 
superior performance on the most commonly used set of 
floating point instructions.  Based on customer and market 
requirements, the majority of the 68040 silicon budget for 
floating point was dedicated to a set of critically important 
operations.  A Motorola-supplied software package (the 
executable is ~40k, NOT the 256k reported by Daystar) provides 
full object code compatibility with any 68881/68882 programs.  

When the 040 encounters a floating point operation it decides 
whether or not that particular instruction is one of the 
instructions supported in hardware (FMOVE, FCMP, FABS, FTST, 
FNEG, FADD, FMUL, FDIV, FSUB, FDBcc, FBcc, FSQRT, FSAVE, 
FRESTORE) and if so, the 040 executes that instruction.  
Otherwise (e.g. for transcendentals like FSIN) the 68040 
automatically calls the floating point software package to 
perform this function - it is entirely invisible to the user.  
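
The decision described above can be modeled as a simple 
membership test.  The sketch below is a toy model only: the 
instruction list is the one given in this post, but the 
function name and return strings are hypothetical, and the real 
mechanism is implemented in the processor and the Motorola 
software package, not in application code.

```python
# Instructions this post lists as supported in 68040 hardware.
HARDWARE_FP_OPS = frozenset({
    "FMOVE", "FCMP", "FABS", "FTST", "FNEG", "FADD", "FMUL",
    "FDIV", "FSUB", "FDBcc", "FBcc", "FSQRT", "FSAVE", "FRESTORE",
})


def execution_path(mnemonic: str) -> str:
    """Return which path a floating point instruction takes in this toy model."""
    if mnemonic in HARDWARE_FP_OPS:
        return "hardware"
    # Anything else (e.g. transcendentals) goes to the software package.
    return "software package"


for op in ("FADD", "FSQRT", "FSIN"):
    print(f"{op:6s} -> {execution_path(op)}")
```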

>IF AN FPU INTENSIVE FUNCTION IS REWRITTEN TO DIRECTLY USE THE 
>68040 FPU INSTRUCTION SET, THEN PERFORMANCE GAINS CAN BE 
>SUBSTANTIAL. TABLE 4 CONTAINS ESTIMATES FOR THE IMPACT OF THE 
>68040 FPU ON THE TWO FPU INTENSIVE FUNCTIONS SHOWN IN TABLE 
>2.

It is also possible, with recompilation, to have an 
application directly call the floating point software package 
to avoid the overhead of the automatic call performed by the 
040.  This does have nice performance advantages but is not 
in any manner necessary for compatibility.

>TABLE 4:ESTIMATED POSSIBLE 25 MHZ 68040 FPU PERFORMANCE GAIN
>                        REF. DAYSTAR
>  Platform          MacIIci   AccelIIci AccelIIci AccelIIci
>  Processor          68030      68030      68040     FPU
>  Clock              25 MHz     50 MHz     25 MHz   %Gain
>  FPU                 Yes        Yes        Yes
>  RenderMan  Render   98.0      56.0       17.8    215%
>  Excel      Recalc   10.4       5.6        3.5     60%
>  
>
>IN SUMMARY, THE MAC COMMUNITY WILL NOT SEE IMMEDIATE GAINS IN 
>68040 PERFORMANCE. A 25 MHZ 68040 IS NOT THAT MUCH FASTER 
>THAN A MAC IIFX, FOR INTEGER PERFORMANCE. AND, 68040 FPU 
>PERFORMANCE WILL BE OF LITTLE BENEFIT TO THE TYPICAL MAC 
>USER. HOWEVER, IN SEVERAL YEARS THE 40 MHZ 68040 WILL BE 
>DOUBLE THE SPEED OF THE MAC IIFX, AND OFFER EVEN GREATER 
>GAINS FOR CAD AND SCIENTIFIC FUNCTIONS DIRECTLY UTILIZING THE 
>68040'S FPU.

This represents Daystar's opinion based on questionable and, 
in Motorola's opinion, inaccurate performance claims.

>3.3 TODAY'S ACCELERATOR PERFORMANCE
>THE LIMITED GAINS OF THE 25 MHZ 68040 ARE VERIFIED BY 
>BENCHMARKS RUN AT THE JANUARY, 1991 SAN FRANCISCO MACWORLD. 
>HERE, PROTOTYPE ACCELERATORS WERE BEING SHOWN BY TWO 
>DIFFERENT COMPANIES. IN TABLE 5, BENCHMARK PERFORMANCE IS 
>SHOWN AGAINST CURRENT STATE-OF-THE-ART MACHINES.
>
>THESE TESTS SHOW THAT GAINS IN INTEGER PERFORMANCE ARE BELOW 
>MOTOROLA ESTIMATES. FPU PERFORMANCE IS NO BETTER THAN A 
>REGULAR MAC. THESE PROTOTYPES WERE OPERATING IN A VERY 
>RESTRICTED ENVIRONMENT (THEY WERE ONLY RUNNING BENCHMARKS). 
>APPLICATIONS WERE NOT BEING SHOWN. IN CONTRAST, ONCE THE 
>68030 WAS STABLE, UP AND RUNNING, THERE WERE FEW MAC OS OR 
>APPLICATIONS PROBLEMS TO OVERCOME. IN ALL FAIRNESS, THESE 
>WERE JUST EARLY ENGINEERING PROTOTYPES, AND THEY HAD NOT YET 
>"TWEAKED" PERFORMANCE TO THE MAXIMUM, AS IS COMMON IN THE 
>DEVELOPMENT PROCESS.
>
>    TABLE 5: 25 MHZ PROTOTYPE ACCELERATOR PERFORMANCE
>                 REF: DAYSTAR MEASUREMENT
>OEM             APPLE   APPLE  DAYSTAR TOKAMAC  IIR
>PLATFORM       MACIICI MACIIFX MACIICI MAC LC MACII/IIX
>CPU             68030   68030   68030  68040   68040
>FPU              YES     YES     YES    YES     YES
>SPEED           25 MHZ  40 MHZ 50 MHZ 25 MHZ   25 MHZ
>FLOAT     INT    0.18    0.15   0.10   0.20     0.10
>TRIG      FPU    0.57    0.36   0.32   3.18     1.20
>BUTTERFLY FPU    2.33    2.17   1.57   4.18     2.40
>RIPPLES   FPU   17.10   12.87   9.83  30.53     7.80
>SIEVE     INT    0.27    0.18   0.15   0.22     0.16
>MOIRE     INT    8.77    9.40   6.58   7.50     5.20
>TOTAL(SEC)      29.22   25.13  17.55  45.81    16.86

The above data provides little useful information since it 
provides no reference points.  Comparing a 68040 
operating in one machine versus a 68030 operating in an 
entirely different machine is an apples-to-oranges comparison.  
If one really wants to generate a professional and conclusive 
comparison, the 040 and 030 should be compared against each 
other in the same environment.  For example, the Tokamac in 
the MAC LC is restricted to running on a 16-bit data bus.  

What are the performance figures for the LC without an 
accelerator and with an 030-based accelerator running on a 
16-bit bus?  No data.  What was the configuration of the 
68040 in these evaluations?  No data.  What are the system 
configuration differences between the IIci, the II, the LC 
and the FX?  No data.  What is the native versus accelerated 
performance for each of the machines?  No data.  Is a 40 MHz 
Mac IIfx really only 16% faster than a 25 MHz Mac IIci?  Well, 
maybe, if you pick the right benchmarks to prove your point.  
What is the point?
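
The 16% figure comes from the TOTAL(SEC) row of Daystar's 
Table 5.  A minimal sketch of that arithmetic, set beside the 
raw clock ratio for contrast; the function name is illustrative:

```python
# Reported totals in seconds from Daystar's Table 5 (lower is faster).
IICI_TOTAL_SECONDS = 29.22  # Mac IIci, 25 MHz 68030
IIFX_TOTAL_SECONDS = 25.13  # Mac IIfx, 40 MHz 68030


def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """Speed advantage of the faster machine, as a percentage."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


measured_gain = percent_faster(IICI_TOTAL_SECONDS, IIFX_TOTAL_SECONDS)
clock_gain = (40 / 25 - 1.0) * 100.0  # gain implied by clock frequency alone

print(f"IIfx over IIci, Table 5 totals: {measured_gain:.0f}%")
print(f"IIfx over IIci, clock ratio:    {clock_gain:.0f}%")
```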

>4.0. COMPATIBILITY
>THE MAJOR PROBLEMS COME IN THE AREA OF SOFTWARE INTEGRATION 
>(BOTH MAC OS AND APPLICATIONS). THERE ARE THREE AREAS OF 
>COMPATIBILITY PROBLEMS: (1) MEMORY MANAGEMENT, (2) EXCEPTION 
>HANDLING AND (3) FLOATING POINT.
>
>4.1 MEMORY MANAGEMENT
>SINCE THE INTRODUCTION OF THE MAC IIX, USE OF THE MEMORY 
>MANAGEMENT UNIT (MMU) IN THE 68030 HAS BECOME A FUNDAMENTAL  
>PART OF MAC SYSTEM SOFTWARE. IT IS USED TO GRANT ACCESS TO 
>MEMORY, FLIP BETWEEN 24 AND 32 BIT MODE, AND PROVIDE VIRTUAL 
>MEMORY UNDER SYSTEM 7.0 AND A/UX.
>
>BOTH THE 68030 AND 68040 HAVE ON-CHIP MMUS, BUT THEY ARE VERY 
>DIFFERENT IN FEATURE SET, REGISTER FORMAT, AND PAGE TABLE 
>FORMATS.  IT IS SAFE TO SAY THAT ALL ROM CODE AND MAC SYSTEM
>SOFTWARE WHICH DEALS WITH THE MMU MUST BE MODIFIED TO RUN ON 
>THE 68040. THE MAJORITY OF THIRD PARTY SOFTWARE SHOULD NOT 
>NEED MODIFICATION (EXCEPT PROCESSOR SPECIFIC PRODUCTS SUCH AS 
>VIRTUAL) OR PRODUCTS THAT ADDRESS MMU HARDWARE DIRECTLY 
>(THOSE WHICH VIOLATE APPLE'S GUIDELINES).   

True.  Previous comments apply here as well. 

>4.2 EXCEPTION HANDLING
>AN EXCEPTION IS DEFINED AS A CONDITION THAT THE PROCESSOR 
>DOES NOT KNOW HOW TO HANDLE. FOR EXAMPLE, DIVIDING BY ZERO, 
>ACCESSING NON-EXISTENT MEMORY, AND UNKNOWN PROCESSOR 
>INSTRUCTIONS ALL GENERATE EXCEPTIONS.
>
>THE PROCESSOR SAVES INFORMATION ABOUT THE OPERATION ON THE 
>STACK AND CALLS THE EXCEPTION HANDLER. IN SOME CASES, THE 
>68040 WILL PUT DIFFERENT INFORMATION  ON THE STACK THAN THE 
>68030, CAUSING AN ERROR WITH THE EXCEPTION HANDLER. 
>APPLICATIONS, INITS AND CDEVS COMMONLY USE THE BUS ERROR 
>MECHANISM TO CHECK FOR THE EXISTENCE OF MEMORY. ALL OF THESE 
>MUST BE RECODED TO RUN ON THE 68040.  MAC DEBUGGERS MUST ALSO BE 
>MODIFIED FOR CORRECT OPERATION ON THE 68040.
>
>MAC SYSTEM AND ROM SOFTWARE WHICH HANDLES EXCEPTIONS MUST BE 
>MODIFIED, AS WELL AS THE A/UX KERNEL. SOME THIRD PARTY 
>SOFTWARE WILL ALSO NEED TO BE CHANGED.

Yellow journalism.  The only exception handling change 
implemented in the 68040 is the virtual exception mechanism 
discussed in 4.1 (by nature, accessing 'non-existent' memory 
is a virtual exception).  Daystar's discussion of zero-divide 
and "unknown" instructions seems intended to seed fear and 
uncertainty.

>4.3 FLOATING POINT UNIT
>THE 68882 (THE FPU THAT IS USED WITH 68030 BASED SYSTEMS) 
>UNDERSTANDS 50 DIFFERENT OPERATIONS. THE 68040 UNDERSTANDS 
>ONLY 20. UNLIKE THE 68882'S FPU, THE 68040'S INTERNAL FPU 
>DOES NOT PROVIDE TRIGONOMETRIC OPERATIONS SUCH AS SIN, COS,
>AND TAN. FOR APPLICATIONS TO WORK CORRECTLY, AN EMULATOR MUST 
>BE PROVIDED THAT RUNS WHENEVER AN UNRECOGNIZED FLOATING POINT
>OPERATION IS ENCOUNTERED. THIS SOFTWARE MUST DECODE THE 
>REQUESTED OPERATION, DO THE OPERATION IN SOFTWARE, AND
>RETURN THE RESULTS TO THE PROGRAM.
>
>THIS PROCESSING CAN BE DONE TRANSPARENTLY TO THE USER UNDER 
>SYSTEM 6 AND SYSTEM 7. FOR A/UX COMPATIBILITY, THE KERNEL 
>WILL HAVE TO BE MODIFIED. MOTOROLA PROVIDES ABOUT 256K OF 
>TRANSLATION CODE TO BE CALLED BY THE MODIFIED OS, BUT IT ADDS 
>ADDITIONAL OVERHEAD TO THE PROCESSOR TO TRANSLATE THE CODE. 
>THIS TENDS TO OFFSET THE PERFORMANCE  BENEFITS.

This issue was discussed previously.  There is both some 
accuracy and some exaggeration to the Daystar description. 

>5.0 CONCLUSION
>CONVERSION FROM THE WORLD OF THE 68030 TO THE 68040 IS THE 
>TOUGHEST ONE FACED YET. EARLY ACCELERATOR BOARD USERS WILL 
>FACE AN UNPREDICTABLE ENVIRONMENT WHERE SOME INITS, CDEVS, 
>AND APPLICATIONS DO NOT WORK.  MOST OF ALL THEY WILL FACE AN
>OPERATING SYSTEM WITH MAJOR INCOMPATIBILITIES IN A FEW KEY 
>AREAS (ESPECIALLY SYSTEM 7 VIRTUAL MEMORY). IN ADDITION 
>PERFORMANCE GAINS FOR THE 25 MHZ AND 33 MHZ VERSION WILL NOT 
>BE MUCH GREATER THAN IIFX LEVELS. FOR THESE REASONS, DAYSTAR 
>DECIDED IN EARLY 1990 TO WAIT UNTIL APPLE INTRODUCED THEIR 
>OWN 68040. APPLE CAN BEST MAKE THE CHANGES NECESSARY FOR 
>68040 COMPATIBILITY.
>
>DAYSTAR HAS LEARNED THAT END-USERS JUDGE ACCELERATOR QUALITY 
>FIRST BY COMPATIBILITY AND THEN BY SPEED. UNTIL A PRODUCT IS 
>RELIABLE, FOR WHATEVER REASON, IT SHOULD NOT BE SHIPPED. 
>WAITING FOR APPLE'S 68040 TO BE RELEASED WILL FORCE THE 
>DEVELOPER COMMUNITY TO SOLVE ITS COMPATIBILITY PROBLEMS, WITH 
>A SPEED FAR GREATER THAN THAT PROVIDED BY ANY THIRD PARTY 
>DEVELOPER. ADDITIONALLY,  APPLE WILL HAVE SOLVED ITS OWN 
>INCOMPATIBILITIES WITH THE OPERATING SYSTEM.
>
>SEVERAL MONTHS AFTER APPLE'S 68040 INTRODUCTION DAYSTAR PLANS 
>TO INTRODUCE A 68040 ACCELERATOR. IT WILL BE AN ACCELERATOR 
>THAT BUILDS ON APPLE'S APPROACH TO 68040 INTEGRATION. THIS 
>WAS AN ESPECIALLY DIFFICULT DECISION SINCE DAYSTAR HAD ALWAYS 
>BEEN THE FIRST TO BRING FASTER SPEED TO THE MACINTOSH II
>FAMILY. IN THIS CASE, IT IS BEST TO LET APPLE GO FIRST. 
>DAYSTAR DOES NOT WANT TO PLACE ITS USERS ON THE ''BLEEDING-
>EDGE'' OF TECHNOLOGY WITH LITTLE OR NO PERFORMANCE BENEFIT.

Motorola truly wishes Daystar success with their decision.  
We do believe that 68030-based accelerator products can 
provide attractive cost/performance solutions for lots of 
Macintoshes.  We also believe that 68040-based accelerators 
offer something special: the ability to scream through the 
toughest compute problem a Mac user will face.

Read on for another opinion.  Your mileage may vary so think 
about a test drive if you have doubts! 



THE FOLLOWING IS FROM IIR (A MAKER OF MACINTOSH ACCELERATION 
PRODUCTS):
 
>You Know Where To Put This 
>
>Reading DayStar's White Paper regarding 68040 is reminiscent 
>of how the horse trading industry responded to the automobile 
>in the 1800's. The horse traders tried to convince their 
>market that the automobile would never be practical because
>the horse could navigate the roads that existed at that time, 
>and the automobile would be hopelessly lost without smooth 
>roads to operate on. After all, a horse and buggy could go 
>anywhere. On the other hand, there is some nice film footage 
>on cable showing automobile 'pilots' pushing automobiles out 
>of the mud in these early days. There are even pictures that 
>show the drivers disassembling their horseless carriages, and 
>rebuilding them back on the road.
>
>DayStar discussing 68040 issues is much like a deaf person 
>discussing Mozart. DSD's benchmark results are due to half a 
>minute's worth of time to run a single benchmark one time.
>From those results, DSD felt strongly enough to generate 
>multiple pages of white paper explaining why they weren't 
>going to do an 040 product.
>
>Referring to the benchmark results in DSD's "paper", it is 
>true that they only saw demo software at MacWorld in January. 
>Why would we show anything substantial to the competition?! 
>Many bona fide customers, as well as industry magazine 
>editors, saw Macintosh applications such as PhotoShop and 
>MacDraw executing quite nicely, thank you very much.
>
>DSD maintains that floating point software doesn't gain any 
>performance because many FPU calls are emulated by software. 
>This misconception demonstrates their lack of understanding 
>of the 040 product. Motorola licenses the FPU emulation 
>software to real developers (even to DSD, if they had a use 
>for it). The results are truly awesome. Except for some mini- 
>and mainframe- computers, and some 64-bit RISC parts, the 
>68040 is the fastest CPU chip today. Thinking otherwise is 
>drinking your own bath water.
>
>It is true, however, that there are some incompatibilities 
>with current software. But, those incompatibilities are only 
>within programs that are poorly written, or crudely kluged by 
>weak programmers (you know who you are).
>
>Finally, we are quite aware of the performance and quality of 
>the DSD product line.  We designed the Accelerator II line. 
>Does anybody really believe that we would spend months 
>producing an inferior product?
>
>The future waits for no one; technology will continue to move 
>forward. Let the natural evolution process of technological 
>survival of the fittest decide the future of Macintosh, and 
>its developers. Just as the automobile overcame adversity to 
>replace the horse and buggy, so shall the 040 function in the 
>Macintosh. Let the companies and people that can't keep up 
>fall by the technological wayside.
>
>As Ted Turner's desk plaque (purportedly) says: Lead, 
>Follow, or Get Out Of The Way.
>
>The Design Team from IIR, formerly with DSD
-- 
Regards, 
Jim Reinhart
Motorola Microprocessor & Memory Technologies Group
Austin, Texas

paul@taniwha.UUCP (Paul Campbell) (07/01/91)

In article <1991Jun28.230007.10651@oakhill.sps.mot.com> jtr@oakhill.sps.mot.com (Jim Reinhart) writes:
>
>>SOFTWARE COMPATIBILITY WILL BE A MAJOR PROBLEM ON 68040 
>>ACCELERATORS AS WELL AS APPLE'S NEW 68040 MACHINE.  APPLE 
>>WILL HAVE TO MAKE MAJOR PATCHES TO THE MAC OS TO HANDLE 
>>PROBLEMS WITH MEMORY MANAGEMENT AND EXCEPTION HANDLING. IN 
>
>There are indeed differences between the O.S. programming for 
>the 68030 and 68040.  However, Daystar's assertion that this 
>is a "MAJOR" problem is not generally supportable.  First, 
>the differences have been documented long enough (~1.5 years 
>in print) for vendors to make appropriate plans.  Second, the 
>only major differences (or 'problems' in Daystar terminology) 
>concern the virtual exception processing model and cache 
>management.  This impacts only a small portion of O.S. code 

While I mostly agree with Mot. and disagree with Daystar on almost
all these issues, I have to take exception to this response. There is
quite a large body of code (mostly INITs and things that do trap
patches) that runs on existing Macs and doesn't run on '040s.
The main cause is programmers who modify code, or load data
containing code and execute it, without being aware of the impact
of the caches in the '020/'030 and '040. It shows up on the '040
for 2 reasons:

	1)	Some people run '040s with a copyback (rather than
		writethrough) data cache, which means new data may be
		in the cache while memory still holds the old value.

	2)	The '040 caches are much bigger. Code that happens to run
		today on an '030 fails on an '040 because on the '030 the
		stale instruction cache entries happened to be replaced
		(its cache is much smaller), while on the '040 they
		are still there.

So if you are a Mac programmer: if you load code in a resource, or copy it
anywhere, or if you modify code (a good example is the trick of putting
the address of a trap after a jmp instruction when you are trap
patching), remember to flush the caches. It's easy, but you have to flush
both of them, using the traps:

	_FlushInstructionCache
	_FlushDataCache

If in doubt flush the caches! On the other hand CPUs like the '040 depend
on caches for their performance - so don't flush the caches unnecessarily.
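
To see why both flushes are needed, here is a rough toy model in C. It is
only a sketch: it is NOT the Toolbox traps and not a real '040, just one
memory word seen through a copyback data cache and a separate instruction
cache. Every name in it (ToyMachine, the toy_* functions, the TOY_*
constants) is made up for illustration, and the two opcodes are just
sample values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sample values only: 0x4E71 is NOP and 0x4E75 is RTS on the 68k. */
enum { TOY_OLD_OPCODE = 0x4E71, TOY_NEW_OPCODE = 0x4E75 };

/* Hypothetical one-word machine with split caches. */
typedef struct {
    uint16_t memory;        /* what main memory holds                 */
    uint16_t dcache;        /* what the data cache line holds         */
    bool     dcache_dirty;  /* copyback: cache is newer than memory   */
    uint16_t icache;        /* what the instruction cache line holds  */
    bool     icache_valid;  /* instruction cache line has been filled */
} ToyMachine;

void toy_init(ToyMachine *m, uint16_t opcode)
{
    m->memory = opcode;
    m->dcache = 0;
    m->dcache_dirty = false;
    m->icache = 0;
    m->icache_valid = false;
}

/* Data-side store in copyback mode: only the data cache is updated. */
void toy_store(ToyMachine *m, uint16_t value)
{
    m->dcache = value;
    m->dcache_dirty = true;
}

/* Instruction fetch: uses the I-cache if valid, else fills it from
 * main memory. It never looks in the data cache. */
uint16_t toy_fetch(ToyMachine *m)
{
    if (!m->icache_valid) {
        m->icache = m->memory;
        m->icache_valid = true;
    }
    return m->icache;
}

/* Stand-in for a data cache flush: push dirty data out to memory. */
void toy_flush_data_cache(ToyMachine *m)
{
    if (m->dcache_dirty) {
        m->memory = m->dcache;
        m->dcache_dirty = false;
    }
}

/* Stand-in for an instruction cache flush: invalidate the line. */
void toy_flush_instruction_cache(ToyMachine *m)
{
    m->icache_valid = false;
}

/* Run the code once, patch it through the data side, optionally
 * flush, then return the opcode the CPU would actually execute. */
uint16_t toy_patch_and_fetch(bool flush_data, bool flush_instr)
{
    ToyMachine m;
    toy_init(&m, TOY_OLD_OPCODE);
    assert(toy_fetch(&m) == TOY_OLD_OPCODE); /* old code now cached */
    toy_store(&m, TOY_NEW_OPCODE);           /* the "patch"         */
    if (flush_data)
        toy_flush_data_cache(&m);
    if (flush_instr)
        toy_flush_instruction_cache(&m);
    return toy_fetch(&m);
}
```

In this model toy_patch_and_fetch() returns the new opcode only when both
arguments are true. With no flush, or the data flush alone, the stale
instruction cache entry is used; with the instruction flush alone, the
fetch refills from memory that the copyback cache never updated.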

	Paul Campbell

-- 
Paul Campbell    UUCP: ..!mtxinu!taniwha!paul     AppleLink: CAMPBELL.P

Tom Metzger's White Aryan Resistance has been enjoined to stop selling Nazi
Bart Simpson t-shirts - Tom of course got it wrong, Bart is yellow, not white.