jtr@oakhill.sps.mot.com (Jim Reinhart) (06/29/91)
Recently, Daystar Digital issued a paper entitled "The 68040 on the Macintosh" justifying their decision to delay introduction of a 68040-based Macintosh accelerator. In this paper Daystar utilized data reputedly supplied by Motorola to establish that the 68040 offered little performance advantage over the 68030 and, therefore, would not be of general interest on a performance basis for some time. Motorola strongly disagrees with this conclusion and denies the authenticity of the data attributed to Motorola. While Daystar attempts to establish some reasonable conclusions, Motorola finds this paper to be generally misleading and meriting considerable further discussion. Below, excerpts from the text of the Daystar paper will be found in capitalized type while Motorola's comments are in plain type. In addition, a commentary from Daystar competitor IIR is attached.

Prior to entering this discussion a number of simple concepts should be reviewed. First, the delivered system performance of any computing machine is not represented by the simple performance of any one element of that machine but instead by the composite performance of the subsystems that comprise that machine. These subsystems include (but are not necessarily limited to): the central processor, memory, graphics hardware, compiler, mass storage, network and I/O. Additionally, the perceived or observed performance of a computer depends heavily on the measurement criteria chosen for the observation. Performance metrics can be chosen that focus on a single subsystem, a combination of subsystems or the composite performance of the entire machine.

Computer manufacturers routinely strive to balance the performance of the subsystems of a machine in order to maximize the total system performance subject to a particular cost consideration. For example, in a very low-cost system, the manufacturer will pair a relatively low-performance CPU with a simple, low-performance memory subsystem.
If the chosen CPU only requires 5 megabytes per second of memory bandwidth to deliver its maximum performance, it makes little sense for the computer manufacturer to pair this CPU with a 40 MB/s memory system. Conversely, if a CPU requires 40 MB/s of memory bandwidth to deliver the desired level of performance, it makes little sense to pair this machine with a 5 MB/s memory. Similar analogies hold true for mass storage, network, graphics and so on.

When considering upgrading the performance of a machine by replacing or supplementing some of its components it is important to understand the potential rewards for doing so. If the machine has many activities that are dominated by CPU performance, it may be highly beneficial to upgrade the CPU performance. Simple CPU accelerators perform this function well. If typical machine performance is dominated by lack of memory bandwidth, a simple accelerator may provide little benefit while a more sophisticated model (e.g. with on-board memory) may provide the desired results. If machine performance is typically dominated by disk accesses, even a very sophisticated CPU accelerator may provide only marginal perceived results. Simply put, CPU accelerators provide the greatest benefit in machines where typical performance, as perceived by the user, is governed by processing power. When measuring the benefit of a CPU accelerator, one should evaluate either total system performance or CPU-related activities to determine the merit of the upgrade.

Daystar attempts to establish a justification for delaying the introduction of 68040-based accelerators based on two principal points: performance and compatibility. Motorola contests the accuracy of these claims as discussed below.
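The point about where a CPU accelerator pays off can be put in rough numbers with Amdahl's law. The sketch below is illustrative only: the function name and the workload fractions are assumptions chosen for the example, not measurements from either paper.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: whole-task speedup when only the CPU-bound
    fraction of the elapsed time is made faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Hypothetical workloads: share of elapsed time spent waiting on the CPU,
# each paired with an accelerator that makes the CPU three times faster.
for label, fraction in [("CPU-bound", 0.9), ("mixed", 0.5), ("disk-bound", 0.1)]:
    print(f"{label:10s} {overall_speedup(fraction, 3.0):.2f}x overall")
```

With these made-up fractions, the same accelerator helps the CPU-bound workload a great deal and the disk-bound workload hardly at all, which is the sense in which the merit of an upgrade depends on what governs perceived performance.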
Daystar hints at but fails to make a very reasonable point concerning the continuing utility of 68030-based accelerators: there are many Macintosh users who will continue to be well served by 68030 levels of performance and, due to other system implications, the performance offered by the 68040 may not be immediately required.

>1.0 OVERVIEW
>THE MOTOROLA 68040 PROCESSOR IS A MAJOR STEP FORWARD IN
>PROCESSING POWER. WHEN COMPARED TO A 25 MHZ 68030/68882, A 25
>MHZ 68040 OFFERS DOUBLE THE INTEGER PERFORMANCE AND THREE
>TIMES THE SPEED IN FLOATING POINT CALCULATIONS, AS SHOWN IN
>TABLE 1. BUT A 25 MHZ 68040 IS ONLY SLIGHTLY FASTER THAN A 40
>MHZ 68030 (MAC IIFX).
>
> TABLE 1: PERFORMANCE RELATIVE TO A 25 MHZ 68030/68882
> REF: MOTOROLA
>
> TYPE      25 MHZ 68030   40 MHZ 68030   25 MHZ 68040
> INTEGER   1.0            1.6            2.1
> FPU       1.0            1.6            3.3

Extensive statistical and empirical studies conducted by Motorola have clearly established that the 25 MHz 68040 performs integer operations at 3.2 times the speed of a 25 MHz 68030 and floating point operations at roughly 5 times the speed of a 25 MHz 68030. While some variations from these figures can be expected in some customers' systems due to differences in compiler technology (e.g. structure alignment ...), these studies have been independently verified in system benchmarks by Motorola customers. There have been no inconsistencies in Motorola's position with respect to 68040 performance relative to the 68030. See section 3.1 for specific data.

>INTEGER CALCULATIONS WHICH DRIVE MAC OPERATING SYSTEM (OS)
>PERFORMANCE AND ALL APPLICATIONS SHOW GAINS OF 30%. THE REAL
>STRENGTH OF THE 68040 LIES IN THE SPEED OF FLOATING POINT
>CALCULATIONS, BUT THESE HAVE LITTLE OR NO BENEFIT FOR THE
>TYPICAL GRAPHICS USER. ONLY APPLICATIONS IN THE SCIENTIFIC
>AND CAD MARKET USE THE FLOATING POINT UNIT (FPU). IN SEVERAL
>YEARS THE 68040 WILL BE RUNNING AT 40 MHZ. THIS PROCESSOR
>WILL PROVIDE THE THE MUCH NEEDED POWER IN THE DTP, GRAPHICS,
>PRE-PRESS AND SCIENTIFIC MARKETS.
The above conclusion stems from the invalid performance assumptions drawn previously. The 25 MHz 68040 delivers almost exactly twice the integer performance of a 40 MHz 68030. Daystar's claims concerning Motorola product schedules are addressed below.

>SOFTWARE COMPATIBILITY WILL BE A MAJOR PROBLEM ON 68040
>ACCELERATORS AS WELL AS APPLE'S NEW 68040 MACHINE. APPLE
>WILL HAVE TO MAKE MAJOR PATCHES TO THE MAC OS TO HANDLE
>PROBLEMS WITH MEMORY MANAGEMENT AND EXCEPTION HANDLING. IN
>ADDITION, THE MATH CODE WITHIN APPLICATIONS WILL HAVE TO BE
>REWRITTEN TO DIRECTLY LEVERAGE THE BENEFITS OF THE 68040'S
>FPU.

There are indeed differences between the O.S. programming for the 68030 and 68040. However, Daystar's assertion that this is a "MAJOR" problem is not generally supportable. First, the differences have been documented long enough (~1.5 years in print) for vendors to make appropriate plans. Second, the only major differences (or 'problems' in Daystar terminology) concern the virtual exception processing model and cache management. This impacts only a small portion of O.S. code. Finally, the assertion that applications will have to be rewritten due to the 040 FPU is entirely misleading, as will be discussed below.

>FOR THESE REASONS, DAYSTAR HAS DECIDED TO WAIT TO INTRODUCE
>ITS 68040 ACCELERATOR UNTIL AFTER THE INTRODUCTION OF APPLE'S
>68040 MACHINE. APPLE IS BEST SUITED MAKE THE NECESSARY OS
>CHANGES AS WELL AS DRIVE CHANGES IN THIRD PARTY APPLICATIONS,
>INITS AND CDEVS.

The above expresses Daystar's opinion. Other accelerator manufacturers (e.g. Radius, Dove, IIR, Fusion Data) are not so inclined and are shipping 68040-based accelerators. Additionally, more than 50 different manufacturers of computer products are currently shipping successful 68040-based machines.

>2.0 LESSONS FROM THE PAST
>EACH NEW GENERATION OF PROCESSOR FROM MOTOROLA HAS
>INCORPORATED NEW FEATURES AND CAPABILITIES, MANY OF WHICH ARE
>NOT COMPATIBLE WITH THE CURRENT GENERATION.
>THE MAC OS, BY
>ITS VERY NATURE, DIRECTLY ADDRESSES THE HARDWARE. TO THE
>EXTENT THAT THE HARDWARE CHANGES, THE OS MUST BE PATCHED. THE
>GREATER THE CHANGE IN THE ARCHITECTURE OF THE PROCESSOR, THE
>GREATER THE NUMBER AND SOPHISTICATION OF THE PATCHES.

Motorola agrees with this point with qualifications. The 68000 family has distinguished itself by maintaining complete upward compatibility for application software and confining all changes to be either proper supersets of existing functionality or visible only in the supervisor (O.S.) programming model. While O.S. modifications may be required, O.S. code represents a small fraction of the entire code pool. Further, the portions of the O.S. that are affected by change represent only a small portion of the total O.S. code.

>2.1 THE FIRST ACCELERATORS
>THE FIRST MAC PLUS AND SE ACCELERATORS UTILIZED A 68020 WITH
>A 32-BIT BUS AS COMPARED THE 16-BIT BUS ON THE MAC SE'S 68000
>PROCESSOR. THAT AND ITS FASTER CLOCK SPEED (16 MHZ VS. 8 MHZ)
>CAUSED MANY AGGRAVATING INCOMPATIBILITIES WITH PARTS OF OF
>MAC OS, VARIOUS APPLICATIONS, AND MANY INITS. IT WAS NOT
>UNTIL THE MAC II WAS INTRODUCED WITH ITS OWN 16 MHZ 68020 DID
>APPLE AND THE DEVELOPER COMMUNITY COMPLETELY SOLVE THE
>PROBLEMS.

True, and every successive generation of the Mac O.S. has gone further to eliminate or reduce nasty things like timing dependencies. In fact, the latest releases of the Mac O.S. function quite well across a range of CPU performance spanning more than an order of magnitude (68000 -> 68030). Additionally, refinements in the Mac O.S. have taken away some of the development community's motivation for 'bad' programming practices leading to platform dependencies.

>2.2 THE MAC II ACCELERATOR
>PROBLEMS STARTED OVER AGAIN WHEN APPLE INTRODUCED THE 16 MHZ
>68030 MAC IIX. SURPRISINGLY, THE 68030 IS NEARLY IDENTICAL TO
>THE 68020 EXCEPT ADDITION OF THE 256 BYTE INTERNAL DATA
>CACHE AND THE MEMORY MANAGEMENT UNIT (MMU).
YET THERE WERE >NUMEROUS INCOMPATIBILITIES WITH VARIOUS PARTS OF THE MAC OS, >THIRD PARTY INITS AND APPLICATIONS. MANY CDEVS AND INITS >ACCOMPLISH THEIR SPECIAL TASK BY MAKING CHANGES TO THE OS OR >DIRECTLY ADDRESSING THE HARDWARE (NECESSARY TO ACCOMPLISH A >SPECIAL TASK THAT APPLE DID NOT PROVIDE, NEVERTHELESS, A >VIOLATION OF APPLE GUIDELINES). > >APPLICATIONS THAT CLOSELY FOLLOWED DEVELOPER GUIDELINES >GENERALLY WORKED WELL ON THE 68030 CONVERSION. THESE WERE >SEVERAL KEY APPLICATIONS THAT HAD PROBLEMS WORKING WITH THE >INTERNAL CACHE. AT THE SAME TIME DAYSTAR INTRODUCED THE 33 >MHZ 68030 ACCELERATORS. ONCE AGAIN, IT (AND OTHERS) EXPOSED >YET ANOTHER SET OF PROBLEMS WITH APPLICATIONS, INITS AND >CDEVS THAT HAD CLOCK TIMING DEPENDENCIES AND PROBLEMS WORKING >WITH AN EXTERNAL MEMORY CACHE. EVEN APPLE'S FLOPPY DRIVER >CODE WOULD NOT RUN PROPERLY AT SPEEDS ABOVE 16 MHZ. DAYSTAR >(AND OTHERS) INVESTED SIGNIFICANT TIME PATCHING THE FLOPPY >DRIVER CODE. FOR THE 25 MHZ MAC IICI, APPLE HAD TO COMPLETELY >REWRITE THEIR FLOPPY DRIVER CODE TO ELIMINATE THESE TIMING >DEPENDENCIES. > >FROM A SOFTWARE STANDPOINT, CONVERSION FROM THE 68020 WORLD >TO THE 68030 WAS ABOUT AS EASY AS ONE COULD EVER ASK FOR. YET >IT WAS VERY FRUSTRATING FOR THE END-USER. WHILE MOST PROBLEMS >WERE ENCOUNTERED WITH INITS AND CDEVS, END-USERS WERE NOT >WILLING TO ELIMINATE THEM AS THEY HAD BECOME AN ESSENTIAL >PART OF THEIR "TOOL KIT". SOME EARLY BUYERS FOUND THE >EXPERIENCE VERY FRUSTRATING - THEY DID NOT HAVE THE TIME (OR >SKILLS) TO FIDDLE AROUND TRYING TO DEBUG THEIR MACHINE. THE >SAME EXPERIENCE WAS ONCE AGAIN REPEATED WHEN APPLE INTRODUCED >THEIR 32-BIT CLEAN ROMS ON THE MAC IICI AND MAC IIFX. WITH >OVER A YEAR OF WARNING FROM APPLE TO THE DEVELOPER COMMUNITY >THERE WERE STILL MANY APPLICATIONS, INITS AND CDEVS THAT HAD >SIGNIFICANT PROBLEMS, DRIVING THE USERS CRAZY. Same general comments as above: it is a learning curve exercise. 
Additionally, there is always a cost associated with being in a market leadership position. This is true both for the 'early-adopting' manufacturer and the performance-driven user. The accelerator community has differentiated itself by creating a careful balance of the cost of leading and the benefit derived.

>3.0 PERFORMANCE
>THE 68040 INCORPORATES SEVERAL INNOVATIVE DESIGN FEATURES
>THAT BOOST PERFORMANCE OVER A 68030/68882 COMBINATION RUNNING
>AT THE SAME CLOCK SPEED. GAINS ARE REALIZED IN BOTH INTEGER
>AND FPU PERFORMANCE. INTEGER PERFORMANCE DRIVES MAC OS AND
>VIRTUALLY ALL APPLICATIONS. MAC OS, GRAPHICS, DTP AND PRE
>PRESS APPLICATIONS MAKE LITTLE OR NO USE OF THE FPU, AS SHOWN
>IN TABLE 2. FPU PERFORMANCE IS OF BENEFIT ONLY FOR A SUBSET
>OF FUNCTIONS WITHIN CAD AND SCIENTIFIC APPLICATIONS.
>SPREADSHEETS ONLY USE THE FPU FOR SPREADSHEET
>RECALCULATIONS.
>
>
> TABLE 2: BENEFIT OF 50 MHZ FPU ON IICI ACCELERATOR
> REF: DAYSTAR
>Platform                 MacIIci   AccelIIci  AccelIIci  AccelIIci
>Processor                68030     68030      68030      FPU
>Clock                    25 MHz    50 MHz     50 MHz     % Gain
>FPU                      Yes       NO         Yes        Yes
>Word Scroll              8.9       6.5        6.5        0%
>Renderman Render         98.0      82.0       56.0       46%
>Excel Cut&Paste          9.1       5.5        5.5        0%
>Excel Scroll             10.3      10.0       10.0       0%
>Excel Recalc             10.4      6.6        5.6        17%
>Xpress Fit in wndw       5.4       3.4        3.4        0%
>Xpress Scroll            24.2      6.9        16.8       0%
>FreeHand Fit in wndw     21.8      11.9       11.9       0%
>Freehand Duplicate       34.5      18.7       18.7       0%
>FileMaker Sort           56.3      42.2       42.2       0%
>Swivel 3D Change View    17.4      8.1        8.1        0%
>Swivel 3D Tween          73.1      32.6       32.6       0%
>ClarisCad Fit in wndw    6.5       4.6        4.2        8%
>PhotoShop Rotate         4.8       3.6        3.6        0%
>PhotoShop Resample       36.6      19.7       19.7       0%
>PhotoShop Gausian Blur   19.3      12.0       12.0       0%
>Total Time (sec)         436.53    284.18     256.79     11%
>
>
>SHOWN IN TABLE 2 IS A MAC IICI ACCELERATOR WITH AND WITHOUT A
>50 MHZ 68882 FPU. DOUBLING THE SPEED OF THE FPU HAS NO
>BENEFIT IN MANY APPLICATIONS, EVEN WITHIN CAD APPLICATIONS.
>BASED ON THE EVIDENCE IN FIGURE 2, DAYSTAR RECOMMENDS THAT
>ITS GRAPHICS AND DTP CUSTOMERS NOT BUY AN OPTIONAL 68882 FPU
>ON ITS ACCELERATORS.

Certainly not all applications utilize floating point math. However, those that do benefit substantially from floating point hardware. Note that Daystar carefully chooses a large sampling of application subsets that do not utilize floating point to prove their point. Even with a possibly biased sampling of 16 application subsets, only three of which utilize floating point, Daystar measures an 11% overall improvement in benchmark performance.

>3.1 INTEGER PERFORMANCE
>FOR INTEGER PERFORMANCE, THE 68040 HAS A HIGH DEGREE OF
>INSTRUCTION PARALLELISM IT IS CAPABLE OF EXECUTING IN ONE
>CLOCK CYCLE AN INSTRUCTION THAT MAY TAKE 3-4 CYCLES TO
>EXECUTE ON A 68030. THE 68040 HAS TWO 4,096 BYTE CACHES FOR
>BOTH INSTRUCTION AND DATA, AND BOTH ARE FOUR-WAY SET
>ASSOCIATIVE. CONTRAST THIS TO A 68030, WHICH HAS ONLY A 256
>BYTES DIRECT MAPPED CACHE (LESS EFFICIENT). THEREFORE, THE
>68040 WILL EXHIBIT A MUCH HIGHER "HIT" RATE ALLOWING ZERO
>WAIT STATE PERFORMANCE UP TO 40 MHZ. IN FACT, THE 68040
>CACHES ARE SO EFFICIENT THAT THERE WILL BE NO NEED TO ADD
>EXTERNAL CACHE, AS IS REQUIRED IN THE FASTER 68030'S.
>
>PREDICTED INTEGER PERFORMANCE FOR THE 68040 (BASED ON
>MOTOROLA DATA) IS SHOWN IN THE TABLE 3 AGAINST A ZERO WAIT
>STATE 68030. PERCENTAGE GAINS ARE SHOWN AGAINST A 40 MHZ
>68030 (TO REPRESENT A MAC IIFX). EXPECTED GAINS FOR THE 25
>MHZ 68040 ARE ONLY ON THE ORDER OF 30% (1.3) WHEN COMPARED TO
>THE 40 MHZ 68030.
>
> TABLE 3: PERFORMANCE RELATIVE TO A 40 MHZ 68030
> REF: MOTOROLA
> CLOCK    68030   68040   68040 VOLUME SHIP
> 16 MHZ   0.4     N/A     N/A
> 25 MHZ   0.6     1.3     Q2 91
> 33 MHZ   0.8     1.7     Q1 92
> 40 MHZ   1.0     2.1     LATE 92
> 50 MHZ   1.3     N/A     N/A
>
>
>GAINS OF 30% WILL NOT SATISFY POWER USERS. THEY REALLY DEMAND
>GAINS OF 100-200%, AND THESE WILL NOT BE AVAILABLE FOR
>SEVERAL YEARS, AT LEAST FOR THE MAC IIFX.
>GAINS ON THE 16 MHZ
>MAC IIS SHOULD BE A LITTLE OVER THREE TIMES GREATER WHEN THE
>40 MHZ 68040 IS INTRODUCED IN LATE 1992, SO AN APPRECIABLE
>UPGRADE MARKET WILL EXIST FOR USERS WHO WANT BETTER THAN IIFX
>CLASS PERFORMANCE. BUT IN THE MEANTIME, WILL THE INITIAL
>68040 COMPATIBILITY PROBLEMS BE MORE OF A PROBLEM THAN A IIFX
>UPGRADE OR A 50 MHZ ACCELERATOR?

Motorola denies the authenticity of the above claims presented by Daystar based on three issues: first, the information is factually incorrect; second, the source of this information is not Motorola; third, if the source were Motorola, Daystar would be in serious breach of legal non-disclosure agreements concerning Motorola's future product plans. The legal agreement contained in Motorola file #89111652RD prohibits Daystar Digital Inc. from disclosing the proprietary information of Motorola Inc. Motorola has not formally introduced products beyond the current 25 MHz 68040 but has publicly stated that 33 MHz volume shipments will begin in 3Q91 with 40 MHz shipments beginning late in the year.

Motorola reiterates its earlier comments concerning 68040 performance relative to the 68030: the 68040 is 3.2 times faster on integer code and ~5 times faster on floating-point intensive code at the same clock frequency. Some simple facts illustrating timing differences between the 68030 and 68040 (cache hits assumed):

                   #Clks   #Clks
Instruction:       68030   68040   Ratio
Arith/LOG R->R     2       1       2
Arith M->R         5       1       5
MOVE M->R          5       1       5
MOVE R->M          3       1       3
FADD R->R          39      3       13
FMUL R->R          59      5       12

These are only rough examples; some benchmarks may be of use as well (source of all data is Workstation Laboratories):

                   50 MHz  25 MHz
Benchmark          68030   68040   Ratio
Dhry 1.1           21008   45454   2.2
Dhry 2.1           17493   38760   2.2
iSPEC              6.5     12.9    2.0
Linpack (DP Ftn)   .425    1.69    4.0
Linpack (coded)    .560    2.9     5.1

The source of the above data is Workstation Labs (an independent performance testing organization) - this data does not support Daystar's claims.
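The ratios in the two tables above can be rechecked with a few lines of Python. This is only a sketch: the variable and function names are made up for the example, and rescaling the 50 MHz 68030 scores linearly with clock frequency to estimate a 40 MHz part is a simplifying assumption, not something the benchmark data establishes.

```python
# Clock counts per instruction as (68030, 68040), copied from the table above.
CLOCKS = {
    "Arith/LOG R->R": (2, 1),
    "Arith M->R": (5, 1),
    "MOVE M->R": (5, 1),
    "MOVE R->M": (3, 1),
    "FADD R->R": (39, 3),
    "FMUL R->R": (59, 5),
}

# Benchmark scores as (50 MHz 68030, 25 MHz 68040), copied from the table above.
SCORES = {
    "Dhry 1.1": (21008, 45454),
    "Dhry 2.1": (17493, 38760),
    "iSPEC": (6.5, 12.9),
    "Linpack (DP Ftn)": (0.425, 1.69),
    "Linpack (coded)": (0.560, 2.9),
}
INTEGER_BENCHMARKS = ("Dhry 1.1", "Dhry 2.1", "iSPEC")


def clock_ratio(name):
    """Clocks on the 68030 divided by clocks on the 68040."""
    clocks_030, clocks_040 = CLOCKS[name]
    return clocks_030 / clocks_040


def score_ratio(name, target_mhz=50.0):
    """68040 score over 68030 score, with the 50 MHz 68030 score
    rescaled linearly to target_mhz (assumption: score ~ clock)."""
    score_030, score_040 = SCORES[name]
    return score_040 / (score_030 * target_mhz / 50.0)


for name in CLOCKS:
    print(f"{name:18s} {clock_ratio(name):5.1f}")
for name in SCORES:
    print(f"{name:18s} {score_ratio(name):5.2f}")

# Same-clock integer claim (3.2x) carried over to 25 MHz 68040 vs 40 MHz 68030.
claimed_vs_40 = 3.2 * 25.0 / 40.0

# Mean of the integer benchmarks against a hypothetical 40 MHz 68030.
measured_vs_40 = sum(score_ratio(n, 40.0) for n in INTEGER_BENCHMARKS) / len(INTEGER_BENCHMARKS)

print(f"claimed  25 MHz 68040 vs 40 MHz 68030: {claimed_vs_40:.2f}")
print(f"measured 25 MHz 68040 vs 40 MHz 68030: {measured_vs_40:.2f}")
```

Under that linear-scaling assumption, both the same-clock claim and the integer benchmarks put a 25 MHz 68040 at two or more times a 40 MHz 68030, well above the 1.3 shown in Daystar's Table 3.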
Note closely that the 68030 data is for 50 MHz operation (w/ 32k 0ws cache). Based on this data, a 25 MHz 68040 would be about 2.6 times faster than a 40 MHz 68030.

>3.2 FPU PERFORMANCE
>THE REAL POWER OF THE 68040 LIES WITHIN ITS FPU PERFORMANCE.
>BY COMBINING THE CPU AND FPU INTO THE SAME PIECE OF SILICON,
>FPU HAS BEEN BOOSTED THREE TIMES. BUT TO ACHIEVE THIS
>INTEGRATION MOTOROLA ACCEPTED A MAJOR SACRIFICE IN
>INSTRUCTION SET COMMONALITY. APPLICATIONS NOT WRITTEN TO
>DIRECTLY ADDRESS THE 68040 FPU WILL EITHER HAVE TO BE
>REWRITTEN, OR WILL HAVE TO OPERATE THROUGH ABOUT 256K OF CODE
>THAT TRANSLATES THE 68882 CALLS INTO 68040 CALLS. THE
>OVERHEAD REQUIRED FOR THIS TRANSLATION PROCESS WILL
>DRASTICALLY REDUCE 68040 FPU PERFORMANCE GAINS.

The real power of the 68040 lies in its integration, compatibility and sustained performance. Floating point is indeed a part of the performance picture but only a part. The ability of the 68040 to deliver excellent performance in very low cost memory systems has been key to its success.

Daystar's description of the 68040 floating point unit is somewhat inaccurate. The 68040 provides hardware support for a subset of the 68882 instruction set optimized to deliver superior performance on the most commonly used set of floating point instructions. Based on customer and market requirements, the majority of the 68040 silicon budget for floating point was dedicated to a set of critically important operations. A Motorola-supplied software package (the executable is ~40k, NOT the 256k reported by Daystar) provides full object code compatibility with any 68881/68882 programs. When the 040 encounters a floating point operation it decides whether or not that particular instruction is one of the instructions supported in hardware (FMOVE, FCMP, FABS, FTST, FNEG, FADD, FMUL, FDIV, FSUB, FDBcc, FBcc, FSQRT, FSAVE, FRESTORE) and if so, the 040 executes that instruction. Otherwise (e.g.
for transcendentals like FSIN) the 68040 automatically calls the floating point software package to perform this function - it is entirely invisible to the user.

>IF AN FPU INTENSIVE FUNCTION IS REWRITTEN TO DIRECTLY USE THE
>68040 FPU INSTRUCTION SET, THEN PERFORMANCE GAINS CAN BE
>SUBSTANTIAL. TABLE 4 CONTAINS ESTIMATES FOR THE IMPACT OF THE
>68040 FPU ON THE TWO FPU INTENSIVE FUNCTIONS SHOWN IN TABLE
>2.

It is also possible, with recompilation, to have an application directly call the floating point software package to avoid the overhead of the automatic call performed by the 040. This does have nice performance advantages but is not in any manner necessary for compatibility.

>TABLE 4:ESTIMATED POSSIBLE 25 MHZ 68040 FPU PERFORMANCE GAIN
> REF. DAYSTAR
> Platform           MacIIci   AccelIIci  AccelIIci  AccelIIci
> Processor          68030     68030      68040      FPU
> Clock              25 MHz    50 MHz     25 MHz     %Gain
> FPU                Yes       Yes        Yes
> RenderMan Render   98.0      56.0       17.8       215%
> Excel Recalc       10.4      5.6        3.5        60%
>
>
>IN SUMMARY, THE MAC COMMUNITY WILL NOT SEE IMMEDIATE GAINS IN
>68040 PERFORMANCE. A 25 MHZ 68040 IS NOT THAT MUCH FASTER
>THAN A MAC IIFX, FOR INTEGER PERFORMANCE. AND, 68040 FPU
>PERFORMANCE WILL BE OF LITTLE BENEFIT TO THE TYPICAL MAC
>USER. HOWEVER, IN SEVERAL YEARS THE 40 MHZ 68040 WILL BE
>DOUBLE THE SPEED OF THE MAC IIFX, AND OFFER EVEN GREATER
>GAINS FOR CAD AND SCIENTIFIC FUNCTIONS DIRECTLY UTILIZING THE
>68040'S FPU.

This represents Daystar's opinion based on questionable and, in Motorola's opinion, inaccurate performance claims.

>3.3 TODAY'S ACCELERATOR PERFORMANCE
>THE LIMITED GAINS OF THE 25 MHZ 68040 ARE VERIFIED BY
>BENCHMARKS RUN AT THE JANUARY, 1991 SAN FRANCISCO MACWORLD.
>HERE, PROTOTYPE ACCELERATORS WERE BEING SHOWN BY TWO
>DIFFERENT COMPANIES. IN TABLE 5, BENCHMARK PERFORMANCE IS
>SHOWN AGAINST CURRENT STATE-OF-THE-ART MACHINES.
>
>THESE TEST SHOW THAT GAINS IN INTEGER PERFORMANCE ARE BELOW
>MOTOROLA ESTIMATES. FPU PERFORMANCE IS NO BETTER THAN A
>REGULAR MAC.
>THESE PROTOTYPES WERE OPERATING IN A VERY
>RESTRICTED ENVIRONMENT (THEY WERE ONLY RUNNING BENCHMARKS).
>APPLICATIONS WERE NOT BEING SHOWN. IN CONTRAST, ONCE THE
>68030 WAS STABLE, UP AND RUNNING, THERE WERE FEW MAC OS OR
>APPLICATIONS PROBLEMS TO OVERCOME. IN ALL FAIRNESS, THESE
>WERE JUST EARLY ENGINEERING PROTOTYPES,AND THEY HAD NOT YET
>"TWEEKED" PERFORMANCE TO THE MAXIMUM, AS IS COMMON IN THE
>DEVELOPMENT PROCESS.
>
> TABLE 5: 25 MHZ PROTOTYPE ACCELERATOR PERFORMANCE
> REF: DAYSTAR MEASUREMENT
>OEM              APPLE    APPLE    DAYSTAR  TOKAMAC  IIR
>PLATFORM         MACIICI  MACIIFX  MACIICI  MAC LC   MACII/IIX
>CPU              68030    68030    68030    68040    68040
>FPU              YES      YES      YES      YES      YES
>SPEED            25 MHZ   40 MHZ   50 MHZ   25 MHZ   25 MHZ
>FLOAT      INT   0.18     0.15     0.10     0.20     0.10
>TRIG       FPU   0.57     0.36     0.32     3.18     1.20
>BUTTERFLY  FPU   2.33     2.17     1.57     4.18     2.40
>RIPPLES    FPU   17.10    12.87    9.83     30.53    7.80
>SIEVE      INT   0.27     0.18     0.15     0.22     0.16
>MOIRE      INT   8.77     9.40     6.58     7.50     5.20
>TOTAL(SEC)       29.22    25.13    17.55    45.81    16.86

The above table provides little useful information since it provides nothing with respect to reference points. Comparing a 68040 operating in one machine versus a 68030 operating in an entirely different machine is an apples-to-oranges comparison. If one really wants to generate a professional and conclusive comparison, the 040 and 030 should be compared against each other in the same environment. For example, the Tokamac in the MAC LC is restricted to running on a 16-bit data bus. What are the performance figures for the LC without an accelerator and with an 030-based accelerator running on a 16-bit bus? No data. What was the configuration of the 68040 in these evaluations? No data. What are the system configuration differences between the IIci, the II, the LC and the FX? No data. What is the native versus accelerated performance for each of the machines? No data. Is a 40 MHz Mac IIfx really only 16% faster than a 25 MHz Mac IIci? Well, maybe, if you pick the right benchmarks to prove your point. What is the point?

>4.0.
COMPATIBILITY >THE MAJOR PROBLEMS COME IN THE AREA OF SOFTWARE INTEGRATION >(BOTH MAC OS AND APPLICATIONS). THERE ARE THREE AREAS OF >COMPATIBILITY PROBLEMS: (1) MEMORY MANAGEMENT, (2) EXCEPTION >HANDLING AND (3) FLOATING POINT. > >4.1 MEMORY MANAGEMENT >SINCE THE INTRODUCTION OF THE MAC IIX, USE OF THE MEMORY >MANAGEMENT UNIT (MMU) IN THE 68030 HAS BECOME A FUNDAMENTAL >PART OF MAC SYSTEM SOFTWARE. IT IS USED TO GRANT ACCESS TO >MEMORY, FLIP BETWEEN 24 AND 32 BIT MODE, AND PROVIDE VIRTUAL >MEMORY UNDER SYSTEM 7.0 AND A/UX. > >BOTH THE 68030 AND 68040 HAVE ON-CHIP MMUS, BUT THEY ARE VERY >DIFFERENT IN FEATURE SET, REGISTER FORMAT, AND PAGE TABLE >FORMATS. IT IS SAFE TO SAY THAT ALL ROM CODE AND MAC SYSTEM >SOFTWARE WHICH DEALS WITH THE MMU MUST BE MODIFIED TO RUN ON >THE 68040. THE MAJORITY OF THIRD PARTY SOFTWARE SHOULD NOT >NEED MODIFICATION (EXCEPT PROCESSOR SPECIFIC PRODUCTS SUCH AS >VIRTUAL) OR PRODUCTS THAT ADDRESS MMU HARDWARE DIRECTLY >(THOSE WHICH VIOLATE APPLE'S GUIDELINES). True. Previous comments apply here as well. >4.2 EXCEPTION HANDLING >AN EXCEPTION IS DEFINED AS A CONDITION THAT THE PROCESSOR >DOES NOT KNOW HOW TO HANDLE. FOR EXAMPLE, DIVIDING BY ZERO, >ACCESSING NON-EXISTENT MEMORY, AND UNKNOWN PROCESSOR >INSTRUCTIONS ALL GENERATE EXCEPTIONS. > >THE PROCESSOR SAVES INFORMATION ABOUT THE OPERATION ON THE >STACK AND CALLS THE EXCEPTION HANDLER. IN SOME CASES, THE >68040 WILL PUT DIFFERENT INFORMATION ON THE STACK THAN THE >68030, CAUSING AN ERROR WITH THE EXCEPTION HANDLER. >APPLICATIONS, INITS AND CDEVS COMMONLY USE THE BUS ERROR >MECHANISM TO CHECK FOR THE EXISTENCE OF MEMORY. ALL OF THESE >MUST BE RECODED TO RUN THE 68040. MAC DEBUGGERS MUST ALSO BE >MODIFIED FOR CORRECT OPERATION ON THE 68040. > >MAC SYSTEM AND ROM SOFTWARE WHICH HANDLES EXCEPTIONS MUST BE >MODIFIED, AS WELL AS THE A/UX KERNEL. SOME THIRD PARTY >SOFTWARE WILL ALSO NEED TO BE CHANGED. Yellow journalism. 
The only exception handling change implemented in the 68040 is the virtual exception mechanism discussed in 4.1 (by nature, accessing 'non-existent' memory is a virtual exception). Daystar's discussion of zero-divide and "unknown" instructions seems intended to seed fear and uncertainty. >4.3 FLOATING POINT UNIT >THE 68882 (THE FPU THAT IS USED WITH 68030 BASED SYSTEMS) >UNDERSTANDS 5O DIFFERENT OPERATIONS. THE 68040 UNDERSTANDS >ONLY 20. UNLIKE THE 68882'S FPU, THE 68040'S INTERNAL FPU >DOES NOT PROVIDE TRIGONOMETRIC OPERATIONS SUCH AS SIN, COS, >AND TAN. FOR APPLICATIONS TO WORK CORRECTLY, AN EMULATOR MUST >BE PROVIDED THAT RUNS WHENEVER AN UNRECOGNIZED FLOATING POINT >OPERATION IS ENCOUNTERED THIS SOFTWARE MUST DECODE THE >REQUESTED OPERATION, DO THE OPERATION IN SOFTWARE, AND >RETURN THE RESULTS TO THE PROGRAM. > >THIS PROCESSING CAN BE DONE TRANSPARENTLY TO THE USED UNDER >SYSTEM 6 AND SYSTEM 7. FOR A/UX COMPATIBILITY, THE KERNEL >WILL HAVE TO BE MODIFIED. MOTOROLA PROVIDES ABOUT 256K OF >TRANSLATION CODE TO BE CALLED BY THE MODIFIED OS, BUT IT ADDS >ADDITIONAL OVERHEAD TO THE PROCESSOR TO TRANSLATE THE CODE. >THIS TENDS TO OFFSET THE PERFORMANCE BENEFITS. This issue was discussed previously. There is both some accuracy and some exaggeration to the Daystar description. >5.0 CONCLUSION >CONVERSION FROM THE WORLD OF THE 68030 TO THE 68040 IS THE >TOUGHEST ONE FACED YET. EARLY ACCELERATOR BOARD USERS WILL >FACE AN UNPREDICTABLE ENVIRONMENT WHERE SOME INITS, CDEVS, >AND APPLICATIONS DO NOT WORK. MOST OF ALL THEY WILL FACE AN >OPERATING SYSTEM WITH MAJOR INCOMPATIBILITIES IN A FEW KEY >AREAS (ESPECIALLY SYSTEM 7 VIRTUAL MEMORY). IN ADDITION >PERFORMANCE GAINS FOR THE 25 MHZ AND 33 MHZ VERSION WILL NOT >BE MUCH GREATER IIFX LEVELS. FOR THESE REASONS, DAYSTAR >DECIDED IN EARLY 1990 TO WAIT UNTIL APPLE INTRODUCED THEIR >OWN 68040. APPLE CAN BEST MAKE THE CHANGES NECESSARY FOR >68040 COMPATIBILITY. 
> >DAYSTAR HAS LEARNED THAT END-USERS JUDGE ACCELERATOR QUALITY >FIRST BY COMPATIBILITY AND THEN BY SPEED. UNTIL A PRODUCT IS >RELIABLE, FOR WHATEVER REASON, IT SHOULD NOT BE SHIPPED. >WAITING FOR APPLE'S 68040 TO BE RELEASED WILL FORCE THE >DEVELOPER COMMUNITY TO SOLVE ITS COMPATIBILITY PROBLEMS, WITH >A SPEED FAR GREATER THAN THAT PROVIDED BY ANY THIRD PARTY >DEVELOPER. ADDITIONALLY, APPLE WILL HAVE SOLVED ITS OWN >INCOMPATIBILITIES WITH THE OPERATING SYSTEM. > >SEVERAL MONTHS AFTER APPLE'S 68040 INTRODUCTION DAYSTAR PLANS >TO INTRODUCE A 68040 ACCELERATOR. IT WILL BE AN ACCELERATOR >THAT BUILDS ON APPLE'S APPROACH TO 68040 INTEGRATION. THIS >WAS AN ESPECIALLY DIFFICULT DECISION SINCE DAYSTAR HAD ALWAYS >BEEN THE FIRST TO BRING FASTER SPEED TO THE MACINTOSH II >FAMILY. IN THIS CASE, IT IS BEST TO LET APPLE GO FIRST. >DAYSTAR DOES NOT WANT TO PLACE ITS USERS ON THE ''BLEEDING- >EDGE'' OF TECHNOLOGY WITH LITTLE OR NO PERFORMANCE BENEFIT. Motorola truly wishes Daystar success with their decision. We do believe that 68030-based accelerator products can provide attractive cost/performance solutions for lots of Macintoshes. We also believe that 68040-based accelerators offer something special: the ability to scream through the toughest compute problem a Mac user will face. Read on for another opinion. Your mileage may vary so think about a test drive if you have doubts! THE FOLLOWING IS FROM IIR (A MAKER OF MACINTOSH ACCELERATION PRODUCTS): >You Know Where To Put This > >Reading DayStar's White Paper regarding 68040 is reminiscent >of how the horse trading industry responded to the automobile >in the 1800's. The horse traders tried to convince their >market that the automobile would never be practical because >the horse could navigate the roads that existed at that time, >and the automobile would be hopelessly lost without smooth >roads to operate on. After all, a horse and buggy could go >anywhere. 
On the other hand, there is some nice film footage >on cable showing automobile 'pilots' pushing automobiles out >of the mud in these early days. there are even pictures that >show the drivers disassembling their horseless carriages, and >rebuilding them back on the road. > >DayStar discussing 68040 issues much like a deaf person >discussing Mozart DSD's benchmark results are due to half a >minute's worth of time to run a single benchmark one time. >From those results, DSD felt strongly enough to generate >multiple pages of white paper explaining why they weren't >going to do an 040 product. > >Referring to the benchmark results in DSD's "paper", it is >true that they only saw demo software at MacWorld in January. >Why would we show anything substantial to the competition?! >Many bonafide customer's, as well as industry magazine >editors saw Macintosh applications such as PhotoShop and >MacDraw executing quite nicely, thank you very much. > >DSD maintains that floating point software doesn't gain any >performance because many FPU calls are emulated by software. >This misconception demonstrates their lack of understanding >of the 040 product. Motorola licenses the FPU emulation >software to real developers (even to DSD, if they had a use >for it). The results are truly awesome. Except for some mini- >and mainframe- computers, and some 64-bit RISC parts, the >68040 is the fastest CPU chip today. Thinking otherwise is >drinking your own bath water. > >It is true, however, that there are some incompatibilities >with current software. But, those incompatibilities are only >within programs that are poorly written, or crudely kluged by >weak programmers (you know who you are). > >Finally, we are quite aware of the performance and quality of >the DSD product line. We designed the Accelerator II line. >Does anybody really believe that we would spend months >producing an inferior product? > >The future waits for no one- technology will continue to move >forward. 
Let the natural evolution process of technological >survival of the fittest decide the future of Macintosh, and >its developers. Just as the automobile overcame adversity to >replace the horse and buggy, so shall the 040 function in the >Macintosh. Let the companies and people that can't keep up >fall by the technological wayside. > >As Ted Turner's desk plaque (purportedly) says: Lead >Follow, or Get Out Of The Way. > >The Design Team from IIR, formerly with DSD -- Regards, Jim Reinhart Motorola Microprocessor & Memory Technologies Group Austin, Texas
paul@taniwha.UUCP (Paul Campbell) (07/01/91)
In article <1991Jun28.230007.10651@oakhill.sps.mot.com> jtr@oakhill.sps.mot.com (Jim Reinhart) writes:
>
>>SOFTWARE COMPATIBILITY WILL BE A MAJOR PROBLEM ON 68040
>>ACCELERATORS AS WELL AS APPLE'S NEW 68040 MACHINE. APPLE
>>WILL HAVE TO MAKE MAJOR PATCHES TO THE MAC OS TO HANDLE
>>PROBLEMS WITH MEMORY MANAGEMENT AND EXCEPTION HANDLING. IN
>
>There are indeed differences between the O.S. programming for
>the 68030 and 68040. However, Daystar's assertion that this
>is a "MAJOR" problem is not generally supportable. First,
>the differences have been documented long enough (~1.5 years
>in print) for vendors to make appropriate plans. Second, the
>only major differences (or 'problems' in Daystar terminology)
>concern the virtual exception processing model and cache
>management. This impacts only a small portion of O.S. code

While I mostly agree with Mot. and disagree with Daystar on almost all these issues, I have to take exception to this response. There is quite a large body of code (mostly INITs and things that do trap patches) that runs on existing Macs and doesn't run on '040s. The main reason has to do with programmers who modify code, or load data containing code and execute it, without being aware of the impact of the caches in the '020/'030 and '040.

This happens for 2 reasons:

1) People are running '040s with copyback rather than writethrough data caches, which means that new data may be in the cache while memory still has the old value.

2) The '040 caches are much bigger: code that happens to run today on an '030 fails on an '040, because on the '030 the stale instruction cache entries happened to be replaced since its cache is much smaller.
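How stale code ends up being executed can be seen in a toy model. The Python sketch below is not 68040 code and not a Mac Toolbox API: the class and method names are hypothetical, and it models only the first reason (a copyback data cache sitting beside a separate instruction cache, with nothing keeping the two in step), not the effect of cache size.

```python
class ToyMachine:
    """Deliberately simplified model: main memory, an instruction cache,
    and a copyback data cache that are not kept coherent with each other."""

    def __init__(self):
        self.memory = {}
        self.icache = {}
        self.dcache = {}  # dirty entries not yet written back to memory

    def store(self, addr, value):
        # Copyback: the write stays in the data cache until it is flushed.
        self.dcache[addr] = value

    def fetch_instruction(self, addr):
        # Instruction fetches see only the instruction cache and memory.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def flush_data_cache(self):
        # Write dirty data back so memory holds the new value.
        self.memory.update(self.dcache)
        self.dcache.clear()

    def flush_instruction_cache(self):
        # Discard cached instructions so the next fetch goes to memory.
        self.icache.clear()


def patch_and_run(flush_data, flush_instructions):
    """Execute code at an address, patch it, optionally flush, execute again."""
    machine = ToyMachine()
    machine.memory[0x100] = "old code"
    machine.fetch_instruction(0x100)       # old code is now in the icache
    machine.store(0x100, "patched code")   # self-modifying write
    if flush_data:
        machine.flush_data_cache()
    if flush_instructions:
        machine.flush_instruction_cache()
    return machine.fetch_instruction(0x100)


for flush_data in (False, True):
    for flush_instructions in (False, True):
        result = patch_and_run(flush_data, flush_instructions)
        print(f"flush data={flush_data!s:5} instr={flush_instructions!s:5} -> {result}")
```

In this model the patched code is only seen when both caches are flushed: flushing the data cache alone leaves the old instruction cached, and flushing the instruction cache alone refetches the old value from memory.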
So if you are a Mac programmer: if you load code in a resource, or copy it anywhere, or if you modify code (a good example is the trick of putting the address of a trap after a jmp instruction when you are trap patching), remember to flush the caches. It's easy, but you have to flush both of them, and it's done using the traps:

	_FlushInstructionCache
	_FlushDataCache

If in doubt flush the caches! On the other hand CPUs like the '040 depend on caches for their performance - so don't flush the caches unnecessarily.

	Paul Campbell
--
Paul Campbell    UUCP: ..!mtxinu!taniwha!paul     AppleLink: CAMPBELL.P
Tom Metzger's White Ayrian Resistance has been enjoined to stop selling Nazi Bart Simpson t-shirts - Tom of course got it wrong, Bart is yellow, not white.