spaf@PURDUE.EDU (Gene Spafford) (01/23/89)
Version 2.0 of the Purdue and PurduePlus X11 server speedups is now
available for ftp. This release integrates both sets of patches, along
with some new changes and bug fixes. Included is special code to work
around the Sun 3/60+CG4 firmware bug causing bitrot when using the
bfins instruction.

You can ftp a copy from expo.lcs.mit.edu; get the file
"contrib/Purdue.2.0-tar.Z". You can also ftp a copy from
mordred.cs.purdue.edu; get the file "pub/X11/Purdue.2.0-tar.Z".

The patches will be submitted to the comp.sources.x group for
publication. Enclosed are the "Timings" and "README" files from this
distribution.

-----------------------------Timings----------------------------------

The following are some rough timing figures for the performance
improvement possible using the Purdue/Purdue+ patches.

System: Sun 3/60 with CG4, Sun OS 3.4, 8Mb memory, local disk
        All tests run with "-mono" switch on server.
        All compiles done with software floating point.

Original version:
        X11R3 server, patches 1-4 applied.
        Compiled with Sun cc compiler, -O -fsoft-float options

New version:
        X11R3 server, patches 1-4 applied.
        Purdue/Purdue+ 2.0 patches applied.
        Compiled with 1.31 GCC, with options:
          -O -traditional -msoft-float -fstrength-reduce -finline-functions
        Linked with Sun 3.4 "malloc" recompiled with GCC (same options
        as above)

Run against canned exercise, including: xload, xphoon, xsetroot -bitmap,
xterm (with and without -j), ico -r, and others.

Original version user+sys time: 289.0 seconds, startup to shutdown.
New version user+sys time:      218.4 seconds, startup to shutdown,
                                a 24+% improvement.
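
The "24+%" figure is the reduction in total user+sys time relative to
the original server. As a small sketch of that arithmetic (the function
name percent_reduction is ours, for illustration only, and is not part
of the distribution):

```c
#include <assert.h>

/*
 * Percentage reduction in CPU time when going from old_seconds to
 * new_seconds. Hypothetical helper, used only to check the figure
 * quoted in the Timings file.
 */
double percent_reduction(double old_seconds, double new_seconds)
{
    assert(old_seconds > 0.0);
    return 100.0 * (old_seconds - new_seconds) / old_seconds;
}
```

Calling percent_reduction(289.0, 218.4) gives a value a little above
24.4, which matches the quoted improvement.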

Some selected excerpts from gprof for the two versions:

          Old version                         New version
          -----------                         -----------
   total                               total
 ms/call  name                       ms/call  name
   42.50  _mfbUnnaturalTileFS          31.34  _mfbUnnaturalTileFS
   10.26  _mfbDoBitblt                  7.07  _mfbDoBitblt
    6.62  _mfbImageGlyphBltWhite        4.19  _mfbImageGlyphBltWhite
    4.98  _mfbTEGlyphBltWhite           3.70  _mfbTEGlyphBltWhite
    4.50  _mfbPaintWindow32             3.88  _mfbPaintWindow32
    1.71  _mfbPushPixels                1.43  _mfbPushPixels
    1.15  _mfbBlackSolidFS              0.78  _mfbBlackSolidFS
    0.71  _mfbLineSS                    0.52  _mfbLineSS
    0.37  _mfbSolidBlackArea            0.32  _mfbSolidBlackArea
    0.21  _mfbBresS                     0.12  _mfbBresS
    0.12  _malloc                       0.05  _malloc

-----------------------------README----------------------------------

About Purdue/PurduePlus 2.0
---------------------------

This is the second release (for X11R3) of a set of changes to the
frame buffer code of the X11 sample server. These changes are designed
to make the server faster for B/W for most machines, and Vax and 68020
machines (e.g., MacIIs, Apollos, Sun 3s) in particular. (Patches for
the color fb will follow, eventually.)

The changes make a significant (but sometimes difficult to measure
objectively) impact on the speed of most operations. This speedup will
differ based on your job mix and machine configuration. Some
operations appear to take up to 50% less cpu time to complete.
Incremental measurements with gprof, time, and other tools show each
change to have a positive overall effect on the server efficiency. In
particular, painting windows and drawing lines appear to be much
faster. An "ico -r" is obviously faster and smoother, as is tiling the
root window. Interestingly enough, the binary after installing these
patches also seems *smaller*.

This second release is basically an integration of the first release
of each of the Purdue and Purdue+ releases, along with new
optimizations and bug fixes. Some special changes have been made to
take advantage of optimizations possible when using the GCC compiler.
These have all been ifdef'd on the symbol __GNUC__ so they will not
interfere with compilation using other compilers. However, if you have
the GCC compiler, you can take advantage of these (and they are well
worth the effort!). The GCC-specific changes are noted below.

Motivation & Changes
--------------------

The generic server shipped with X11R3 is designed to run on many
different machines. It was not written with speed in mind, although
some efforts were made at optimization. Looking at the code reveals a
number of places where changes could be made to make the code faster.
These include:

   * Optimized or added bitmasking functions, taking advantage of
     properties known to exist for certain arithmetic operators and
     domains of input;
   * Replacing calculated bitmasks with table lookups
   * Use of Duff's device in some places where it looks beneficial
     (note: the first release of these patches used a Duff's device
     of order 8. Tests with Sun 3s and MacIIs show that an order 4
     device gives better performance, probably due to caching.)
   * Reordering of code to share variables or move invariants out of
     loops.
   * Expanding some code inline instead of doing calls or loops
   * Taking advantage of knowledge about *when* code is called.

We have tried to make these changes in a way that is maintainable and
easily marked; every modification is enclosed in ifdef's on the symbol
PURDUE.

Installation
------------

The patches in this archive should all be applied to the files in the
server/ddx/{mfb,cfb,mi} and server/include directories. These are all
formed to apply to *unmodified* X11R3 server sources. Using Larry
Wall's "patch" program, you can apply them all as follows:

        server="path to your X11 server source directory"
        for patch in *.patch
        do
            patch -l -N -p -d $server < $patch
        done

Next, you need to set the symbol PURDUE (and possibly NO_3_60_CG4, see
"A GCC & Sun Problem," below) in your site.def file (e.g.,
"#define OptimizedCDebugFlags -O -DPURDUE") to use them.
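
To make the effect of defining PURDUE concrete, here is a minimal
sketch of how one guarded change might look. It is not code from the
patches: the function names and the word-fill task are hypothetical,
chosen only to show a Duff's device of order 4 selected by an ifdef on
PURDUE, next to the plain loop it would replace.

```c
#include <stddef.h>

/* Plain loop: one store and one loop test per word. */
void fill_words_plain(unsigned int *dst, unsigned int value, size_t count)
{
    while (count-- > 0)
        *dst++ = value;
}

/* Duff's device of order 4: up to four stores per loop test. */
void fill_words_duff4(unsigned int *dst, unsigned int value, size_t count)
{
    size_t passes;

    if (count == 0)
        return;
    passes = (count + 3) / 4;   /* trips through the do-loop */
    switch (count % 4) {        /* enter the loop body part-way to
                                   absorb the remainder */
    case 0: do { *dst++ = value;
    case 3:      *dst++ = value;
    case 2:      *dst++ = value;
    case 1:      *dst++ = value;
            } while (--passes > 0);
    }
}

/* Hypothetical selection, mirroring how each change is guarded. */
#ifdef PURDUE
#define fill_words fill_words_duff4
#else
#define fill_words fill_words_plain
#endif
```

Compiling with -DPURDUE makes fill_words expand to the unrolled
version; without it, callers get the original loop, so both variants
stay in the source and can be compared.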

You can also patch your Makefiles (e.g., server/ddx/mfb/Imakefile) as
follows:

*** server/ddx/mfb/Imakefile.orig	Thu Nov 17 15:52:45 1988
--- server/ddx/mfb/Imakefile	Thu Nov 17 15:52:45 1988
***************
*** 19,24 ****
--- 19,25 ----
  	 mfbpawhite.o mfbpablack.o mfbpainv.o mfbtile.o \
  	 mfbtewhite.o mfbteblack.o mfbmisc.o mfbbstore.o
  
+ DEFINES = -DPURDUE
  STD_DEFINES = ServerDefines
  CDEBUGFLAGS = ServerCDebugFlags
  INCLUDES = -I. -I../../include -I$(INCLUDESRC)

Similar patches must be made to ddx/mi/Imakefile and
ddx/cfb/Imakefile since ddx/mfb/maskbits.h is included in files in
those directories.

Note: The change to ddx/mi/miarc.c is to fix a bug, and you may
install it if you wish. The bug has been submitted to the X folks but
not yet officially "blessed." The change to ddx/mfb/mfbimggblt.c is
also a bug fix for a clipping problem, and it too has yet to be
officially sanctioned, but it works for us so you're welcome to use it
if you see fit.

Whatever changes you make, you will need to cd to the server
directory, then:

        make clean Makefile; make Makefiles depend; make

GCC Notes
---------

A working server can be built using gcc 1.31 and the flags
"-O -traditional -finline-functions -fstrength-reduce" and either of
"-msoft-float" or "-m68881", as appropriate. gcc 1.32 is known to
produce bad code and should be avoided.

GCC-specific changes in these patches include:

   * Special "asm" instructions to use bitfield instructions on Vaxen
     and 68020 machines in place of shift/mask combinations in the
     getbits/putbits macros.
   * Using the builtin alloca function instead of the library alloca
     call.

GCC-specific changes have been marked with ifdef __GNUC__. Note that
the change to include/os.h does *not* have the PURDUE symbol
associated with it since it is dependent on only the compiler being
used.

If you have source code available, consider compiling your heap code
(malloc, free, calloc, etc) with gcc and including it with the server.
You can either do this with the entire library, or you can copy the
heap source code into the os/4.2bsd directory and recompile it there.
Under SunOS 3.4 on a Sun 3, recompiling the heap code with gcc (using
the -finline-functions and -fstrength-reduce options) results in more
than a 100% speedup in heap operations.

Also, using GCC to compile os/4.2bsd/oscolor.c can result in problems
unless you take corrective measures. The problem lies with the fact
that GCC returns structures differently as function values than does
cc-derived code. The symptom of this problem is that a GCC-compiled
server will have totally black (or white) screens with no observable
text. To fix the problem, either compile oscolor.c with the regular
"cc" compiler, or compile the dbm library with gcc and link against
that. You can also apply the enclosed dbm-gcc.h.patch file to your
/usr/include/dbm.h file and compile oscolor.c with gcc as normal.

If you have source code available, copy both the dbm and heap files to
os/4.2bsd, then modify the Imakefile as follows:

*** Imakefile.orig	Sun Oct 30 22:46:56 1988
--- Imakefile	Wed Jan 18 22:49:37 1989
***************
*** 13,23 ****
   */
  
  #ifndef OtherSources
! #define OtherSources
  #endif
  
  #ifndef OtherObjects
! #define OtherObjects
  #endif
  
  BOOTSTRAPCFLAGS = 
--- 13,23 ----
   */
  
  #ifndef OtherSources
! #define OtherSources dbm.c malloc.c
  #endif
  
  #ifndef OtherObjects
! #define OtherObjects dbm.o malloc.o
  #endif
  
  BOOTSTRAPCFLAGS = 

A GCC & Sun Problem
-------------------

One change in particular should be noted. The inclusion of the
GCC-defined "asm" statements to speed up bit operations is a big win
for most 68020 machines and Vaxen. Unfortunately, the Sun 3/60+CG4
combination as sold by Sun has a bug in the firmware that causes
writes using the "bfins" instruction to fail in some circumstances.
This has been reported to Sun as bug report #1016963. You should
enquire of your Sun support people if some kind of fix is available.
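
For readers unfamiliar with the operation involved, the following is a
rough sketch of a bitfield insert done with a shift/mask combination,
which is the kind of work a single "bfins" does in hardware. The
function insert_bits is hypothetical and limited to one 32-bit word;
it is not the getbits/putbits code from the patches, nor the
workaround they ship. Offsets count from the most significant bit, as
the 68020 bitfield instructions do.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return `word` with the `width`-bit field starting `offset` bits
 * below the most significant bit replaced by the low `width` bits of
 * `value`. Illustrative only; the field must fit in one 32-bit word.
 */
uint32_t insert_bits(uint32_t word, unsigned offset, unsigned width,
                     uint32_t value)
{
    uint32_t field_mask;
    unsigned shift;

    assert(width >= 1 && width <= 32 && offset + width <= 32);

    /* Mask of `width` low-order one bits (avoid shifting by 32). */
    field_mask = (width == 32) ? UINT32_C(0xFFFFFFFF)
                               : ((UINT32_C(1) << width) - 1u);
    /* Distance of the field's low bit from bit 0 of the word. */
    shift = 32u - offset - width;

    return (word & ~(field_mask << shift))
         | ((value & field_mask) << shift);
}
```

For example, insert_bits(0xFFFFFFFF, 4, 8, 0) clears the eight bits
that follow the top four, giving 0xF00FFFFF.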

In the meantime, this set of patches is distributed with a software
workaround. BY DEFAULT the workaround is installed. If you are
building a server that will NEVER run on Sun 3/60+CG4 machines, then
be sure to define the symbol NO_3_60_CG4 along with the symbol PURDUE.
I.e., your site.def file should include the flags
-DPURDUE -DNO_3_60_CG4. You only need to do this if you build the
server with the gcc compiler and you are building for a mc68020
processor; the optimizations involved are automatically enabled for
other architectures.

Once Sun develops an ECO to fix the bug, this flag can also be turned
on for 3/60+CG4 machines. This would be nice, because without the fix,
those machines can only use 1/2 of the speedups.

Color Machines
--------------

As time allows, we will examine similar changes to the cfb code for
color machines. Stay tuned.

Questions
---------

We will try to respond to any of your questions or comments about
these patches -- just send us some e-mail. We would also like to hear
about any of your own enhancements, benchmarks, etc.

Enjoy!

Gene Spafford & Martin Friedmann
spaf@cs.purdue.edu   martin@citi.umich.edu
1/22/89

Thanks to:
----------

Sam Kimery of PURDUE ECN helped develop the optimizations in the first
release of these fixes (for X11R2).

Terry Donahue of Project Athena contributed some server fixes with the
X11R3 release that helped focus our attention on certain sections of
code.

The Purdue/Florida Software Engineering Research Center provided the
machines and funding that allowed Spaf to do his tinkering.

Thanks to Jim Fulton for testing the changes (Release I) on a Mac II,
and for recommending some format changes.

Rusty Sanders of Megatek helped isolate the bug in the Sun 3/60+CG4
combo.

Our thanks and apologies to anyone else we forgot to mention.