spaf@cs.purdue.EDU (Gene Spafford) (11/18/88)
I have just sent to comp.sources.x my patches to X11 Release 3 to speed up the mfb (monochrome) server performance. These patches may also be ftp'd from mordred.cs.purdue.edu (file: ~ftp/pub/X11/Purdue-speedups.mfb.Z) and expo.lcs.mit.edu (file: ~ftp/contrib/Purdue-speedups.mfb.Z). I hope they will be available on some other systems soon. I hope to do a similar set of optimizations for the color (cfb) code sometime in the next few weeks.

Enclosed is the README file from the distribution.

Intro
-----
These are the Purdue speedups for X11, Release 3. For the most part, they apply only to B/W servers; similar patches for the color code should be released sometime in the next few weeks.

Installation
------------
The patches in this archive should all be applied to the files in the server/ddx/mfb directory. To use them, you need to define the symbol PURDUE in your macros or site.def file (e.g., "#define OptimizedCDebugFlags -O -DPURDUE"). You can also patch your server/ddx/mfb/Imakefile as follows:

*** server/ddx/mfb/Imakefile.orig   Thu Nov 17 15:52:45 1988
--- server/ddx/mfb/Imakefile        Thu Nov 17 15:52:45 1988
***************
*** 19,24 ****
--- 19,25 ----
               mfbpawhite.o mfbpablack.o mfbpainv.o mfbtile.o \
               mfbtewhite.o mfbteblack.o mfbmisc.o mfbbstore.o
  
+ DEFINES = -DPURDUE
  STD_DEFINES = ServerDefines
  CDEBUGFLAGS = ServerCDebugFlags
  INCLUDES = -I. -I../../include -I$(INCLUDESRC)

Similar patches must be made to ddx/mi/Imakefile and ddx/cfb/Imakefile, because files in those directories include ddx/mfb/maskbits.h.
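maskbits.h is shared with the mi and cfb code, so PURDUE has to be defined everywhere that header is compiled. The following is a minimal sketch, not code from the patches, of how a compile-time PURDUE switch in such a header might choose between a computed bitmask and a table lookup. That choice is one of the categories of change described below. The names start_tab, init_start_tab() and STARTMASK are hypothetical. The masks assume bits are numbered from the most significant end of a 32-bit word.

```c
#include <stdint.h>

/*
 * Hypothetical illustration only: start_tab, init_start_tab() and
 * STARTMASK are not the identifiers used in maskbits.h. Bit x is
 * counted from the most significant end of a 32-bit word.
 */
static uint32_t start_tab[32];

/* Build the lookup table once, e.g. at server initialization.
 * Entry x has bits x..31 set. */
static void init_start_tab(void)
{
    int x;

    for (x = 0; x < 32; x++)
        start_tab[x] = 0xFFFFFFFFu >> x;
}

#ifdef PURDUE
/* Optimized build: fetch the precomputed mask with one indexed load. */
#define STARTMASK(x) (start_tab[(x)])
#else
/* Stock build: compute the mask with a shift on every use. */
#define STARTMASK(x) (0xFFFFFFFFu >> (x))
#endif
```

If the code is compiled with -DPURDUE (for instance through the DEFINES line in the Imakefile patch), STARTMASK(8) reads start_tab[8]. Without that symbol, the same expression shifts at run time. Both forms yield 0x00FFFFFF. Whether the load actually beats the shift depends on the CPU.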
Whichever change you make, you will then need to cd to the server directory and run:

    make Makefile; make Makefiles depend; make

Description
-----------
The changes in these patches fall into a few similar categories:

  * Optimized or added bitmasking functions that take advantage of properties known to hold for certain arithmetic operators and domains of input;
  * Replacing calculated bitmasks with table lookups;
  * Use of Duff's device in some places where it looks beneficial;
  * Reordering of code to share variables or move invariants out of loops.

The changes seem to have a significant impact on the speed of most operations, although that impact is sometimes difficult to measure objectively. The speedup will differ based on your job mix and machine configuration. Some operations appear to take up to 35% less CPU time to complete. Incremental measurements with gprof, time, and other tools show that each change has a positive overall effect on server efficiency. In particular, painting windows and drawing lines appear to be much faster. On my Sun 3/60, "ico -r" is obviously faster and smoother, and so is tiling the root window.

Note that my measurements have not been done with any formal benchmarking, so the changes might still benefit from some tuning. In particular, making the "Duff" macro unroll more or fewer items might be beneficial. On the Mac II, for instance, a Duff's device unrolled 4 times works better than the 8 used in the enclosed patches.

Interestingly enough, the binary also seems *smaller* after these patches are installed.

The changes have been generated in a machine-independent way and should work on any other machine, although I have not yet been able to try them on any other kind of CPU. I have thoroughly tested them on various Sun 3 machines, and all my changes have been to optimize for that architecture (68020). If I blew it, or if you have more portable or better versions of these changes, please share them with me and I'll use them in further releases.
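The README does not reproduce the "Duff" macro itself. The sketch below illustrates the technique only, using hypothetical DUFF8/DUFF4 macros rather than the ones in the patches. The same word-filling loop is unrolled 8 times, as in the patches, or 4 times, the factor reported as better on the Mac II. The switch jumps into the middle of the unrolled do/while, so the count does not need to be a multiple of the unroll factor.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macros, not the ones in the Purdue patches. Each executes
 * `body` exactly `n` times; `body` must not contain unparenthesized commas.
 * The switch enters the unrolled loop part-way to handle n % unroll.
 */
#define DUFF8(n, body) do {                                   \
    size_t duff_n_ = (n);                                     \
    if (duff_n_ > 0) {                                        \
        size_t duff_passes_ = (duff_n_ + 7) / 8;              \
        switch (duff_n_ % 8) {                                \
        case 0: do { body;                                    \
        case 7:      body;                                    \
        case 6:      body;                                    \
        case 5:      body;                                    \
        case 4:      body;                                    \
        case 3:      body;                                    \
        case 2:      body;                                    \
        case 1:      body;                                    \
                } while (--duff_passes_ > 0);                 \
        }                                                     \
    }                                                         \
} while (0)

/* The same loop unrolled 4 times instead of 8. */
#define DUFF4(n, body) do {                                   \
    size_t duff_n_ = (n);                                     \
    if (duff_n_ > 0) {                                        \
        size_t duff_passes_ = (duff_n_ + 3) / 4;              \
        switch (duff_n_ % 4) {                                \
        case 0: do { body;                                    \
        case 3:      body;                                    \
        case 2:      body;                                    \
        case 1:      body;                                    \
                } while (--duff_passes_ > 0);                 \
        }                                                     \
    }                                                         \
} while (0)

/* Fill n words of a scanline with a constant pattern (8-way unroll). */
static void fill_words8(uint32_t *dst, uint32_t pattern, size_t n)
{
    DUFF8(n, *dst++ = pattern);
}

/* Same fill with a 4-way unroll. */
static void fill_words4(uint32_t *dst, uint32_t pattern, size_t n)
{
    DUFF4(n, *dst++ = pattern);
}
```

Swapping DUFF8 for DUFF4 changes only the code size and the loop overhead, not the result. The unroll factor can therefore be chosen separately for each machine.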
[I think this code is used in the Apollo servers, and since they also use a 68xxx chip, these changes should work there too. They have been tested on a Mac II and work there (and act as speedups). I would love to see how they work on a Vaxstation, but DEC wouldn't even sell one to me at retail! If anyone at DEC can explain what DEC has against us/me here at Purdue, I'd love to talk with you. We're all mystified.]

Future Work
-----------
Some optimizations could still be done on this code. Other than changes to the cfb code, these include:

  * putting in some hacks to enhance speed for certain compilers. In particular, gcc can produce incredibly good code, and some small tuning of the mfb/cfb code could result in significant improvements.
  * putting in custom assembler macros for commonly used instructions, such as the bits routines, abs, round, etc.
  * algorithmic changes that radically change the nature of how some things are done in the server; this amounts to a rewrite of portions of the server.

I may get around to doing some of these in the next few months. If not, I hope others will and then share their results with the rest of us.

Please share your comments on this package with me -- I'd like to know what else *we* can do to make this server a more efficient piece of code.

Thanks to:
----------
Sam Kimery of PURDUE ECN helped me develop some of the optimizations in the first release of these fixes (for X11R2).

Terry Donahue of Project Athena contributed some server fixes with the X11R3 release that helped focus my attention on some sections of code, although I did not use any of his changes in these patches.

The Purdue/Florida Software Engineering Research Center provided the machines and funding that allowed me to do this tinkering.

Thanks to Jim Fulton for testing the changes on a Mac II, and for recommending some format changes (including the "SmallDuff").

Gene Spafford
spaf@cs.purdue.edu
11/17/88

PS.
Late breaking numbers from someone (who wishes to remain anonymous) who has applied these changes to the X11.R3 code on a Vax with an Xqvss display:

>Date: Thu, 17 Nov 88 15:24:53 EST
>To: spaf
>Subject: Xqvss numbers
>
>Okay, quicky results:
>
>        filled rectangles:   25% faster
>        lines:               no real change
>        image text 8:        slightly slower
>        text 8:              slightly slower
>        rectangles:          slightly faster
>        copy area:           20% faster
>
>It seems like it would be interesting to figure out what helps where and
>set up appropriate #ifdefs after people have had a chance to poke at it.

So, these changes also work on a Vax, but should be tuned a little differently. Let me encourage people who can access Vaxen to contribute such fixes.
--
Gene Spafford
NSF/Purdue/U of Florida Software Engineering Research Center,
Dept. of Computer Sciences, Purdue University, W. Lafayette IN 47907-2004
Internet: spaf@cs.purdue.edu    uucp: ...!{decwrl,gatech,ucbvax}!purdue!spaf
spaf@cs.purdue.edu (Gene Spafford) (11/18/88)
The patches are also on gatekeeper.dec.com in the file ~ftp/pub/X11.contrib/Purdue-speedups.mfb.Z

Enjoy.
pfh@pai.UUCP (Peter Hill) (11/27/88)
In article <5462@medusa.cs.purdue.edu>, spaf@cs.purdue.EDU (Gene Spafford) writes:
>
> I have just sent to comp.sources.x my patches to X11 Release 3 to
> speed up the mfb (monochrome) server performance.

Thanks, Gene! The speedups are very nice indeed. Now for a problem report:

VERSION:
    X.V11R3 + patches 1-2 + Purdue-speedups.mfb

CLIENT MACHINE and OPERATING SYSTEM:
    Sun 3/160M, SunOS 3.2

DISPLAY:
    Sun BW2

AREA:
    mfb speedups

DESCRIPTION:
    Neither mfbsetsp.c nor mfbbitblt.c would compile under SunOS 3.2,
    apparently because of the new putbitsrop() macro in maskbits.h.
    See code below for details.

FIX:
    Perhaps there is a better/easier way to fix this. Below is a context
    diff of my hacks to maskbits.h. This patch can be applied after
    applying maskbits.h.patch from Purdue-speedups.mfb.

------cut here-----
*** ddx/mfb/maskbits.h.purdue   Fri Nov 25 19:52:54 1988
--- ddx/mfb/maskbits.h          Sat Nov 26 10:35:38 1988
***************
*** 353,363 ****
      else \
      { \
          int m = 32-(x); \
          register unsigned int *ptmp_ = (unsigned *) (pdst)+1; \
!         *(pdst) = (*(pdst) & endtab[x]) | (t2 & starttab[x]); \
          t1 = SCRLEFT((src), m); \
          DoRop(t2, rop, t1, *ptmp_); \
!         *ptmp_ = (*ptmp_ & starttab[n]) | (t2 & endtab[n]); \
      } \
  }
  
--- 353,366 ----
      else \
      { \
          int m = 32-(x); \
+         register int t3; \
          register unsigned int *ptmp_ = (unsigned *) (pdst)+1; \
!         t3 = t2 & starttab[x]; \
!         *(pdst) = (*(pdst) & endtab[x]) | t3; \
          t1 = SCRLEFT((src), m); \
          DoRop(t2, rop, t1, *ptmp_); \
!         t3 = t2 & endtab[n]; \
!         *ptmp_ = (*ptmp_ & starttab[n]) | t3; \
      } \
  }
  
------cut here-----
--
______________________________________________________________________________
Peter Hill                pfh@pai.mn.org                      +1 612 894 0313
Prime Automation, Inc.    ...{sun!tundra,umn-cs!hall,bungia}!pai!pfh
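In the branch patched above, putbitsrop() stores a bit field that straddles two 32-bit words. It masks the first word with starttab[x]/endtab[x] and the second with starttab[n]/endtab[n]. The function below is a simplified, hypothetical version of the same two-word masked store, not the actual macro. It does a plain copy and omits the raster op that DoRop() applies. It assumes MSB-first bit numbering and a left-justified source field of w bits, with 1 <= x <= 31, w <= 32 and x + w > 32. Like Peter's fix, it computes each masked value into a temporary (t3) before the read-modify-write of the destination.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for mfb's starttab[]/endtab[], with bits
 * numbered from the most significant end (valid for 1 <= x <= 31).
 */
static uint32_t start_mask(int x) { return 0xFFFFFFFFu >> x; }   /* bits x..31 */
static uint32_t end_mask(int x)   { return ~start_mask(x); }     /* bits 0..x-1 */

/*
 * Store the top w bits of src at bit offset x of pdst[0], spilling the
 * remaining n = x + w - 32 bits into the top of pdst[1].
 * Preconditions: 1 <= x <= 31, w <= 32, x + w > 32.
 * Plain copy only: no raster op is applied.
 */
static void put_straddling(uint32_t *pdst, uint32_t src, int x, int w)
{
    int n = x + w - 32;     /* bits that land in the second word */
    uint32_t t3;            /* temporary, mirroring the patched macro */

    /* First word: keep bits 0..x-1, replace bits x..31. */
    t3 = (src >> x) & start_mask(x);
    pdst[0] = (pdst[0] & end_mask(x)) | t3;

    /* Second word: replace bits 0..n-1, keep bits n..31. */
    t3 = (src << (32 - x)) & end_mask(n);
    pdst[1] = (pdst[1] & start_mask(n)) | t3;
}
```

Calling put_straddling(d, 0xFF000000u, 28, 8) on two zeroed words leaves 0x0000000F in d[0] and 0xF0000000 in d[1]. The 8-bit field is split 4 and 4 across the word boundary.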