name@portia.Stanford.EDU (tony cooper) (02/08/90)
Suppose I wanted to spend, say, a day making X11R4 faster for my particular hardware. What would be the best way to spend my time? I am interested in 8 bit color speedups at the expense of portability.

For example: Is there a piece of code that is executed every time the server does something? I could convert it to assembly language. Is there some device-independent code that could be converted to device-dependent code? The README for the cfb directory says the cfb code is "very slow". Is the cfb directory a good place to start? Which are the 20 most time-critical lines there? Which are the 20 most time-critical lines in the whole of X11R4?

I have done this sort of thing before with spectacular results. I once sped up a program by an order of magnitude by changing one line of C code inside a loop into assembly language and using registers. It seems to me that X11 has a lot of potential for this kind of speedup, since it is so device independent and since all graphics operations boil down to simply turning bits on or off. There must be some code somewhere that does the nitty gritty bitty stuff at a low level that I can get at.

For those interested in receiving a copy, I'll be doing this for the MC68030. Even more specifically, it will be for the Macintosh II.

Tony Cooper
tony@popserver.stanford.edu
keith@EXPO.LCS.MIT.EDU (Keith Packard) (02/09/90)
> Suppose I wanted to spend, say, a day making X11R4 faster for my
> particular hardware. What would be the best way to spend my time?
> I am interested in 8 bit color speedups at the expense of portability.

As usual, this depends almost completely on what you will be using the server for. The R4 server is actually pretty good at a wide range of common tasks.

> The README for the cfb directory says the cfb code is "very slow".

The README file was not updated for R4. Look at the CHANGES file and you'll see a more promising comment:

"This directory now provides a real implementation for 8-bit frame buffers, driving the frame buffer at memory bandwidth for many operations"

The most heavily tuned operations are BitBlt, text painting, line drawing and rectangle filling. For these operations, you'd be hard pressed to get much of a performance increase even by coding them in assembly.

> For those interested in receiving a copy, I'll be doing this for
> the MC68030.

The R4 server has code which is tuned for the 68020 family; it allows you to specify a few machine characteristics which guide the compiler to the correct bits of code.

> Even more specifically, it will be for the Macintosh II.

This is the bad news. The Mac II frame buffer cards which sit on the NuBus have a memory latency of ~1us per access and no special block-mode optimizations, which gives them a bandwidth of 4 MB/sec (4 bytes/access). Nothing the R4 server does can help this out; you end up with a server which runs 2.5 times slower than a Sun 3/60.

Some of the newer Macintoshes have on-board frame buffers. I expect that eliminating the NuBus would give them more respectable frame buffer latency numbers, but I haven't ever had a chance to benchmark them running our code.

If you want to start tuning the R4 code, get a copy of x11perf and start measuring. As usual, any changes you make would be welcome at MIT if directed at 'xbugs@expo.lcs.mit.edu'.

Keith Packard
MIT X Consortium
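
To make the bandwidth arithmetic above concrete, here is a rough sketch in C. It is not code from the R4 server; the function names and the 640x480 screen size are made up for illustration. It computes the bandwidth implied by a given access latency, estimates how long a full-screen fill would take at that bandwidth, and shows the usual trick of replicating an 8-bit pixel into a 32-bit word so that each bus access paints four pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bandwidth in bytes/sec when every access moves bytes_per_access
 * bytes and takes latency_ns nanoseconds (no block-mode transfers). */
static double bandwidth_bytes_per_sec(double latency_ns,
                                      unsigned bytes_per_access)
{
    assert(latency_ns > 0.0);
    return bytes_per_access * (1e9 / latency_ns);
}

/* Lower bound on the time to write npixels 8-bit pixels when the
 * frame buffer is the bottleneck. */
static double fill_seconds(size_t npixels, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return (double)npixels / bytes_per_sec;
}

/* Replicate one 8-bit pixel into all four bytes of a 32-bit word. */
static uint32_t replicate_pixel(uint8_t pixel)
{
    return pixel * 0x01010101u;
}

/* Fill nwords 32-bit frame buffer words with a solid pixel value.
 * One store paints four 8-bit pixels, so a span of N pixels costs
 * N/4 accesses instead of N. Alignment and partial words at the
 * span edges are ignored here (hypothetical simplification). */
static void fill_words(volatile uint32_t *dst, size_t nwords, uint8_t pixel)
{
    uint32_t w = replicate_pixel(pixel);
    while (nwords-- > 0)
        *dst++ = w;
}
```

With a 1us latency and 4 bytes per access, bandwidth_bytes_per_sec(1000.0, 4) gives 4 million bytes/sec, and an assumed 640x480 8-bit screen needs at least fill_seconds(640 * 480, 4e6), roughly 0.08 seconds, per full-screen fill. Once the inner loop already stores full words like this, the bus rather than the instruction stream sets the limit, which is why hand-written assembly buys little in that case.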