jwz@teak.berkeley.edu (Jamie Zawinski) (02/09/90)
I just recently got "popi", the image processing program described in Gerard Holzmann's book "The Digital Darkroom". This program runs on a variety of systems, but the conditionally-compiled Amiga code is really, really slow, and I'd like to fix this.

So: given a one-dimensional array of 8-bit (greyscale) quantities, I need to be able to draw one scanline. There must be a way to do this without repeatedly calling WritePixel(), right? (I realize that the Amiga can only display four bits of grey, but 8 bits are used internally by the program, and I'd rather not alter its portable data structures.)

Any ideas?

-- Jamie
a464@mindlink.UUCP (Bruce Dawson) (02/09/90)
One way of speeding this up is to write your own WritePixel() routine. This ONLY works if you have opened a custom screen and no other windows will overlap the window you are writing into. You also have to make sure that menus don't come down over the top while you are writing pixels, or your results will have holes in them. You can use LockLayers() or some such to prevent that from happening, but you'd better not call LockLayers() for every pixel or performance will still be really poor (i.e., do a LockLayers(), render a bunch of pixels, then do an UnlockLayers()).

Another way, probably better, is to create a one-scanline buffer. Use some assembler code to translate an array of colours into your buffer, then use DrawImage() or some such to render the whole line onto the screen/window. This is more robust.

Warning: When I last checked (several years ago, I admit), DrawImage() didn't block menus (i.e., didn't do a LockLayers()), meaning that menus could drop down and get drawn over by DrawImage(), thus munging the display. If this is still true, put your own LockLayers() calls around calls to DrawImage().

All of this is irrelevant if the calcs are the real CPU hog. Write a program to just write pixels to the screen and time it. If it takes 1% as long as the program you're optimizing, ignore screen write time. If it takes 50% as long...

.Bruce.
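Editor's sketch: the "write your own WritePixel()" idea, in portable C so it can be tried off the Amiga. It assumes a planar layout like the Amiga's (one bit per pixel in each bitplane, leftmost pixel in the most significant bit). The PlanarBitmap struct and the names put_pixel and put_scanline are hypothetical stand-ins, not the graphics.library BitMap or any real API. It does no clipping and no layer locking, so the caveats in the post above still apply.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 4            /* four bitplanes: 16 grey levels */

/* Hypothetical stand-in for a planar bitmap; not the real struct BitMap. */
typedef struct {
    int width;             /* pixels per row, a multiple of 8 here */
    int height;            /* number of rows */
    int bytes_per_row;     /* width / 8 */
    uint8_t *plane[DEPTH]; /* plane[0] holds bit 0 of every pixel */
} PlanarBitmap;

/* Set pixel (x, y) to a 4-bit colour by touching one bit in each plane. */
static void put_pixel(PlanarBitmap *bm, int x, int y, unsigned colour)
{
    assert(x >= 0 && x < bm->width && y >= 0 && y < bm->height);

    int offset = y * bm->bytes_per_row + (x >> 3); /* byte holding the pixel */
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));    /* leftmost pixel = MSB */

    for (int p = 0; p < DEPTH; p++) {
        if (colour & (1u << p))
            bm->plane[p][offset] |= mask;
        else
            bm->plane[p][offset] &= (uint8_t)~mask;
    }
}

/* Draw one scanline from 8-bit grey values, keeping the top four bits. */
static void put_scanline(PlanarBitmap *bm, int y, const uint8_t *grey, int n)
{
    for (int x = 0; x < n; x++)
        put_pixel(bm, x, y, grey[x] >> 4);
}
```

This still does one read-modify-write per plane per pixel. It only removes the per-call overhead of the library routine, which is the saving the post describes.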
mwm@raven.pa.dec.com (Mike (Under Construction) Meyer) (02/10/90)
>> I just recently got "popi", the image processing program described in
>> Gerard Holzmann's book "The Digital Darkroom"; this program runs on a variety
>> of systems, but the conditionally-compiled Amiga code is really really slow,
>> and I'd like to fix this. So.

The problem isn't really the Amiga-specific code; the problem is that the thing is just plain slow (it's barely usable on a DECstation 3100, at 14 MIPS). The heart of the code is a loop that interprets the stack-machine code for the transformation expression once for every pixel on the screen.

The easy solution is to use fewer pixels, say 128x128 instead of 512x512, giving you a factor of 16 increase in speed with no work. The hard solution is to rewrite the "compiler" to generate a loop that runs in 68K machine language and run that, instead of generating stack-machine code and interpreting that (this is what pico, mentioned in the book, did). An intermediate solution is to write something that translates the stack-machine code into 68K machine language, and run that instead of the stack-machine code.

Of course, speeding up the display code wouldn't hurt. But I don't see much point until the transformation code is fixed. In any case, the best way I can see to do the drawing code is to have the blitter move one (two?) bit-plane at a time from the image array into the drawing area. You could also do this for each scan line, but that will be slower.

Whatever you do, let us know!

	<mike
--
He was your reason for living		Mike Meyer
So you once said			mwm@berkeley.edu
Now your reason for living		ucbvax!mwm
Has left you half dead			mwm@ucbjade.BITNET
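Editor's sketch: a toy per-pixel stack-machine interpreter, to make the cost described above concrete. The opcodes, the Insn struct, and the names eval_pixel and render are all hypothetical; popi's real instruction set is not shown in this thread and will differ. The point is only that every pixel pays for a full pass through the dispatch loop, so the work scales with the pixel count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; popi's real instruction set differs. */
typedef enum { OP_PUSH, OP_X, OP_Y, OP_ADD, OP_MUL, OP_END } Op;

typedef struct {
    Op op;
    int arg;   /* used only by OP_PUSH */
} Insn;

/* Interpret the program once for a single pixel and return the result. */
static int eval_pixel(const Insn *prog, int x, int y)
{
    int stack[32];
    int sp = 0;

    for (const Insn *pc = prog; pc->op != OP_END; pc++) {
        switch (pc->op) {
        case OP_PUSH: assert(sp < 32); stack[sp++] = pc->arg; break;
        case OP_X:    assert(sp < 32); stack[sp++] = x;       break;
        case OP_Y:    assert(sp < 32); stack[sp++] = y;       break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_END:  break;   /* never reached: the loop stops first */
        }
    }
    assert(sp == 1);
    return stack[0];
}

/* Run the interpreter for every pixel of a size x size image.
   Returns the number of instructions dispatched in total. */
static long render(const Insn *prog, int size, unsigned char *out)
{
    long dispatched = 0;
    long len = 0;

    while (prog[len].op != OP_END)
        len++;

    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            out[(size_t)y * size + x] = (unsigned char)(eval_pixel(prog, x, y) & 0xFF);
            dispatched += len;
        }
    }
    return dispatched;
}
```

For a five-instruction program, render() at 512x512 dispatches 16 times as many instructions as at 128x128, which is where the "factor of 16" comes from. Translating the program to native code removes the dispatch, not the per-pixel arithmetic.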
a464@mindlink.UUCP (Bruce Dawson) (02/11/90)
> doug writes:
>
> Msg-ID: <658@xdos.UUCP>
> Posted: 11 Feb 90 17:22:51 GMT
>
> Org. : Hunter Systems, Mountain View CA (Silicon Valley)
> Person: Doug Merritt
>
> In article <1092@mindlink.UUCP> a464@mindlink.UUCP (Bruce Dawson) writes:
> >
> > One way of speeding this up is to write your own WritePixel() routine.
>
> To give some idea, I did this for Thad's FaceShower program. It originally
> used WritePixel for displaying 256x256 (approx.) pixels, and took about
> 45 seconds. Changing to direct writes dropped it to around 20 seconds.
> So if that's the approximate amount of time you want to save, then this
> might be the way to go. (Factor of two.)
>
> But usually it's the algorithm that benefits most from optimization.
> Changing the inline dithering calculations to table lookups with layout
> optimized for the screen's bitplane layout dropped the time from 20
> seconds to around 1 second. (Factor of twenty.)
>
> Member, Crusaders for a Better Tomorrow    Professional Wildeyed Visionary

Be careful where you give the credit for the speedup. Changing the algorithm, in this particular case, saved nineteen seconds. Changing the WritePixel() routine saved twenty-five seconds. The only reason that changing the algorithm looked so much better was that, by the time you optimized it, it was the only major user of CPU time left. If you'd changed the algorithm first and then changed WritePixel(), you would have reported just under a factor of two improvement for the algorithm, and a twenty-six times increase for WritePixel().

In reality, the two changes should share the credit, and little more than a factor of two improvement is possible without _both_ of them.

.Bruce.

P.S. I totally agree that the potential speed increases from a good algorithm (as opposed to rewriting in hand-coded assembler or similar techniques) are generally more important. But sometimes both are necessary, and an optimized WritePixel() routine is particularly easy and can speed things up greatly.
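Editor's sketch: the accounting in the post above, worked out in C. The three-way split of the 45 seconds (25 for pixel writes, 19 for dithering, 1 left over) is inferred from the figures quoted in the thread; the constant and function names are made up for this example.

```c
#include <assert.h>

/* Times in seconds, inferred from the figures quoted in the thread
   (45 -> 20 -> 1). */
static const double PIXEL_WRITE = 25.0; /* removed by direct writes */
static const double DITHER_CALC = 19.0; /* removed by table lookups */
static const double REMAINDER   = 1.0;  /* left after both changes */

/* Speedup factor reported when a total time drops from before to after. */
static double speedup(double before, double after)
{
    assert(before > 0.0 && after > 0.0);
    return before / after;
}

/* Order actually used: pixel writes first, then the algorithm. */
static void pixels_first(double *first, double *second)
{
    double start = PIXEL_WRITE + DITHER_CALC + REMAINDER; /* 45 s */
    double mid   = DITHER_CALC + REMAINDER;               /* 20 s */
    *first  = speedup(start, mid);                        /* 2.25 */
    *second = speedup(mid, REMAINDER);                    /* 20 */
}

/* Reverse order: algorithm first, then pixel writes. */
static void algorithm_first(double *first, double *second)
{
    double start = PIXEL_WRITE + DITHER_CALC + REMAINDER; /* 45 s */
    double mid   = PIXEL_WRITE + REMAINDER;               /* 26 s */
    *first  = speedup(start, mid);                        /* about 1.73 */
    *second = speedup(mid, REMAINDER);                    /* 26 */
}
```

Either order ends at the same 45x overall. Only the way the credit is divided between the two steps changes, because the savings add up in seconds rather than multiplying as independent factors.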
p554mve@mpirbn.UUCP (Michael van Elst) (02/11/90)
In article <21908@pasteur.Berkeley.EDU> Jamie Zawinski <jwz@teak.berkeley.edu> writes:
>Given a one-dimensional array of 8 bit (greyscale) quantities, I need to be
>able to draw one scanline. There must be a way to do this without repeatedly
>calling WritePixel(), right? (I realize that the Amiga can only display four
>bits of grey, but 8 bits is used internally by the program, and I'd rather
>not alter its portable data structures.) Any ideas?

The fastest way to fill lines is to move them right into the display bitmap. You have to split the incoming pixel values into bits, say 32 pixels at a time, and construct the memory words that are needed for the bitplanes. Since you only want to display 16 colors, you might use 4 registers in which the four longwords, one for each bitplane, are assembled. Now that's for 68000 programmers, but a good C compiler could achieve nearly the same performance.

One problem still arises: if you write into the screen's bitmap you collide with Intuition rendering (like menus). And arbitrarily positioned windows are another hack.

All these problems can be avoided if you put your pixels into an offscreen bitmap (nicely aligned, no locking needed) and then make a call to BltBitMapRastPort() to move the data onto the screen.

If you care about speed you may use double buffering: while one bitmap is being drawn, the second can be copied to the screen. Note that it is not as easy to do completely asynchronous blitter operations. The simplest way is to use another drawing task that calls the graphics functions.

Michael van Elst
uunet!unido!mpirbn!p554mve
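Editor's sketch: the 32-pixels-at-a-time conversion described above, in portable C. The function names are hypothetical. The word layout assumes the leftmost pixel goes in the most significant bit of each longword, as on the big-endian 68000; on a little-endian host the bytes of each word would sit in memory in a different order, so treat this as an illustration of the bit shuffling only. Nothing here touches a real Amiga bitmap or the blitter.

```c
#include <assert.h>
#include <stdint.h>

/* Convert 32 8-bit grey pixels into four 32-bit words, one per bitplane.
   Only the top four bits of each grey value are kept (16 grey levels).
   plane[0] receives the least significant of those four bits. */
static void chunky_to_planar32(const uint8_t grey[32], uint32_t plane[4])
{
    uint32_t p0 = 0, p1 = 0, p2 = 0, p3 = 0;  /* the "four registers" */

    for (int i = 0; i < 32; i++) {
        unsigned c = grey[i] >> 4;            /* 8 bits -> 4 bits */

        /* Shifting left each time leaves the first pixel in the MSB. */
        p0 = (p0 << 1) | (c & 1u);
        p1 = (p1 << 1) | ((c >> 1) & 1u);
        p2 = (p2 << 1) | ((c >> 2) & 1u);
        p3 = (p3 << 1) | ((c >> 3) & 1u);
    }
    plane[0] = p0;
    plane[1] = p1;
    plane[2] = p2;
    plane[3] = p3;
}

/* Convert a whole scanline whose width is a multiple of 32 pixels.
   rows[p] points at this scanline's words in bitplane p of an
   offscreen buffer. */
static void scanline_to_planes(const uint8_t *grey, int width, uint32_t *rows[4])
{
    assert(width > 0 && width % 32 == 0);

    for (int w = 0; w < width / 32; w++) {
        uint32_t plane[4];
        chunky_to_planar32(grey + 32 * w, plane);
        for (int p = 0; p < 4; p++)
            rows[p][w] = plane[p];
    }
}
```

On the Amiga the filled offscreen planes would then be handed to BltBitMapRastPort(), as the post suggests; that call is deliberately left out of this sketch.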
doug@xdos.UUCP (Doug Merritt) (02/12/90)
In article <1092@mindlink.UUCP> a464@mindlink.UUCP (Bruce Dawson) writes:
>
> One way of speeding this up is to write your own WritePixel() routine.

To give some idea, I did this for Thad's FaceShower program. It originally used WritePixel() for displaying 256x256 (approx.) pixels, and took about 45 seconds. Changing to direct writes dropped it to around 20 seconds. So if that's the approximate amount of time you want to save, then this might be the way to go. (Factor of two.)

But usually it's the algorithm that benefits most from optimization. Changing the inline dithering calculations to table lookups with layout optimized for the screen's bitplane layout dropped the time from 20 seconds to around 1 second. (Factor of twenty.)

When I hand-optimized the assembler in the inner loop, it went from 1.2 seconds to 0.85 seconds. (Factor of 1.4.)

The moral of this is obviously that you should always optimize the algorithm first, before you start dinking around with bypassing AmigaDOS and hacking assembler. That's where the biggest wins are.

Then again, running it on Thad's 68020 sped it up from 0.85 sec to 0.25 sec, and you can't beat a factor of 3.4 speedup with no source changes! :-)

	Doug
--
Doug Merritt		{pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow		Professional Wildeyed Visionary
doug@xdos.UUCP (Doug Merritt) (02/13/90)
In article <1109@mindlink.UUCP> a464@mindlink.UUCP (Bruce Dawson) writes:
> Be careful where you give the credit for the speedup. Changing the
>algorithm, in this particular case, saved nineteen seconds. [...]

Hmm. Sounds logical; looks like you caught me committing sloppy thinking. Right, it's additive in this case, not multiplicative.

In any case it's still a good object lesson for those who think that coding everything in assembler is the only way to go. There was an article called "FastPix" in the Nov. Amazing Computing that showed how to do a custom pixel writing routine in assembler to save time, which will mislead many people.

The impression given is that such a routine would be the ultimate in speed, just because it's written in assembler, whereas usually a bigger gain can be had from staying in C and experimenting with different algorithmic speedups. Using assembler makes sense only when all other alternatives have been exhausted, and that's the only source of speedup left.

Several times I've seen commercial products advertised as "(re)written 100% in assembler!!!!!!", as if that were a plus. Inner loops should be in assembler, not entire programs.

	Doug

P.S. I suppose I'll now be inundated with flames from people pointing out that, since they have no hard disk, size of programs is critical to squeeze as many as possible onto their floppies. Yeah, I know, I was without a hard disk on my Amiga for years. But although some people write really perfect programs in assembler, with most you pay for that size bonus with generally buggier programs, simplistic algorithms, etc.
--
Doug Merritt		{pyramid,apple}!xdos!doug
Member, Crusaders for a Better Tomorrow		Professional Wildeyed Visionary
cmcmanis@stpeter.Sun.COM (Chuck McManis) (02/13/90)
In article <1092@mindlink.UUCP> a464@mindlink.UUCP (Bruce Dawson) writes:
> [good comments about speeding up writing deleted]
> Warning: When I last checked (several years ago I admit), DrawImage
> didn't block menus (ie; didn't do a LockLayers()), meaning that menus
> could drop down and get drawn over by DrawImage(), thus munging the display.
> If this is still true, put your own LockLayers() calls around calls to
> DrawImage.

In fact what is going on here is that the Screen rastport doesn't have an associated LayerInfo structure ('cuz it isn't layered). This makes it faster to write to, because the software doesn't check for clipping or "locked layers", but it has the drawback that when Intuition thinks it has the system "locked" (like when a menu is down), writing to the screen will continue unconstrained. When writing to the Screen rastport you had better check your clipping range and, if you use menus, do something like MENUVERIFY to figure out when menus are down.

--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@Eng.Sun.COM
These opinions are my own and no one else's, but you knew that didn't you.
"If it didn't have bones in it, it wouldn't be crunchy now would it?!"
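Editor's sketch: one way to "check your clipping range" before writing a scanline directly, since nothing will clip it for you on an unlayered rastport. This is plain C with a hypothetical function name (clip_span), not an Amiga library call, and it only handles clipping against the screen edges. It says nothing about menus, which still need the kind of handling described above.

```c
#include <assert.h>

/* Clip the horizontal span that starts at (*x, y) and is *len pixels long
   to a screen of the given size. On return, *x and *len describe the
   visible part and *skip says how many source pixels were cut off on the
   left. Returns 1 if anything is visible, 0 otherwise. */
static int clip_span(int *x, int y, int *len, int *skip, int width, int height)
{
    assert(width > 0 && height > 0);

    *skip = 0;
    if (y < 0 || y >= height || *len <= 0)
        return 0;

    if (*x < 0) {              /* cut off the part left of the screen */
        *skip = -*x;
        *len += *x;
        *x = 0;
    }
    if (*x + *len > width)     /* cut off the part right of the screen */
        *len = width - *x;

    return *len > 0;
}
```

A caller would draw only when clip_span() returns 1, starting at source pixel skip and writing len pixels at column x.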
nsw@cbnewsm.ATT.COM (Neil Weinstock) (02/13/90)
In article <1092@mindlink.UUCP> a464@mindlink.UUCP (Bruce Dawson) writes:
>
> One way of speeding this up is to write your own WritePixel() routine.
[ ... ]

Geez, you guys are all missing the obvious and correct solution. The answer, as it almost always is, is to use a lookup table. Use the data for each scan line as the lookup index (simply concatenate the pixel values into one long binary value), and put the corresponding Image structure into that table entry. So, you just take the data, look up the appropriate scan line, and use DrawImage(). Only one operation per scan line. How much quicker can you get?

Extending this method to the entire screen is left as an exercise for the reader.

Neil Weinstock, AT&T Bell Labs
att!cord!nsw or nsw@cord.att.com
"Your hair is so... lustre-laden." - Moss
jeh@elmgate.UUCP (Ed Hanway) (02/16/90)
In article <9231@cbnewsm.ATT.COM> nsw@cbnewsm.ATT.COM (Neil Weinstock) writes:
>Geez, you guys are all missing the obvious and correct solution. The
>answer, as it almost always is, is to use a lookup table. Use the data for
>each scan line as the lookup index (simply concatenate the pixel values into
>one long binary value), and put the corresponding Image structure into that
>table entry. So, you just take the data, look up the appropriate scan line,
>and use DrawImage(). Only one operation per scan line. How much quicker
>can you get?

It's a joke, right? My calculator gave up but "bc" dutifully told me that a look-up table for each possible low-res scan line would take

3176248838703735071388317970997340557438739699483247316076389267090944\
8424296143487586179936927258864875554241803836083347728168482432449060\
4855548268961204923221725750403452934972636403246568312773419816518854\
1019193627059188378560474396174450152676920272486972400155193608576973\
9549574787063085604610601339929877328920047078547471584268060390863192\
35148982818821631076219391836160

megabytes of RAM, assuming 4 bit planes.

So are you going to post the table? :-)
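Editor's sketch: a way to reproduce a figure of this size without bc. It assumes a 320-pixel low-res line with 4 bitplanes (1280 bits, so 2^1280 possible scanlines), 160 bytes of image data per table entry with no allowance for the Image structure itself, and 2^20 bytes per megabyte. Those assumptions are the editor's reading of the post, not something it states. The function names are hypothetical.

```c
#include <assert.h>

#define MAX_DIGITS 512

/* Multiply a little-endian decimal digit array by a small factor in place.
   Returns the new digit count. */
static int mul_small(unsigned char *digits, int n, unsigned factor)
{
    unsigned carry = 0;

    for (int i = 0; i < n; i++) {
        unsigned v = digits[i] * factor + carry;
        digits[i] = (unsigned char)(v % 10);
        carry = v / 10;
    }
    while (carry) {
        assert(n < MAX_DIGITS);
        digits[n++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
    return n;
}

/* Write the table size in megabytes as a decimal string into out, which
   must hold MAX_DIGITS + 1 chars. The table has 2^(width*depth) entries
   of width*depth/8 bytes each. Returns the number of digits written. */
static int table_megabytes(int width, int depth, char *out)
{
    int bits = width * depth;
    assert(bits % 8 == 0 && bits >= 20);

    unsigned char digits[MAX_DIGITS] = {1};
    int n = 1;

    for (int i = 0; i < bits - 20; i++)       /* 2^bits / 2^20 */
        n = mul_small(digits, n, 2);
    n = mul_small(digits, n, (unsigned)(bits / 8)); /* bytes per entry */

    for (int i = 0; i < n; i++)               /* most significant first */
        out[i] = (char)('0' + digits[n - 1 - i]);
    out[n] = '\0';
    return n;
}
```

Calling table_megabytes(320, 4, buf) fills buf with a 382-digit number, the same length as the figure quoted above.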